Age | Commit message | Author
2024-11-13 | i2c: isch: Unify the name of the variable to hold an error code | Andy Shevchenko
There are two different names used for the variable that holds an error code. Unify to use one variant in all cases. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-11-13 | i2c: isch: Use read_poll_timeout() | Andy Shevchenko
Simplify the code by using read_poll_timeout(). Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
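The polling pattern that read_poll_timeout() replaces can be sketched in userspace C as below. This is only an illustration of the idea (read a status value repeatedly until a condition holds or a deadline passes), not the kernel macro or the isch driver code; `poll_until_clear`, `status_read_fn`, and `demo_status` are hypothetical names, and the loop busy-waits instead of sleeping between reads.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical register-read callback; stands in for a device status read. */
typedef uint8_t (*status_read_fn)(void *ctx);

/* Current wall-clock time in microseconds (C11 timespec_get). */
static uint64_t now_us(void)
{
	struct timespec ts;

	timespec_get(&ts, TIME_UTC);
	return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/*
 * Read the status until all bits in busy_mask are clear or timeout_us
 * has elapsed. Returns 0 on success and -1 on timeout; *val always holds
 * the last value read.
 */
static int poll_until_clear(status_read_fn read_status, void *ctx,
			    uint8_t busy_mask, uint64_t timeout_us,
			    uint8_t *val)
{
	uint64_t deadline;

	assert(read_status && val);
	deadline = now_us() + timeout_us;

	for (;;) {
		*val = read_status(ctx);
		if (!(*val & busy_mask))
			return 0;
		if (now_us() > deadline)
			return -1;
	}
}

/* Demo reader: reports busy (bit 3) for the first *ctx calls, then idle. */
static uint8_t demo_status(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return 0x08;
	}
	return 0x00;
}
```

Keeping the condition, the timeout, and the last-read value in one helper is what lets an open-coded retry loop collapse into a single call.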
2024-11-13 | i2c: isch: Utilize temporary variable to hold device pointer | Andy Shevchenko
Introduce a temporary variable to hold a device pointer. It can be utilized in the ->probe() and save a bit of LoCs. To make it consistent, rename currently used dev to pdev. While at it, convert the only error message to dev_err_probe(). Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-11-13 | i2c: isch: switch i2c registration to devm functions | Andy Shevchenko
Switch from i2c_add_adapter() to the resource-managed devm_i2c_add_adapter() to match the rest of the driver initialization and make the code more concise. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-11-13 | i2c: isch: Use custom private data structure | Andy Shevchenko
Use a custom private data structure instead of global variables. With that, remove the comment that is no longer true. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-11-13 | i2c: isch: Switch to memory mapped IO accessors | Andy Shevchenko
Convert driver to use memory mapped IO accessors. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-11-13 | i2c: isch: Use string_choices API instead of ternary operator | Andy Shevchenko
Use modern string_choices API instead of manually determining the output using ternary operator. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
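As a rough userspace illustration of the idea, a string-choice helper wraps the ternary once so that call sites stay short and consistent. The helper below mirrors the shape of the kernel's str_enabled_disabled(); which helper the isch driver actually uses is not stated in the commit message, and `format_irq_state` is a hypothetical caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-in for a string_choices-style helper. */
static inline const char *str_enabled_disabled(bool v)
{
	return v ? "enabled" : "disabled";
}

/* Hypothetical caller: formats a state message without an inline ternary. */
static int format_irq_state(char *buf, size_t len, bool irq_on)
{
	assert(buf != NULL);
	return snprintf(buf, len, "IRQ is %s", str_enabled_disabled(irq_on));
}
```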
2024-11-13 | i2c: isch: Pass pointer to struct i2c_adapter down | Andy Shevchenko
There are a lot of messaging calls that use global variable of struct i2c_adapter. Instead, to make code better and flexible for further improvements, pass the pointer to the actual adapter used for transfers. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-11-13 | i2c: cadence: Add atomic transfer support for controller version 1.4 | Manikanta Guntupalli
Rework the read and write code paths in the driver to support operation in atomic contexts in master mode. This change does not apply to slave mode because there is no way to handle interruptions in that context.
Adjust the message timeout to include some extra time. For non-atomic contexts, 500 ms is added to the timeout. For atomic contexts, 2000 ms is added because transfers happen in polled mode, requiring more time to account for the polling overhead.
Similar changes have been implemented in other drivers, including:
  commit 3a5ee18d2a32 ("i2c: imx: implement master_xfer_atomic callback")
  commit 445094c8a9fb ("i2c: exynos5: add support for atomic transfers")
  commit ede2299f7101 ("i2c: tegra: Support atomic transfers")
  commit fe402bd09049 ("i2c: meson: implement the master_xfer_atomic callback")
Signed-off-by: Manikanta Guntupalli <manikanta.guntupalli@amd.com>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-11-13 | i2c: cadence: Split cdns_i2c_master_xfer for Atomic Mode | Manikanta Guntupalli
The cdns_i2c_master_xfer function has been refactored to separate the common code. This change facilitates better support for atomic mode operations by isolating the shared logic. Signed-off-by: Manikanta Guntupalli <manikanta.guntupalli@amd.com> Reviewed-by: Andi Shyti <andi.shyti@kernel.org> Acked-by: Michal Simek <michal.simek@amd.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-11-13 | i2c: cadence: Relocate cdns_i2c_runtime_suspend and cdns_i2c_runtime_resume to facilitate atomic mode | Manikanta Guntupalli
Relocate cdns_i2c_runtime_suspend, cdns_i2c_runtime_resume and cdns_i2c_init functions to avoid prototype statement in atomic mode changes. Signed-off-by: Manikanta Guntupalli <manikanta.guntupalli@amd.com> Reviewed-by: Andi Shyti <andi.shyti@kernel.org> Acked-by: Michal Simek <michal.simek@amd.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-11-13 | Merge branch 'bpf-range_tree-for-bpf-arena' | Andrii Nakryiko
Alexei Starovoitov says: ==================== bpf: range_tree for bpf arena From: Alexei Starovoitov <ast@kernel.org> Introduce range_tree (interval tree plus rbtree) to track unallocated ranges in bpf arena and replace maple_tree with it. This is a step towards making bpf_arena|free_alloc_pages non-sleepable. The previous approach to reuse drm_mm to replace maple_tree reached a dead end, since sizeof(struct drm_mm_node) = 168 and sizeof(struct maple_node) = 256 while sizeof(struct range_node) = 64 introduced in this patch. Not only is it smaller, but the algorithm splits and merges adjacent ranges. Ultimate performance doesn't matter. The main objective of range_tree is to work in context where kmalloc/kfree are not safe. It achieves that via bpf_mem_alloc. ==================== Link: https://patch.msgid.link/20241108025616.17625-1-alexei.starovoitov@gmail.com Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-11-13 | selftests/bpf: Add a test for arena range tree algorithm | Alexei Starovoitov
Add a test that verifies specific behavior of arena range tree algorithm and adjust existing big_alloc1 test due to use of global data in arena. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/bpf/20241108025616.17625-3-alexei.starovoitov@gmail.com
2024-11-13 | bpf: Introduce range_tree data structure and use it in bpf arena | Alexei Starovoitov
Introduce range_tree data structure and use it in bpf arena to track ranges of allocated pages. range_tree is a large bitmap that is implemented as interval tree plus rbtree. The contiguous sequence of bits represents unallocated pages. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/bpf/20241108025616.17625-2-alexei.starovoitov@gmail.com
2024-11-13 | Merge tag 'at91-defconfig-6.13' of https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux into soc/defconfig | Arnd Bergmann
Microchip AT91 defconfig updates for v6.13
It contains:
- enable PAC1934 power monitor driver for the Microchip AT91 defconfigs
* tag 'at91-defconfig-6.13' of https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux:
  ARM: configs: at91: enable PAC1934 driver as module
Link: https://lore.kernel.org/r/20241113182050.2176500-1-claudiu.beznea@tuxon.dev
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-11-13 | Merge tag 'pm-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm | Linus Torvalds
Pull power management fix from Rafael Wysocki: "Fix a locking issue in the asymmetric CPU capacity setup code in the intel_pstate driver that may lead to a deadlock if CPU online/offline runs in parallel with the code in question, which is unlikely but not impossible (Rafael Wysocki)"
* tag 'pm-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: intel_pstate: Rearrange locking in hybrid_init_cpu_capacity_scaling()
2024-11-13 | Merge tag 'tpmdd-next-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd | Linus Torvalds
Pull tpm fixes from Jarkko Sakkinen: "Two bug fixes for TPM bus encryption (the remaining reported issues in the feature)"
* tag 'tpmdd-next-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  tpm: Disable TPM on tpm2_create_primary() failure
  tpm: Opt-in in disable PCR integrity protection
2024-11-13 | block/genhd: use seq_put_decimal_ull for diskstats decimal values | David Wang
seq_printf is costly. For each block device, 19 decimal values are yielded in /proc/diskstats via seq_printf. On a system with 16 logical block devices, profiling for open/read/close sequences shows seq_printf took ~75% of the samples of diskstats_show:
  diskstats_show(92.626% 2269372/2450040)
    seq_printf(76.026% 1725313/2269372)
      vsnprintf(99.163% 1710866/1725313)
        format_decode(26.597% 455040/1710866)
        number(19.554% 334542/1710866)
        memcpy_orig(4.183% 71570/1710866)
        ...
      srso_return_thunk(0.009% 148/1725313)
    part_stat_read_all(8.030% 182236/2269372)
One million rounds of open/read/close /proc/diskstats takes:
  real 0m37.687s
  user 0m0.264s
  sys 0m32.911s
On average, each sequence took ~0.032ms.
With this patch, most decimal values are yielded via seq_put_decimal_ull, and performance is significantly improved:
  real 0m20.792s
  user 0m0.316s
  sys 0m20.463s
On average, each sequence took ~0.020ms, a ~37.5% improvement.
Signed-off-by: David Wang <00107082@163.com>
Link: https://lore.kernel.org/r/20241108054500.4251-1-00107082@163.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
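The saving comes from skipping format-string parsing when the only job is to emit a delimiter followed by a decimal number. A minimal userspace sketch of that idea follows; `put_decimal_ull` is a hypothetical buffer-based helper written for illustration, not the kernel's seq_put_decimal_ull() implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Write delim followed by num in decimal into buf, NUL-terminated.
 * Returns the number of characters written (excluding the NUL), or 0 if
 * the buffer is too small. No format string is parsed.
 */
static size_t put_decimal_ull(char *buf, size_t size, const char *delim,
			      unsigned long long num)
{
	/* Three decimal digits per byte is a safe upper bound. */
	char tmp[sizeof(unsigned long long) * 3];
	size_t ndigits = 0, dlen, i;

	assert(buf != NULL && delim != NULL);
	dlen = strlen(delim);

	/* Produce digits least-significant first. */
	do {
		tmp[ndigits++] = (char)('0' + num % 10);
		num /= 10;
	} while (num);

	if (dlen + ndigits + 1 > size)
		return 0;

	memcpy(buf, delim, dlen);
	for (i = 0; i < ndigits; i++)
		buf[dlen + i] = tmp[ndigits - 1 - i];
	buf[dlen + ndigits] = '\0';
	return dlen + ndigits;
}
```

With 19 values per device, replacing each formatted print by a direct conversion of this kind removes the repeated format decoding that dominates the profile above.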
2024-11-13 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf | Alexei Starovoitov
Cross-merge bpf fixes after downstream PR. In particular to bring the fix in commit aa30eb3260b2 ("bpf: Force checkpoint when jmp history is too long"). The follow up verifier work depends on it. And the fix in commit 6801cf7890f2 ("selftests/bpf: Use -4095 as the bad address for bits iterator"). It's fixing instability of BPF CI on s390 arch.
No conflicts.
Adjacent changes in:
  Auto-merging arch/Kconfig
  Auto-merging kernel/bpf/helpers.c
  Auto-merging kernel/bpf/memalloc.c
  Auto-merging kernel/bpf/verifier.c
  Auto-merging mm/slab_common.c
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-11-13 | samples/bpf: Remove unused variable in xdp2skb_meta_kern.c | Zhu Jun
The variable is never referenced in the code, just remove it. Signed-off-by: Zhu Jun <zhujun2@cmss.chinamobile.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241111061514.3257-1-zhujun2@cmss.chinamobile.com
2024-11-13 | samples/bpf: Remove unused variables in tc_l2_redirect_kern.c | Zhu Jun
These variables are never referenced in the code, just remove them. Signed-off-by: Zhu Jun <zhujun2@cmss.chinamobile.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241111062312.3541-1-zhujun2@cmss.chinamobile.com
2024-11-13 | bpftool: Cast variable `var` to long long | Luo Yifan
When the SIGNED condition is met, the variable `var` should be cast to `long long` instead of `unsigned long long`. Signed-off-by: Luo Yifan <luoyifan@cmss.chinamobile.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Quentin Monnet <qmo@kernel.org> Link: https://lore.kernel.org/bpf/20241112073701.283362-1-luoyifan@cmss.chinamobile.com
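The general rule behind this fix is that the cast should match the conversion specifier: a value printed with a signed specifier is cast to `long long`, and one printed with an unsigned specifier to `unsigned long long`. The sketch below only illustrates that rule; `format_var` and its parameters are hypothetical and do not reproduce the bpftool code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a raw 64-bit value as signed or unsigned decimal. The cast in
 * each branch matches the printf conversion used in that branch.
 */
static int format_var(char *buf, size_t len, uint64_t raw, bool is_signed)
{
	assert(buf != NULL);
	if (is_signed)
		return snprintf(buf, len, "%lld", (long long)raw);
	return snprintf(buf, len, "%llu", (unsigned long long)raw);
}
```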
2024-11-13 | Merge tag 'timers-v6.13-rc1' of https://git.linaro.org/people/daniel.lezcano/linux into timers/core | Thomas Gleixner
Pull clocksource/event updates from Daniel Lezcano:
- Remove the dw_apb_clockevent_[pause|resume|stop] functions as they have been unused since 2021 (David Alan Gilbert)
- Make the sp804 driver user selectable as it may be unused on some platforms (Mark Brown)
- Don't fail if the ti-dm does not describe an interrupt in the DT as this could be a normal situation if the PWM is used (Judith Mendez)
- Always use cluster 0 counter as a clocksource on a multi-cluster system to prevent problems related to the time shifting between clusters if multiple per cluster clocksource is used (Paul Burton)
- Move the RaLink system tick counter from the arch directory to the clocksource directory (Sergio Paracuellos)
- Convert the owl-timer bindings into yaml schema (Ivaylo Ivanov)
- Fix child node refcount handling on the TI DM by relying on the __free annotation to automatically release the refcount on the node (Javier Carrasco)
- Remove pointless cast in the GPX driver as PTR_ERR already does that (Tang Bin)
- Use of_property_present() for non-boolean properties where it is possible in the different drivers (Rob Herring)
Link: https://lore.kernel.org/lkml/8d402321-96f1-47f7-9347-a850350d60de@linaro.org
2024-11-13 | hwmon: (pmbus/isl68137) add support for voltage divider on Vout | Grant Peltier
Some applications require Vout to be higher than the detectable voltage range of the Vsense pin for a given rail. In such applications, a voltage divider may be placed between Vout and the Vsense pin, but this results in erroneous telemetry being read back from the part. This change adds support for a voltage divider to be defined in the devicetree for a (or multiple) specific rail(s) for a supported digital multiphase device and for the applicable Vout telemetry to be scaled based on the voltage divider configuration. This change copies the implementation of the vout-voltage-divider devicetree property defined in the maxim,max20730 bindings schema since it is the best fit for the use case of scaling hwmon PMBus telemetry. The generic voltage-divider property used by many iio drivers was determined to be a poor fit because that schema is tied directly to iio and the isl68137 driver is not an iio driver. Signed-off-by: Grant Peltier <grantpeltier93@gmail.com> Message-ID: <8c2d048f87282bcf66313afbf5e923d8fc17b4d7.1731439797.git.grantpeltier93@gmail.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
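The scaling itself is the standard resistor-divider relation. Assuming R1 sits between Vout and the Vsense pin and R2 between the Vsense pin and ground, Vsense = Vout * R2 / (R1 + R2), so the reported value has to be multiplied by (R1 + R2) / R2. The sketch below shows that arithmetic only; `scale_vout_mv` is a hypothetical helper, and how the devicetree property encodes the two resistances is not stated in this commit message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Recover Vout (in mV) from the sensed voltage behind a resistor divider.
 * r1: resistance between Vout and Vsense, r2: between Vsense and ground
 * (same unit for both). Rounds to the nearest millivolt.
 */
static uint32_t scale_vout_mv(uint32_t vsense_mv, uint32_t r1, uint32_t r2)
{
	uint64_t num;

	assert(r2 != 0);
	num = (uint64_t)vsense_mv * ((uint64_t)r1 + r2);
	return (uint32_t)((num + r2 / 2) / r2);
}
```

For example, with R1 = 3000 and R2 = 1000, a sensed 1200 mV corresponds to 4800 mV at Vout.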
2024-11-13 | dt-bindings: hwmon: isl68137: add bindings to support voltage dividers | Grant Peltier
Add devicetree bindings to support declaring optional voltage dividers to the rail outputs of supported digital multiphase regulators. Some applications require Vout to exceed the voltage range that the Vsense pin can detect. This binding definition allows users to define the characteristics of a voltage divider placed between Vout and the Vsense pin for any rail powered by the device. These bindings copy the vout-voltage-divider property defined in the maxim,max20730 bindings schema since it is the best fit for the use case of scaling hwmon PMBus telemetry. The generic voltage-divider property used by many iio drivers was determined to be a poor fit because that schema is tied directly to iio for the purpose of scaling io-channel voltages and the isl68137 driver is not an iio driver. The new schema file is named isil,isl68137.yaml to align with the corresponding driver name and the pre-existing bindings ported from trivial bindings. However, all new device bindings use renesas as the vendor prefix since Renesas acquired Intersil and now maintains all documentation for the devices. Signed-off-by: Grant Peltier <grantpeltier93@gmail.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Message-ID: <f7ac200e982961ff733de27a5c4505c04d68b6f3.1731439797.git.grantpeltier93@gmail.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-11-13 | hwmon: tmp108: fix I3C dependency | Arnd Bergmann
It's possible to build a kernel with tmp108 built-in but i3c support in a loadable module, but that results in a link failure: x86_64-linux-ld: drivers/hwmon/tmp108.o: in function `p3t1085_i3c_probe': tmp108.c:(.text+0x5f9): undefined reference to `i3cdev_to_dev' Add a Kconfig dependency to ensure only the working configurations are allowed. Fixes: c40655e33106 ("hwmon: (tmp108) Add support for I3C device") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Message-ID: <20241113175615.2442851-1-arnd@kernel.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-11-13 | KVM: x86: expose MSR_PLATFORM_INFO as a feature MSR | Paolo Bonzini
For userspace that wants to disable KVM_X86_QUIRK_STUFF_FEATURE_MSRS, it is useful to know what bits can be set to 1 in MSR_PLATFORM_INFO (apart from the TSC ratio). The right way to do that is via /dev/kvm's feature MSR mechanism. In fact, MSR_PLATFORM_INFO is already a feature MSR for the purpose of blocking updates after the vCPU is run, but KVM_GET_MSRS did not return a valid value for it. Just like in a VM that leaves KVM_X86_QUIRK_STUFF_FEATURE_MSRS enabled, the TSC ratio field is left to 0. Only bit 31 is set. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-11-13 | x86: KVM: Advertise CPUIDs for new instructions in Clearwater Forest | Tao Su
Latest Intel platform Clearwater Forest has introduced new instructions enumerated by CPUIDs of SHA512, SM3, SM4 and AVX-VNNI-INT16. Advertise these CPUIDs to userspace so that guests can query them directly. SHA512, SM3 and SM4 are on an expected-dense CPUID leaf and some other bits on this leaf have kernel usages. Since they have no real kernel usages themselves, hide them in /proc/cpuinfo. These new instructions only operate in xmm, ymm registers and have no new VMX controls, so there is no additional host enabling required for guests to use these instructions, i.e. advertising these CPUIDs to userspace is safe. Tested-by: Jiaan Lu <jiaan.lu@intel.com> Tested-by: Xuelian Guo <xuelian.guo@intel.com> Signed-off-by: Tao Su <tao1.su@linux.intel.com> Message-ID: <20241105054825.870939-1-tao1.su@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-11-13 | drm/xe/oa: Fix "Missing outer runtime PM protection" warning | Ashutosh Dixit
Fix the following drm_WARN:
  [953.586396] xe 0000:00:02.0: [drm] Missing outer runtime PM protection
  ...
  <4> [953.587090] ? xe_pm_runtime_get_noresume+0x8d/0xa0 [xe]
  <4> [953.587208] guc_exec_queue_add_msg+0x28/0x130 [xe]
  <4> [953.587319] guc_exec_queue_fini+0x3a/0x40 [xe]
  <4> [953.587425] xe_exec_queue_destroy+0xb3/0xf0 [xe]
  <4> [953.587515] xe_oa_release+0x9c/0xc0 [xe]
Suggested-by: John Harrison <john.c.harrison@intel.com>
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Fixes: e936f885f1e9 ("drm/xe/oa/uapi: Expose OA stream fd")
Cc: stable@vger.kernel.org
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241109032003.3093811-1-ashutosh.dixit@intel.com
(cherry picked from commit b107c63d2953907908fd0cafb0e543b3c3167b75)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-11-13 | xen: Fix the issue of resource not being properly released in xenbus_dev_probe() | Qiu-ji Chen
This patch fixes an issue in the function xenbus_dev_probe(). In the xenbus_dev_probe() function, within the if (err) branch at line 313, the program incorrectly returns err directly without releasing the resources allocated by err = drv->probe(dev, id). As the return value is non-zero, the upper layers assume the processing logic has failed. However, the probe operation was performed earlier without a corresponding remove operation. Since the probe actually allocates resources, failing to perform the remove operation could lead to problems. To fix this issue, we followed the resource release logic of the xenbus_dev_remove() function by adding a new block fail_remove before the fail_put block. After entering the branch if (err) at line 313, the function will use a goto statement to jump to the fail_remove block, ensuring that the previously acquired resources are correctly released, thus preventing the reference count leak. This bug was identified by an experimental static analysis tool developed by our team. The tool specializes in analyzing reference count operations and detecting potential issues where resources are not properly managed. In this case, the tool flagged the missing release operation as a potential problem, which led to the development of this patch. Fixes: 4bac07c993d0 ("xen: add the Xenbus sysfs and virtual device hotplug driver") Cc: stable@vger.kernel.org Signed-off-by: Qiu-ji Chen <chenqiuji666@gmail.com> Reviewed-by: Juergen Gross <jgross@suse.com> Message-ID: <20241105130919.4621-1-chenqiuji666@gmail.com> Signed-off-by: Juergen Gross <jgross@suse.com>
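The label ordering described above (a fail_remove block placed before fail_put, so that a late failure undoes the probe and then drops the reference) can be sketched in plain C as follows. This is a simplified illustration, not the xenbus code: `demo_dev`, `demo_dev_probe`, `enum fail_point`, and the error values are hypothetical stand-ins.

```c
#include <assert.h>

enum fail_point { FAIL_NONE, FAIL_PROBE, FAIL_AFTER_PROBE };

struct demo_dev {
	int refs;   /* references currently held */
	int probed; /* 1 while resources from the driver probe are held */
};

/* Each failure jumps to the label that undoes everything done so far. */
static int demo_dev_probe(struct demo_dev *dev, enum fail_point fail)
{
	int err;

	assert(dev != 0);
	dev->refs++;                               /* step 1: take a reference */

	err = (fail == FAIL_PROBE) ? -19 : 0;      /* step 2: driver probe */
	if (err)
		goto fail_put;
	dev->probed = 1;

	err = (fail == FAIL_AFTER_PROBE) ? -5 : 0; /* step 3: post-probe setup */
	if (err)
		goto fail_remove;

	return 0;

fail_remove:
	dev->probed = 0; /* undo step 2 */
fail_put:
	dev->refs--;     /* undo step 1 */
	return err;
}
```

Returning `err` directly after step 3 would skip both cleanup steps, which is the class of leak the commit message describes.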
2024-11-13 | tpm: Disable TPM on tpm2_create_primary() failure | Jarkko Sakkinen
The earlier bug fix misplaced the error-label when dealing with the tpm2_create_primary() return value, which the original completely ignored. Cc: stable@vger.kernel.org Reported-by: Christoph Anton Mitterer <calestyo@scientia.org> Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1087331 Fixes: cc7d8594342a ("tpm: Rollback tpm2_load_null()") Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-11-13 | tpm: Opt-in in disable PCR integrity protection | Jarkko Sakkinen
The initial HMAC session feature added TPM bus encryption and/or integrity protection to various in-kernel TPM operations. This can cause performance bottlenecks with IMA, as it heavily utilizes PCR extend operations. In order to mitigate this performance issue, introduce a kernel command-line parameter to the TPM driver for disabling the integrity protection for PCR extend operations (i.e. TPM2_PCR_Extend). Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Link: https://lore.kernel.org/linux-integrity/20241015193916.59964-1-zohar@linux.ibm.com/ Fixes: 6519fea6fd37 ("tpm: add hmac checks to tpm2_pcr_extend()") Tested-by: Mimi Zohar <zohar@linux.ibm.com> Co-developed-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Co-developed-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-11-13 | block: don't reorder requests in blk_mq_add_to_batch | Christoph Hellwig
LIFO ordering for batched completions is a bit unexpected and also defeats some merging optimizations in e.g. the XFS buffered write code. Now that we can easily add the request to the tail of the list do that. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241113152050.157179-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-13 | block: don't reorder requests in blk_add_rq_to_plug | Christoph Hellwig
Add requests to the tail of the list instead of the front so that they are queued up in submission order. Remove the re-reordering in blk_mq_dispatch_plug_list, virtio_queue_rqs and nvme_queue_rqs now that the list is ordered as expected. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241113152050.157179-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-13 | block: add a rq_list type | Christoph Hellwig
Replace the semi-open coded request list helpers with a proper rq_list type that mirrors the bio_list and has head and tail pointers. Besides better type safety, this actually allows inserting at the tail of the list, which will be useful soon. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241113152050.157179-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
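A list type with both head and tail pointers makes tail insertion a constant-time operation, which a bare head pointer cannot offer. The following is a simplified userspace sketch of such a type; the struct layout and the `tag` field are assumptions made for illustration and are not copied from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>

struct request {
	int tag;                 /* illustrative payload */
	struct request *rq_next; /* singly linked list pointer */
};

struct rq_list {
	struct request *head;
	struct request *tail;
};

/* Append at the tail so requests keep their submission order. */
static void rq_list_add_tail(struct rq_list *rl, struct request *rq)
{
	assert(rl != NULL && rq != NULL);
	rq->rq_next = NULL;
	if (rl->tail)
		rl->tail->rq_next = rq;
	else
		rl->head = rq;
	rl->tail = rq;
}

/* Remove and return the first request, or NULL if the list is empty. */
static struct request *rq_list_pop(struct rq_list *rl)
{
	struct request *rq;

	assert(rl != NULL);
	rq = rl->head;
	if (rq) {
		rl->head = rq->rq_next;
		if (!rl->head)
			rl->tail = NULL;
		rq->rq_next = NULL;
	}
	return rq;
}
```

Adding three requests at the tail and popping them returns them in the order they were added.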
2024-11-13 | block: remove rq_list_move | Christoph Hellwig
Unused now. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241113152050.157179-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-13 | virtio_blk: reverse request order in virtio_queue_rqs | Christoph Hellwig
blk_mq_flush_plug_list submits requests in the reverse order that they were submitted, which leads to a rather suboptimal I/O pattern especially in rotational devices. Fix this by rewriting virtio_queue_rqs so that it always pops the requests from the passed in request list, and then adds them to the head of a local submit list. This actually simplifies the code a bit as it removes the complicated list splicing, at the cost of extra updates of the rq_next pointer. As that should be cache hot anyway it should be an easy price to pay. Fixes: 0e9911fa768f ("virtio-blk: support mq_ops->queue_rqs()") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241113152050.157179-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-13 | nvme-pci: reverse request order in nvme_queue_rqs | Christoph Hellwig
blk_mq_flush_plug_list submits requests in the reverse order that they were submitted, which leads to a rather suboptimal I/O pattern especially in rotational devices. Fix this by rewriting nvme_queue_rqs so that it always pops the requests from the passed in request list, and then adds them to the head of a local submit list. This actually simplifies the code a bit as it removes the complicated list splicing, at the cost of extra updates of the rq_next pointer. As that should be cache hot anyway it should be an easy price to pay. Fixes: d62cbcf62f2f ("nvme: add support for mq_ops->queue_rqs()") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241113152050.157179-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
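The reordering technique described here, popping each request from the incoming list and pushing it onto the head of a local list, reverses a singly linked list in one pass. A minimal sketch under simplified assumptions follows; `reverse_by_pop` and the `tag` field are hypothetical, and the real drivers do additional per-request work in the same loop.

```c
#include <assert.h>
#include <stddef.h>

struct request {
	int tag;                 /* illustrative payload */
	struct request *rq_next; /* singly linked list pointer */
};

/*
 * Pop every request from *rqlist and push it onto the head of a local
 * submit list. The incoming list ends up empty and the returned list
 * holds the same requests in reverse order.
 */
static struct request *reverse_by_pop(struct request **rqlist)
{
	struct request *submit = NULL;
	struct request *rq;

	assert(rqlist != NULL);
	while ((rq = *rqlist) != NULL) {
		*rqlist = rq->rq_next; /* pop from the incoming list */
		rq->rq_next = submit;  /* push onto the local head */
		submit = rq;
	}
	return submit;
}
```

The only extra cost is one rq_next update per request, which matches the trade-off the commit message mentions.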
2024-11-13 | btrfs: validate queue limits | Christoph Hellwig
Call blk_validate_limits on the queue limits used for zone append splitting so that calculated values get filled in and any stacking conflicts get caught. Without this there isn't a max_zone_append_sectors limit as of commit 559218d43ec9 ("block: pre-calculate max_zone_append_sectors"). Fixes: 559218d43ec9 ("block: pre-calculate max_zone_append_sectors") Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20241113084541.34315-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-13 | block: export blk_validate_limits | Christoph Hellwig
While block drivers do the validation as part of committing them to the queue, users that use the limit outside of a block device context have to validate the limits and fill in the calculated values as well. So far btrfs is the only user of queue limits without a block device, and it has gotten away with that more or less by accident. But with commit 559218d43ec9 ("block: pre-calculate max_zone_append_sectors") this became fatal for setups that have small max zone append size, as it won't be limited now. Export blk_validate_limits so that it can be called directly from btrfs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20241113084541.34315-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-13 | e1000: Hold RTNL when e1000_down can be called | Joe Damato
e1000_down calls netif_queue_set_napi, which assumes that RTNL is held.
There are a few paths for e1000_down to be called in e1000 where RTNL is not currently being held:
- e1000_shutdown (pci shutdown)
- e1000_suspend (power management)
- e1000_reinit_locked (via e1000_reset_task delayed work)
- e1000_io_error_detected (via pci error handler)
Hold RTNL in three places to fix this issue:
- e1000_reset_task: igc, igb, and e1000e all hold rtnl in this path.
- e1000_io_error_detected (pci error handler): e1000e and ixgbe hold rtnl in this path. A patch has been posted for igc to do the same [1].
- __e1000_shutdown (which is called from both e1000_shutdown and e1000_suspend): igb, ixgbe, and e1000e all hold rtnl in the same path.
The other paths which call e1000_down seemingly hold RTNL and are OK:
- e1000_close (ndo_stop)
- e1000_change_mtu (ndo_change_mtu)
Based on the above analysis and mailing list discussion [2], I believe adding rtnl in the three places mentioned above is correct.
Fixes: 8f7ff18a5ec7 ("e1000: Link NAPI instances to queues and IRQs")
Reported-by: Dmitry Antipov <dmantipov@yandex.ru>
Closes: https://lore.kernel.org/netdev/8cf62307-1965-46a0-a411-ff0080090ff9@yandex.ru/
Link: https://lore.kernel.org/netdev/20241022215246.307821-3-jdamato@fastly.com/ [1]
Link: https://lore.kernel.org/netdev/ZxgVRX7Ne-lTjwiJ@LQ3V64L9R2/ [2]
Signed-off-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13 | igbvf: remove unused spinlock | Wander Lairson Costa
tx_queue_lock and stats_lock are declared and initialized, but never used. Remove them. Signed-off-by: Wander Lairson Costa <wander@redhat.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13 | igb: Fix 2 typos in comments in igb_main.c | Johnny Park
Fix 2 spelling mistakes in comments in `igb_main.c`. Signed-off-by: Johnny Park <pjohnny0508@gmail.com> Acked-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13 | igc: remove autoneg parameter from igc_mac_info | Vitaly Lifshits
Since the igc driver doesn't support forced speed configuration and its current related hardware doesn't support it either, there is no use of the mac.autoneg parameter. Moreover, in one case this usage might result in a NULL pointer dereference due to an uninitialized function pointer, phy.ops.force_speed_duplex. Therefore, remove this parameter from the igc code. Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13 | ixgbe: Break include dependency cycle | Diomidis Spinellis
Header ixgbe_type.h includes ixgbe_mbx.h. Also, header ixgbe_mbx.h included ixgbe_type.h, thus introducing a circular dependency.
- Remove ixgbe_mbx.h inclusion from ixgbe_type.h.
- ixgbe_mbx.h requires the definition of struct ixgbe_mbx_operations so move its definition there. While at it, add missing argument identifier names.
- Add required forward structure declarations.
- Include ixgbe_mbx.h in the .c files that need it, for the following reasons:
    ixgbe_sriov.c uses ixgbe_check_for_msg
    ixgbe_main.c uses ixgbe_init_mbx_params_pf
    ixgbe_82599.c uses mbx_ops_generic
    ixgbe_x540.c uses mbx_ops_generic
    ixgbe_x550.c uses mbx_ops_generic
Signed-off-by: Diomidis Spinellis <dds@aueb.gr>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13 | ice: Unbind the workqueue | Frederic Weisbecker
The ice workqueue doesn't seem to rely on any CPU locality and should therefore be able to run on any CPU. In practice this is already happening through the unbound ice_service_timer that may fire anywhere and queue the workqueue accordingly to any CPU. Make this official so that the ice workqueue is only ever queued to housekeeping CPUs on nohz_full. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13 | ice: use stack variable for virtchnl_supported_rxdids | Jacob Keller
The ice_vc_query_rxdid() function allocates memory to store the virtchnl_supported_rxdids structure used to communicate the bitmap of supported RXDIDs to a VF. This structure is only 8 bytes in size. The function must hold the allocated length on the stack as well as the pointer to the structure which itself is 8 bytes. Allocating this storage on the heap adds unnecessary overhead including a potential error path that must be handled in case kzalloc fails. Because this structure is so small, we're not saving stack space. Additionally, because we must ensure that we free the allocated memory, the return value from ice_vc_send_msg_to_vf() must also be saved in the stack ret variable. Depending on compiler optimization, this means allocating the 8-byte structure is requiring up to 16-bytes of stack memory! Simplify this function to keep the rxdid variable on the stack, saving memory and removing a potential failure exit path from this function. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13 | ice: initialize pf->supported_rxdids immediately after loading DDP | Jacob Keller
The pf->supported_rxdids field is used to populate the list of valid RXDIDs that a VF may use when negotiating VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC. The set of supported RXDIDs is dependent on the DDP, and can be read from the GLXFLXP_RXDID_FLAGS register. The PF needs to send this list to the VF upon receiving the VIRTCHNL_OP_GET_SUPPORTED_RXDIDs. It also needs to use this list to validate the requested descriptor ID from the VF when programming the Rx queues. A future update to support VF live migration will also want to validate that the target VF can support the same descriptor ID when migrating. Currently, pf->supported_rxdids is initialized inside the ice_vc_query_rxdid() function. This means that it is only ever initialized if at least one VF actually tries to negotiate VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC. It is also unnecessarily re-initialized every time the VF loads and requests the descriptor list. This worked before because the PF only checks pf->supported_rxdids when programming the Rx queue if the VF actually negotiates the VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC feature. This will be problematic for VF live migration. We need the list of supported Rx descriptor IDs when migrating. It is possible that no VF on the target PF has ever actually issued a VIRTCHNL_OP_GET_SUPPORTED_RXDIDs. Refactor the driver to initialize pf->supported_rxdids during driver initialization after the DDP is loaded. This is simpler, avoids unnecessary duplicate work, and avoids issues with the live migration process. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13  ice: only allow Tx promiscuous for multicast  Brett Creeley
Currently, when any VF is trusted and true promiscuous mode is enabled on the PF, the VF receives all unicast traffic directed to the device's internal switch. This includes traffic from outside the NIC as well as traffic from other VSIs (i.e. VFs). This does not match the expected behavior, as unicast traffic should only be visible from external sources in this case. Disable the Tx promiscuous mode bits for unicast promiscuous mode. Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
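The effect of the change can be sketched as a mask computation. The flag names and `promisc_mask()` below are hypothetical and do not correspond to the driver's real definitions; the sketch only assumes that the Rx and Tx directions are separate bits and that, after this change, unicast promiscuous mode requests the Rx bit alone while multicast keeps both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical promiscuous-mode flag bits, one per traffic type and direction. */
enum {
	PROMISC_UCAST_RX = 1u << 0,
	PROMISC_UCAST_TX = 1u << 1,
	PROMISC_MCAST_RX = 1u << 2,
	PROMISC_MCAST_TX = 1u << 3,
};

/*
 * Build the mask for a trusted VF. Unicast gets only the Rx bit, so traffic
 * sent by other VSIs through the internal switch is not mirrored to the VF.
 * Multicast still gets both the Rx and Tx bits.
 */
static uint8_t promisc_mask(bool ucast, bool mcast)
{
	uint8_t mask = 0;

	if (ucast)
		mask |= PROMISC_UCAST_RX;
	if (mcast)
		mask |= PROMISC_MCAST_RX | PROMISC_MCAST_TX;
	return mask;
}
```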
2024-11-13  ice: Add support for persistent NAPI config  Joe Damato
Use netif_napi_add_config to assign persistent per-NAPI config when initializing NAPIs. This preserves NAPI config settings when queue counts are adjusted.

Tested with an E810-2CQDA2 NIC.

Begin by setting the queue count to 4:

$ sudo ethtool -L eth4 combined 4

Check the queue settings:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
    --dump napi-get --json='{"ifindex": 4}'
[{'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8452, 'ifindex': 4, 'irq': 2782},
 {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8451, 'ifindex': 4, 'irq': 2781},
 {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8450, 'ifindex': 4, 'irq': 2780},
 {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8449, 'ifindex': 4, 'irq': 2779}]

Now, set the queue with NAPI ID 8451 to have a gro-flush-timeout of 1111:

$ sudo ./tools/net/ynl/cli.py \
    --spec Documentation/netlink/specs/netdev.yaml \
    --do napi-set --json='{"id": 8451, "gro-flush-timeout": 1111}'
None

Check that worked:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
    --dump napi-get --json='{"ifindex": 4}'
[{'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8452, 'ifindex': 4, 'irq': 2782},
 {'defer-hard-irqs': 0, 'gro-flush-timeout': 1111, 'id': 8451, 'ifindex': 4, 'irq': 2781},
 {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8450, 'ifindex': 4, 'irq': 2780},
 {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8449, 'ifindex': 4, 'irq': 2779}]

Now reduce the queue count to 2, which would destroy the queue with NAPI ID 8451:

$ sudo ethtool -L eth4 combined 2

Check the queue settings, noting that NAPI ID 8451 is gone:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
    --dump napi-get --json='{"ifindex": 4}'
[{'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8450, 'ifindex': 4, 'irq': 2780},
 {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8449, 'ifindex': 4, 'irq': 2779}]

Now, increase the number of queues back to 4:

$ sudo ethtool -L eth4 combined 4

Dump the settings, expecting to see the same NAPI IDs as above and for NAPI ID 8451 to have its gro-flush-timeout set to 1111:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
    --dump napi-get --json='{"ifindex": 4}'
[{'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8452, 'ifindex': 4, 'irq': 2782},
 {'defer-hard-irqs': 0, 'gro-flush-timeout': 1111, 'id': 8451, 'ifindex': 4, 'irq': 2781},
 {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8450, 'ifindex': 4, 'irq': 2780},
 {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8449, 'ifindex': 4, 'irq': 2779}]

Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
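The behavior exercised in the test sequence above can be modeled in a few lines of userspace C. This is a conceptual sketch only, not the kernel implementation: `netdev`, `napi`, `napi_config` and the helper functions are simplified, hypothetical stand-ins. The one idea it captures is that the per-queue settings are stored in device-owned storage indexed by queue, so they outlive the NAPI instance that is torn down when the queue count shrinks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_QUEUES 8 /* assumption: small fixed upper bound for the model */

/* Hypothetical per-queue settings that should survive queue resizing. */
struct napi_config {
	uint64_t gro_flush_timeout;
	uint32_t defer_hard_irqs;
};

/* Hypothetical NAPI instance; it only borrows a pointer to its config slot. */
struct napi {
	int active;
	struct napi_config *config;
};

struct netdev {
	struct napi_config config[MAX_QUEUES]; /* owned by the device */
	struct napi napi[MAX_QUEUES];
	unsigned int num_queues;
};

/* Attach a NAPI to the persistent config slot for its queue index. */
static void napi_add_config(struct netdev *dev, unsigned int index)
{
	dev->napi[index].active = 1;
	dev->napi[index].config = &dev->config[index];
}

/* Tear down a NAPI; its config slot in the device is left untouched. */
static void napi_del(struct netdev *dev, unsigned int index)
{
	dev->napi[index].active = 0;
	dev->napi[index].config = NULL;
}

/* Resize the queue set, deleting or adding NAPIs as needed. */
static int set_queue_count(struct netdev *dev, unsigned int count)
{
	if (count > MAX_QUEUES)
		return -1;
	for (unsigned int i = count; i < dev->num_queues; i++)
		napi_del(dev, i);
	for (unsigned int i = dev->num_queues; i < count; i++)
		napi_add_config(dev, i);
	dev->num_queues = count;
	return 0;
}

/* Update a setting through the live NAPI; fails if the queue does not exist. */
static int napi_set_gro_flush_timeout(struct netdev *dev, unsigned int index,
				      uint64_t timeout)
{
	if (index >= dev->num_queues)
		return -1;
	dev->napi[index].config->gro_flush_timeout = timeout;
	return 0;
}
```

Resizing a zero-initialized `struct netdev` from 4 queues to 2 and back to 4 mirrors the ethtool sequence: a timeout set on queue index 2 is still present after the queue is recreated.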