linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-11-12	hwmon: (jc42) Drop of_match_ptr() protection	Andy Shevchenko
	This prevents use of this driver with ACPI via PRP0001 and is an example of an anti pattern I'm trying to remove from the kernel. Hence drop from this driver. Also switch of.h for mod_devicetable.h include given use of struct of_device_id which is defined in that header. Reported-by: Konstantin Aladyshev <aladyshev22@gmail.com> Closes: https://lore.kernel.org/r/CACSj6VW7WKv5tiAkLCvSujENJvXq1Mc7_7vtkQsRSz3JGY0i3Q@mail.gmail.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Message-ID: <20241108124348.1392473-1-andriy.shevchenko@linux.intel.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-11-12	hwmon: (pmbus/core) clear faults after setting smbalert mask	Jerome Brunet
	pmbus_write_smbalert_mask() ignores the errors if the chip can't set smbalert mask the standard way. It is not necessarily a problem for the irq support if the chip is otherwise properly setup but it may leave an uncleared fault behind. pmbus_core will pick the fault on the next register_check(). The register check will fails regardless of the actual register support by the chip. This leads to missing attributes or debugfs entries for chips that should provide them. We cannot rely on register_check() as PMBUS_SMBALERT_MASK may be read-only. Unconditionally clear the page fault after setting PMBUS_SMBALERT_MASK to avoid the problem. Suggested-by: Guenter Roeck <linux@roeck-us.net> Fixes: 221819ca4c36 ("hwmon: (pmbus/core) Add interrupt support") Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Message-ID: <20241105-tps25990-v4-5-0e312ac70b62@baylibre.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-11-12	hwmon: (pmbus/core) allow drivers to override WRITE_PROTECT	Jerome Brunet
	Use _pmbus_read_byte_data() rather than calling smbus directly to check the write protection status. This give a chance to device implementing write protection differently to report back on the actual write protection status. Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Message-ID: <20241105-tps25990-v4-2-0e312ac70b62@baylibre.com> [groeck: Fix page parameter of _pmbus_read_byte_data()] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-11-12	block: remove the write_hint field from struct request	Christoph Hellwig
	The write_hint is only used for read/write requests, which must have a bio attached to them. Just use the bio field instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20241112170050.1612998-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-12	Merge tag 'for-6.12/dm-fixes-3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mikulas Patocka: - fix warnings about duplicate slab cache names * tag 'for-6.12/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm-cache: fix warnings about duplicate slab caches dm-bufio: fix warnings about duplicate slab caches
2024-11-12	thermal: testing: Simplify tt_get_tt_zone()	Rafael J. Wysocki
	Notice that tt_get_tt_zone() need not use the ret variable in the tt_thermal_zones list walk and make it return right after incrementing the reference counter of the matching entry. No functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/6120304.lOV4Wx5bFT@rjwysocki.net
2024-11-12	cpufreq: ACPI: Simplify MSR read on the boot CPU	Chang S. Bae
	Replace the 32-bit MSR access function with a 64-bit variant to simplify the call site, eliminating unnecessary 32-bit value manipulations. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Link: https://patch.msgid.link/20241106182313.165297-1-chang.seok.bae@intel.com [ rjw: Subject edit ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-11-12	ACPI: Switch back to struct platform_driver::remove()	Uwe Kleine-König
	After commit 0edb555a65d1 ("platform: Make platform_driver::remove() return void") .remove() is (again) the right callback to implement for platform drivers. Convert all platform drivers below drivers/acpi to use .remove(), with the eventual goal to drop struct platform_driver::remove_new(). As .remove() and .remove_new() have the same prototypes, conversion is done by just changing the structure member name in the driver initializer. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Link: https://patch.msgid.link/9ee1a9813f53698be62aab9d810b2d97a2a9f186.1731397722.git.u.kleine-koenig@baylibre.com [ rjw: Subject edit ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-11-12	ACPI: x86: Add adev NULL check to acpi_quirk_skip_serdev_enumeration()	Hans de Goede
	acpi_dev_hid_match() does not check for adev == NULL, dereferencing it unconditional. Add a check for adev being NULL before calling acpi_dev_hid_match(). At the moment acpi_quirk_skip_serdev_enumeration() is never called with a controller_parent without an ACPI companion, but better safe than sorry. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://patch.msgid.link/20241109220028.83047-1-hdegoede@redhat.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-11-12	ACPI: x86: Make UART skip quirks work on PCI UARTs without an UID	Hans de Goede
	The Vexia EDU ATLA 10 tablet (9V version) which shipped with Android 4.2 as factory OS has the usual broken DSDT issues for x86 Android tablets. On top of that this tablet is special because all its LPSS island peripherals are enumerated as PCI devices rather then as ACPI devices as they typically are. For the x86-android-tablets kmod to be able to instantiate a serdev client for the Bluetooth HCI on this tablet, an ACPI_QUIRK_UART1_SKIP quirk is necessary. Modify acpi_dmi_skip_serdev_enumeration() to work with PCI enumerated UARTs without an UID, such as the UARTs on this tablet. Also make acpi_dmi_skip_serdev_enumeration() exit early if there are no quirks, since there is nothing to do then. And add the necessary quirks for the Vexia EDU ATLA 10 tablet. This should compile with CONFIG_PCI being unset without issues because dev_is_pci() is defined as "(false)" then. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://patch.msgid.link/20241109215936.83004-1-hdegoede@redhat.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-11-12	mailbox: qcom-cpucp: Mark the irq with IRQF_NO_SUSPEND flag	Sibi Sankar
	The qcom-cpucp mailbox irq is expected to function during suspend-resume cycle particularly when the scmi cpufreq driver can query the current frequency using the get_level message after the cpus are brought up during resume. Hence mark the irq with IRQF_NO_SUSPEND flag to fix the do_xfer failures we see during resume. Err Logs: arm-scmi firmware:scmi: timed out in resp(caller:do_xfer+0x164/0x568) cpufreq: cpufreq_online: ->get() failed Reported-by: Johan Hovold <johan+linaro@kernel.org> Closes: https://lore.kernel.org/lkml/ZtgFj1y5ggipgEOS@hovoldconsulting.com/ Fixes: 0e2a9a03106c ("mailbox: Add support for QTI CPUCP mailbox controller") Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Tested-by: Johan Hovold <johan+linaro@kernel.org> Cc: stable@vger.kernel.org Message-ID: <20241030125512.2884761-7-quic_sibis@quicinc.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2024-11-12	firmware: arm_scmi: Report duplicate opps as firmware bugs	Sibi Sankar
	Duplicate opps reported by buggy SCP firmware currently show up as warnings even though the only functional impact is that the level/index remain inaccessible. Make it less scary for the end user by using dev_info instead, along with FW_BUG tag. Suggested-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com> Reviewed-by: Cristian Marussi <cristian.marussi@arm.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Cc: stable@vger.kernel.org Message-ID: <20241030125512.2884761-4-quic_sibis@quicinc.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2024-11-12	firmware: arm_scmi: Skip opp duplicates	Cristian Marussi
	Buggy firmware can reply with duplicated PERF opps descriptors. Ensure that the bad duplicates reported by the platform firmware doesn't get added to the opp-tables. Reported-by: Johan Hovold <johan+linaro@kernel.org> Closes: https://lore.kernel.org/lkml/ZoQjAWse2YxwyRJv@hovoldconsulting.com/ Signed-off-by: Cristian Marussi <cristian.marussi@arm.com> Tested-by: Johan Hovold <johan+linaro@kernel.org> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Cc: stable@vger.kernel.org Message-ID: <20241030125512.2884761-3-quic_sibis@quicinc.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2024-11-12	Revert "mmc: dw_mmc: Fix IDMAC operation with pages bigger than 4K"	Aurelien Jarno
	The commit 8396c793ffdf ("mmc: dw_mmc: Fix IDMAC operation with pages bigger than 4K") increased the max_req_size, even for 4K pages, causing various issues: - Panic booting the kernel/rootfs from an SD card on Rockchip RK3566 - Panic booting the kernel/rootfs from an SD card on StarFive JH7100 - "swiotlb buffer is full" and data corruption on StarFive JH7110 At this stage no fix have been found, so it's probably better to just revert the change. This reverts commit 8396c793ffdf28bb8aee7cfe0891080f8cab7890. Cc: stable@vger.kernel.org Cc: Sam Protsenko <semen.protsenko@linaro.org> Fixes: 8396c793ffdf ("mmc: dw_mmc: Fix IDMAC operation with pages bigger than 4K") Closes: https://lore.kernel.org/linux-mmc/614692b4-1dbe-31b8-a34d-cb6db1909bb7@w6rz.net/ Closes: https://lore.kernel.org/linux-mmc/CAC8uq=Ppnmv98mpa1CrWLawWoPnu5abtU69v-=G-P7ysATQ2Pw@mail.gmail.com/ Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-ID: <20241110114700.622372-1-aurelien@aurel32.net> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2024-11-12	mmc: sunxi-mmc: Fix A100 compatible description	Andre Przywara
	It turns out that the Allwinner A100/A133 SoC only supports 8K DMA blocks (13 bits wide), for both the SD/SDIO and eMMC instances. And while this alone would make a trivial fix, the H616 falls back to the A100 compatible string, so we have to now match the H616 compatible string explicitly against the description advertising 64K DMA blocks. As the A100 is now compatible with the D1 description, let the A100 compatible string point to that block instead, and introduce an explicit match against the H616 string, pointing to the old description. Also remove the redundant setting of clk_delays to NULL on the way. Fixes: 3536b82e5853 ("mmc: sunxi: add support for A100 mmc controller") Cc: stable@vger.kernel.org Signed-off-by: Andre Przywara <andre.przywara@arm.com> Tested-by: Parthiban Nallathambi <parthiban@linumiz.com> Reviewed-by: Chen-Yu Tsai <wens@csie.org> Message-ID: <20241107014240.24669-1-andre.przywara@arm.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2024-11-12	Bluetooth: btintel: Direct exception event to bluetooth stack	Kiran K
	Have exception event part of HCI traces which helps for debug. snoop traces: > HCI Event: Vendor (0xff) plen 79 Vendor Prefix (0x8780) Intel Extended Telemetry (0x03) Unknown extended telemetry event type (0xde) 01 01 de Unknown extended subevent 0x07 01 01 de 07 01 de 06 1c ef be ad de ef be ad de ef be ad de ef be ad de ef be ad de ef be ad de ef be ad de 05 14 ef be ad de ef be ad de ef be ad de ef be ad de ef be ad de 43 10 ef be ad de ef be ad de ef be ad de ef be ad de Fixes: af395330abed ("Bluetooth: btintel: Add Intel devcoredump support") Signed-off-by: Kiran K <kiran.k@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-11-12	drivers: perf: Fix wrong put_cpu() placement	Alexandre Ghiti
	Unfortunately, the wrong patch version was merged which places the put_cpu() after enabling a static key, which is not safe as pointed by Will [1], so move put_cpu() before to avoid this. Fixes: 2840dadf0dde ("drivers: perf: Fix smp_processor_id() use in preemptible code") Reported-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/all/20240827125335.GD4772@willie-the-truck/ [1] Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20241112113422.617954-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-11-12	Revert "RDMA/core: Fix ENODEV error for iWARP test over vlan"	Leon Romanovsky
	The citied commit in Fixes line caused to regression for udaddy [1] application. It doesn't work over VLANs anymore. Client: ifconfig eth2 1.1.1.1 ip link add link eth2 name p0.3597 type vlan protocol 802.1Q id 3597 ip link set dev p0.3597 up ip addr add 2.2.2.2/16 dev p0.3597 udaddy -S 847 -C 220 -c 2 -t 0 -s 2.2.2.3 -b 2.2.2.2 Server: ifconfig eth2 1.1.1.3 ip link add link eth2 name p0.3597 type vlan protocol 802.1Q id 3597 ip link set dev p0.3597 up ip addr add 2.2.2.3/16 dev p0.3597 udaddy -S 847 -C 220 -c 2 -t 0 -b 2.2.2.3 [1] https://github.com/linux-rdma/rdma-core/blob/master/librdmacm/examples/udaddy.c Fixes: 5069d7e202f6 ("RDMA/core: Fix ENODEV error for iWARP test over vlan") Reported-by: Leon Romanovsky <leonro@nvidia.com> Closes: https://lore.kernel.org/all/20241110130746.GA48891@unreal Link: https://patch.msgid.link/bb9d403419b2b9566da5b8bf0761fa8377927e49.1731401658.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2024-11-12	acpi/arm64: remove unnecessary cast	Min-Hua Chen
	DEFINE_RES_IRQ returns struct resource type, so it is unnecessary to cast it to struct resource. Remove the unnecessary cast to fix the following sparse warnings: drivers/acpi/arm64/gtdt.c:355:19: sparse: warning: cast to non-scalar drivers/acpi/arm64/gtdt.c:355:19: sparse: warning: cast from non-scalar No functional changes intended. Signed-off-by: Min-Hua Chen <minhuadotchen@gmail.com> Acked-by: Hanjun Guo <guohanjun@huawei.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20240917233827.73167-1-minhuadotchen@gmail.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-11-12	TC: Fix the wrong format specifier	zhang jiao
	The format specifier of "unsigned int" in pr_info() should be "%u", not "%d". Signed-off-by: zhang jiao <zhangjiao2@cmss.chinamobile.com> Acked-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2024-11-12	s390/con3270: Use NULL instead of 0 for pointers	Heiko Carstens
	Get rid of sparse warnings: CHECK drivers/s390/char/con3270.c drivers/s390/char/con3270.c:531:15: warning: Using plain integer as NULL pointer drivers/s390/char/con3270.c:749:15: warning: Using plain integer as NULL pointer Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-12	drm/i915: Grab intel_display from the encoder to avoid potential oopsies	Ville Syrjälä
	Grab the intel_display from 'encoder' rather than 'state' in the encoder hooks to avoid the massive footgun that is intel_sanitize_encoder(), which passes NULL as the 'state' argument to encoder .disable() and .post_disable(). TODO: figure out how to actually fix intel_sanitize_encoder()... Fixes: ab0b0eb5c85c ("drm/i915/tv: convert to struct intel_display") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241107161123.16269-2-ville.syrjala@linux.intel.com Reviewed-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit dc3806d9eb66d0105f8d55d462d4ef681d9eac59) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2024-11-12	drm/i915/gsc: ARL-H and ARL-U need a newer GSC FW.	Daniele Ceraolo Spurio
	All MTL and ARL SKUs share the same GSC FW, but the newer platforms are only supported in newer blobs. In particular, ARL-S is supported starting from 102.0.10.1878 (which is already the minimum required version for ARL in the code), while ARL-H and ARL-U are supported from 102.1.15.1926. Therefore, the driver needs to check which specific ARL subplatform its running on when verifying that the GSC FW is new enough for it. Fixes: 2955ae8186c8 ("drm/i915: ARL requires a newer GSC firmware") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241028233132.149745-1-daniele.ceraolospurio@intel.com (cherry picked from commit 3c1d5ced18db8a67251c8436cf9bdc061f972bdb) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2024-11-11	net/mlx5e: Disable loopback self-test on multi-PF netdev	Carolina Jubran
	In Multi-PF (Socket Direct) configurations, when a loopback packet is sent through one of the secondary devices, it will always be received on the primary device. This causes the loopback layer to fail in identifying the loopback packet as the devices are different. To avoid false test failures, disable the loopback self-test in Multi-PF configurations. Fixes: ed29705e4ed1 ("net/mlx5: Enable SD feature") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241107183527.676877-8-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	net/mlx5e: CT: Fix null-ptr-deref in add rule err flow	Moshe Shemesh
	In error flow of mlx5_tc_ct_entry_add_rule(), in case ct_rule_add() callback returns error, zone_rule->attr is used uninitiated. Fix it to use attr which has the needed pointer value. Kernel log: BUG: kernel NULL pointer dereference, address: 0000000000000110 RIP: 0010:mlx5_tc_ct_entry_add_rule+0x2b1/0x2f0 [mlx5_core] … Call Trace: <TASK> ? __die+0x20/0x70 ? page_fault_oops+0x150/0x3e0 ? exc_page_fault+0x74/0x140 ? asm_exc_page_fault+0x22/0x30 ? mlx5_tc_ct_entry_add_rule+0x2b1/0x2f0 [mlx5_core] ? mlx5_tc_ct_entry_add_rule+0x1d5/0x2f0 [mlx5_core] mlx5_tc_ct_block_flow_offload+0xc6a/0xf90 [mlx5_core] ? nf_flow_offload_tuple+0xd8/0x190 [nf_flow_table] nf_flow_offload_tuple+0xd8/0x190 [nf_flow_table] flow_offload_work_handler+0x142/0x320 [nf_flow_table] ? finish_task_switch.isra.0+0x15b/0x2b0 process_one_work+0x16c/0x320 worker_thread+0x28c/0x3a0 ? __pfx_worker_thread+0x10/0x10 kthread+0xb8/0xf0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2d/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Fixes: 7fac5c2eced3 ("net/mlx5: CT: Avoid reusing modify header context for natted entries") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241107183527.676877-7-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	net/mlx5e: clear xdp features on non-uplink representors	William Tu
	Non-uplink representor port does not support XDP. The patch clears the xdp feature by checking the net_device_ops.ndo_bpf is set or not. Verify using the netlink tool: $ tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml --dump dev-get Representor netdev before the patch: {'ifindex': 8, 'xdp-features': {'basic', 'ndo-xmit', 'ndo-xmit-sg', 'redirect', 'rx-sg', 'xsk-zerocopy'}, 'xdp-rx-metadata-features': set(), 'xdp-zc-max-segs': 1, 'xsk-features': set()}, With the patch: {'ifindex': 8, 'xdp-features': set(), 'xdp-rx-metadata-features': set(), 'xsk-features': set()}, Fixes: 4d5ab0ad964d ("net/mlx5e: take into account device reconfiguration for xdp_features flag") Signed-off-by: William Tu <witu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241107183527.676877-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	net/mlx5e: kTLS, Fix incorrect page refcounting	Dragos Tatulea
	The kTLS tx handling code is using a mix of get_page() and page_ref_inc() APIs to increment the page reference. But on the release path (mlx5e_ktls_tx_handle_resync_dump_comp()), only put_page() is used. This is an issue when using pages from large folios: the get_page() references are stored on the folio page while the page_ref_inc() references are stored directly in the given page. On release the folio page will be dereferenced too many times. This was found while doing kTLS testing with sendfile() + ZC when the served file was read from NFS on a kernel with NFS large folios support (commit 49b29a573da8 ("nfs: add support for large folios")). Fixes: 84d1bb2b139e ("net/mlx5e: kTLS, Limit DUMP wqe size") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241107183527.676877-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	net/mlx5: fs, lock FTE when checking if active	Mark Bloch
	The referenced commits introduced a two-step process for deleting FTEs: - Lock the FTE, delete it from hardware, set the hardware deletion function to NULL and unlock the FTE. - Lock the parent flow group, delete the software copy of the FTE, and remove it from the xarray. However, this approach encounters a race condition if a rule with the same match value is added simultaneously. In this scenario, fs_core may set the hardware deletion function to NULL prematurely, causing a panic during subsequent rule deletions. To prevent this, ensure the active flag of the FTE is checked under a lock, which will prevent the fs_core layer from attaching a new steering rule to an FTE that is in the process of deletion. [ 438.967589] MOSHE: 2496 mlx5_del_flow_rules del_hw_func [ 438.968205] ------------[ cut here ]------------ [ 438.968654] refcount_t: decrement hit 0; leaking memory. [ 438.969249] WARNING: CPU: 0 PID: 8957 at lib/refcount.c:31 refcount_warn_saturate+0xfb/0x110 [ 438.970054] Modules linked in: act_mirred cls_flower act_gact sch_ingress openvswitch nsh mlx5_vdpa vringh vhost_iotlb vdpa mlx5_ib mlx5_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core zram zsmalloc fuse [last unloaded: cls_flower] [ 438.973288] CPU: 0 UID: 0 PID: 8957 Comm: tc Not tainted 6.12.0-rc1+ #8 [ 438.973888] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 438.974874] RIP: 0010:refcount_warn_saturate+0xfb/0x110 [ 438.975363] Code: 40 66 3b 82 c6 05 16 e9 4d 01 01 e8 1f 7c a0 ff 0f 0b c3 cc cc cc cc 48 c7 c7 10 66 3b 82 c6 05 fd e8 4d 01 01 e8 05 7c a0 ff <0f> 0b c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 [ 438.976947] RSP: 0018:ffff888124a53610 EFLAGS: 00010286 [ 438.977446] RAX: 0000000000000000 RBX: ffff888119d56de0 RCX: 0000000000000000 [ 438.978090] RDX: ffff88852c828700 RSI: ffff88852c81b3c0 RDI: ffff88852c81b3c0 [ 438.978721] RBP: ffff888120fa0e88 R08: 0000000000000000 R09: ffff888124a534b0 [ 438.979353] R10: 0000000000000001 R11: 0000000000000001 R12: ffff888119d56de0 [ 438.979979] R13: ffff888120fa0ec0 R14: ffff888120fa0ee8 R15: ffff888119d56de0 [ 438.980607] FS: 00007fe6dcc0f800(0000) GS:ffff88852c800000(0000) knlGS:0000000000000000 [ 438.983984] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 438.984544] CR2: 00000000004275e0 CR3: 0000000186982001 CR4: 0000000000372eb0 [ 438.985205] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 438.985842] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 438.986507] Call Trace: [ 438.986799] <TASK> [ 438.987070] ? __warn+0x7d/0x110 [ 438.987426] ? refcount_warn_saturate+0xfb/0x110 [ 438.987877] ? report_bug+0x17d/0x190 [ 438.988261] ? prb_read_valid+0x17/0x20 [ 438.988659] ? handle_bug+0x53/0x90 [ 438.989054] ? exc_invalid_op+0x14/0x70 [ 438.989458] ? asm_exc_invalid_op+0x16/0x20 [ 438.989883] ? refcount_warn_saturate+0xfb/0x110 [ 438.990348] mlx5_del_flow_rules+0x2f7/0x340 [mlx5_core] [ 438.990932] __mlx5_eswitch_del_rule+0x49/0x170 [mlx5_core] [ 438.991519] ? mlx5_lag_is_sriov+0x3c/0x50 [mlx5_core] [ 438.992054] ? xas_load+0x9/0xb0 [ 438.992407] mlx5e_tc_rule_unoffload+0x45/0xe0 [mlx5_core] [ 438.993037] mlx5e_tc_del_fdb_flow+0x2a6/0x2e0 [mlx5_core] [ 438.993623] mlx5e_flow_put+0x29/0x60 [mlx5_core] [ 438.994161] mlx5e_delete_flower+0x261/0x390 [mlx5_core] [ 438.994728] tc_setup_cb_destroy+0xb9/0x190 [ 438.995150] fl_hw_destroy_filter+0x94/0xc0 [cls_flower] [ 438.995650] fl_change+0x11a4/0x13c0 [cls_flower] [ 438.996105] tc_new_tfilter+0x347/0xbc0 [ 438.996503] ? ___slab_alloc+0x70/0x8c0 [ 438.996929] rtnetlink_rcv_msg+0xf9/0x3e0 [ 438.997339] ? __netlink_sendskb+0x4c/0x70 [ 438.997751] ? netlink_unicast+0x286/0x2d0 [ 438.998171] ? __pfx_rtnetlink_rcv_msg+0x10/0x10 [ 438.998625] netlink_rcv_skb+0x54/0x100 [ 438.999020] netlink_unicast+0x203/0x2d0 [ 438.999421] netlink_sendmsg+0x1e4/0x420 [ 438.999820] __sock_sendmsg+0xa1/0xb0 [ 439.000203] ____sys_sendmsg+0x207/0x2a0 [ 439.000600] ? copy_msghdr_from_user+0x6d/0xa0 [ 439.001072] ___sys_sendmsg+0x80/0xc0 [ 439.001459] ? ___sys_recvmsg+0x8b/0xc0 [ 439.001848] ? generic_update_time+0x4d/0x60 [ 439.002282] __sys_sendmsg+0x51/0x90 [ 439.002658] do_syscall_64+0x50/0x110 [ 439.003040] entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: 718ce4d601db ("net/mlx5: Consolidate update FTE for all removal changes") Fixes: cefc23554fc2 ("net/mlx5: Fix FTE cleanup") Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241107183527.676877-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	net/mlx5: Fix msix vectors to respect platform limit	Parav Pandit
	The number of PCI vectors allocated by the platform (which may be fewer than requested) is currently not honored when creating the SF pool; only the PCI MSI-X capability is considered. As a result, when a platform allocates fewer vectors (in non-dynamic mode) than requested, the PF and SF pools end up with an invalid vector range. This causes incorrect SF vector accounting, which leads to the following call trace when an invalid IRQ vector is allocated. This issue is resolved by ensuring that the platform's vector limit is respected for both the SF and PF pools. Workqueue: mlx5_vhca_event0 mlx5_sf_dev_add_active_work [mlx5_core] RIP: 0010:pci_irq_vector+0x23/0x80 RSP: 0018:ffffabd5cebd7248 EFLAGS: 00010246 RAX: ffff980880e7f308 RBX: ffff9808932fb880 RCX: 0000000000000001 RDX: 00000000000001ff RSI: 0000000000000200 RDI: ffff980880e7f308 RBP: 0000000000000200 R08: 0000000000000010 R09: ffff97a9116f0860 R10: 0000000000000002 R11: 0000000000000228 R12: ffff980897cd0160 R13: 0000000000000000 R14: ffff97a920fec0c0 R15: ffffabd5cebd72d0 FS: 0000000000000000(0000) GS:ffff97c7ff9c0000(0000) knlGS:0000000000000000 ? rescuer_thread+0x350/0x350 kthread+0x11b/0x140 ? __kthread_bind_mask+0x60/0x60 ret_from_fork+0x22/0x30 mlx5_core 0000:a1:00.0: mlx5_irq_alloc:321:(pid 6781): Failed to request irq. err = -22 mlx5_core 0000:a1:00.0: mlx5_irq_alloc:321:(pid 6781): Failed to request irq. err = -22 mlx5_core.sf mlx5_core.sf.6: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 enhanced) mlx5_core.sf mlx5_core.sf.7: firmware version: 32.43.356 mlx5_core.sf mlx5_core.sf.6 enpa1s0f0s4: renamed from eth0 mlx5_core.sf mlx5_core.sf.7: Rate limit: 127 rates are supported, range: 0Mbps to 195312Mbps mlx5_core 0000:a1:00.0: mlx5_irq_alloc:321:(pid 6781): Failed to request irq. err = -22 mlx5_core 0000:a1:00.0: mlx5_irq_alloc:321:(pid 6781): Failed to request irq. err = -22 mlx5_core 0000:a1:00.0: mlx5_irq_alloc:321:(pid 6781): Failed to request irq. err = -22 Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation") Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Amir Tzin <amirtz@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241107183527.676877-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	net/mlx5: E-switch, unload IB representors when unloading ETH representors	Chiara Meiohas
	IB representors depend on ETH representors, so the IB representors should not exist without the ETH ones. When unloading the ETH representors, the corresponding IB representors should be also unloaded. The commit 8d159eb2117b ("RDMA/mlx5: Use IB set_netdev and get_netdev functions") introduced the use of the ib_device_set_netdev API in IB repsresentors. ib_device_set_netdev() increments the refcount of the representor's netdev when loading an IB representor and decrements it when unloading. Without the unloading of the IB representor, the refcount of the representor's netdev remains greater than 0, preventing it from being unregistered. The patch uncovered an underlying bug where the eth representor is unloaded, without unloading the IB representor. This issue happened when using multiport E-switch and rebooting, causing the shutdown to hang when unloading the ETH representor because the refcount of the representor's netdevice was greater than 0. Call trace: unregister_netdevice: waiting for eth3 to become free. Usage count = 2 ref_tracker: eth%d@00000000661d60f7 has 1/1 users at ib_device_set_netdev+0x160/0x2d0 [ib_core] mlx5_ib_vport_rep_load+0x104/0x3f0 [mlx5_ib] mlx5_eswitch_reload_ib_reps+0xfc/0x110 [mlx5_core] mlx5_mpesw_work+0x236/0x330 [mlx5_core] process_one_work+0x169/0x320 worker_thread+0x288/0x3a0 kthread+0xb8/0xe0 ret_from_fork+0x2d/0x50 ret_from_fork_asm+0x11/0x20 Fixes: 8d159eb2117b ("RDMA/mlx5: Use IB set_netdev and get_netdev functions") Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241107183527.676877-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11	drm/amdgpu/mes12: correct kiq unmap latency	Jack Xiao
	Correct kiq unmap queue timeout value. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit cfe98204a06329b6b7fce1b828b7d620473181ff) Cc: stable@vger.kernel.org # 6.11.x
2024-11-11	drm/amdgpu: fix check in gmc_v9_0_get_vm_pte()	Christian König
	The coherency flags can only be determined when the BO is locked and that in turn is only guaranteed when the mapping is validated. Fix the check, move the resource check into the function and add an assert that the BO is locked. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: d1a372af1c3d ("drm/amdgpu: Set MTYPE in PTE based on BO flags") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1b4ca8546f5b5c482717bedb8e031227b1541539) Cc: stable@vger.kernel.org
2024-11-11	drm/amd/pm: print pp_dpm_mclk in ascending order on SMU v14.0.0	Tim Huang
	Currently, the pp_dpm_mclk values are reported in descending order on SMU IP v14.0.0/1/4. Adjust to ascending order for consistency with other clock interfaces. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d4be16ccfd5bf822176740a51ff2306679a2247e) Cc: stable@vger.kernel.org
2024-11-11	drm/amdgpu: Fix video caps for H264 and HEVC encode maximum size	David Rosca
	H264 supports 4096x4096 starting from Polaris. HEVC also supports 4096x4096, with VCN 3 and newer 8192x4352 is supported. Signed-off-by: David Rosca <david.rosca@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 69e9a9e65b1ea542d07e3fdd4222b46e9f5a3a29) Cc: stable@vger.kernel.org
2024-11-11	drm/amd/display: Adjust VSDB parser for replay feature	Rodrigo Siqueira
	At some point, the IEEE ID identification for the replay check in the AMD EDID was added. However, this check causes the following out-of-bounds issues when using KASAN: [ 27.804016] BUG: KASAN: slab-out-of-bounds in amdgpu_dm_update_freesync_caps+0xefa/0x17a0 [amdgpu] [ 27.804788] Read of size 1 at addr ffff8881647fdb00 by task systemd-udevd/383 ... [ 27.821207] Memory state around the buggy address: [ 27.821215] ffff8881647fda00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 27.821224] ffff8881647fda80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 27.821234] >ffff8881647fdb00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 27.821243] ^ [ 27.821250] ffff8881647fdb80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 27.821259] ffff8881647fdc00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 27.821268] ================================================================== This is caused because the ID extraction happens outside of the range of the edid lenght. This commit addresses this issue by considering the amd_vsdb_block size. Cc: ChiaHsuan Chung <chiahsuan.chung@amd.com> Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit b7e381b1ccd5e778e3d9c44c669ad38439a861d8) Cc: stable@vger.kernel.org
2024-11-11	drm/amd/display: Require minimum VBlank size for stutter optimization	Dillon Varone
	If the nominal VBlank is too small, optimizing for stutter can cause the prefetch bandwidth to increase drasticaly, resulting in higher clock and power requirements. Only optimize if it is >3x the stutter latency. Reviewed-by: Austin Zheng <austin.zheng@amd.com> Signed-off-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 003215f962cdf2265f126a3f4c9ad20917f87fca) Cc: stable@vger.kernel.org
2024-11-11	drm/amd/display: Handle dml allocation failure to avoid crash	Ryan Seto
	[Why] In the case where a dml allocation fails for any reason, the current state's dml contexts would no longer be valid. Then subsequent calls dc_state_copy_internal would shallow copy invalid memory and if the new state was released, a double free would occur. [How] Reset dml pointers in new_state to NULL and avoid invalid pointer Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Ryan Seto <ryanseto@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit bcafdc61529a48f6f06355d78eb41b3aeda5296c) Cc: stable@vger.kernel.org
2024-11-11	drm/amd/display: Fix Panel Replay not update screen correctly	Tom Chung
	[Why] In certain use case such as KDE login screen, there will be no atomic commit while do the frame update. If the Panel Replay enabled, it will cause the screen not updated and looks like system hang. [How] Delay few atomic commits before enabled the Panel Replay just like PSR. Fixes: be64336307a6c ("drm/amd/display: Re-enable panel replay feature") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3686 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3682 Tested-By: Corey Hickey <bugfood-c@fatooh.org> Tested-By: James Courtier-Dutton <james.dutton@gmail.com> Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ca628f0eddd73adfccfcc06b2a55d915bca4a342) Cc: stable@vger.kernel.org # 6.11+
2024-11-11	drm/amd/display: Change some variable name of psr	Tom Chung
	Panel Replay feature may also use the same variable with PSR. Change the variable name and make it not specify for PSR. Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c7fafb7a46b38a11a19342d153f505749bf56f3e) Cc: stable@vger.kernel.org # 6.11+
2024-11-11	nvme: check ns's volatile write cache not present	Guixin Liu
	When the VWC of a namespace does not exist, the BLK_FEAT_WRITE_CACHE flag should not be set when registering the block device, regardless of whether the controller supports VWC. Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-11	nvme: add rotational support	Wang Yugui
	Rotational devices, such as hard-drives, can be detected using the rotational bit in the namespace independent identify namespace data structure. Make the bit visible to the block layer through the rotational queue setting. Signed-off-by: Wang Yugui <wangyugui@e16-tech.com> Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-11	nvme: use command set independent id ns if available	Matias Bjørling
	The NVMe 2.0 specification adds an independent identify namespace data structure that contains generic attributes that apply to all namespace types. Some attributes carry over from the NVM command set identify namespace data structure, and others are new. Currently, the data structure only considered when CRIMS is enabled or when the namespace type is key-value. However, the independent namespace data structure is mandatory for devices that implement features from the 2.0+ specification. Therefore, we can check this data structure first. If unavailable, retrieve the generic attributes from the NVM command set identify namespace data structure. Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-11	nvmet: support for csi identify ns	Keith Busch
	Implements reporting the I/O Command Set Independent Identify Namespace command. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-11	nvmet: implement rotational media information log	Keith Busch
	Most of the information is stubbed. Supporting these commands is a requirement for supporting rotational media. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-11	nvmet: implement endurance groups	Keith Busch
	Most of the returned information is just stubbed data. The target must support these in order to report rotational media. Since this driver doesn't know any better, each namespace is its own endurance group with the engid value matching the nsid. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-11	nvmet: declare 2.1 version compliance	Keith Busch
	The target driver implements all the mandatory logs, identifications, features, and properties up to nvme sepcification 2.1. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-11	nvmet: implement crto property	Keith Busch
	This property is required for nvme 2.1. The target only supports ready with media, so this is just the same value as CAP.TO. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-11	nvmet: implement supported features log	Keith Busch
	This log is required for nvme 2.1. Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-11	nvmet: implement supported log pages	Keith Busch
	This log is required for nvme 2.1. Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-11-11	nvmet: implement active command set ns list	Keith Busch
	This is required for nvme 2.1 for targets that support multiple command sets. We support NVM and ZNS, so are required to support this identification. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org>