linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-07-12	iommufd: Fix error pointer checking	Lu Baolu
	Smatch static checker reported below warning: drivers/iommu/iommufd/fault.c:131 iommufd_device_get_attach_handle() warn: 'handle' is an error pointer or valid Fix it by checking 'handle' with IS_ERR(). Fixes: b7d8833677ba ("iommufd: Fault-capable hwpt attach/detach/replace") Link: https://lore.kernel.org/r/20240712025819.63147-1-baolu.lu@linux.intel.com Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/linux-iommu/8bb4f37a-4514-4dea-aabb-7380be303895@stanley.mountain/ Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-07-12	octeontx2-af: fix issue with IPv4 match for RSS	Satheesh Paul
	While performing RSS based on IPv4, packets with IPv4 options are not being considered. Adding changes to match both plain IPv4 and IPv4 with option header. Fixes: 41a7aa7b800d ("octeontx2-af: NIX Rx flowkey configuration for RSS") Signed-off-by: Satheesh Paul <psatheesh@marvell.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-12	octeontx2-af: fix issue with IPv6 ext match for RSS	Kiran Kumar K
	While performing RSS based on IPv6, extension ltype is not being considered. This will be problem for fragmented packets or packets with extension header. Adding changes to match IPv6 ext header along with IPv6 ltype. Fixes: 41a7aa7b800d ("octeontx2-af: NIX Rx flowkey configuration for RSS") Signed-off-by: Kiran Kumar K <kirankumark@marvell.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-12	octeontx2-af: fix detection of IP layer	Michal Mazur
	Checksum and length checks are not enabled for IPv4 header with options and IPv6 with extension headers. To fix this a change in enum npc_kpu_lc_ltype is required which will allow adjustment of LTYPE_MASK to detect all types of IP headers. Fixes: 21e6699e5cd6 ("octeontx2-af: Add NPC KPU profile") Signed-off-by: Michal Mazur <mmazur2@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-12	octeontx2-af: fix a issue with cpt_lf_alloc mailbox	Srujana Challa
	This patch fixes CPT_LF_ALLOC mailbox error due to incompatible mailbox message format. Specifically, it corrects the `blkaddr` field type from `int` to `u8`. Fixes: de2854c87c64 ("octeontx2-af: Mailbox changes for 98xx CPT block") Signed-off-by: Srujana Challa <schalla@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-12	octeontx2-af: replace cpt slot with lf id on reg write	Nithin Dabilpuram
	Replace slot id with global CPT lf id on reg read/write as CPTPF/VF driver would send slot number instead of global lf id in the reg offset. And also update the mailbox response with the global lf's register offset. Fixes: ae454086e3c2 ("octeontx2-af: add mailbox interface for CPT") Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-12	net: mctp-i2c: invalidate flows immediately on TX errors	Jeremy Kerr
	If we encounter an error on i2c packet transmit, we won't have a valid flow anymore; since we didn't transmit a valid packet sequence, we'll have to wait for the key to timeout instead of dropping it on the reply. This causes the i2c lock to be held for longer than necessary. Instead, invalidate the flow on TX error, and release the i2c lock immediately. Cc: Bonnie Lo <Bonnie_Lo@wiwynn.com> Tested-by: Jerry C Chen <Jerry_C_Chen@wiwynn.com> Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-12	iommufd: Add check on user response code	Lu Baolu
	The response code from user space is only allowed to be SUCCESS or INVALID. All other values are treated by the device as a response code of Response Failure according to PCI spec, section 10.4.2.1. This response disables the Page Request Interface for the Function. Add a check in iommufd_fault_fops_write() to avoid invalid response code. Fixes: 07838f7fd529 ("iommufd: Add iommufd fault object") Link: https://lore.kernel.org/r/20240710083341.44617-3-baolu.lu@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-07-12	binder: fix hang of unregistered readers	Carlos Llamas
	With the introduction of binder_available_for_proc_work_ilocked() in commit 1b77e9dcc3da ("ANDROID: binder: remove proc waitqueue") a binder thread can only "wait_for_proc_work" after its thread->looper has been marked as BINDER_LOOPER_STATE_{ENTERED\|REGISTERED}. This means an unregistered reader risks waiting indefinitely for work since it never gets added to the proc->waiting_threads. If there are no further references to its waitqueue either the task will hang. The same applies to readers using the (e)poll interface. I couldn't find the rationale behind this restriction. So this patch restores the previous behavior of allowing unregistered threads to "wait_for_proc_work". Note that an error message for this scenario, which had previously become unreachable, is now re-enabled. Fixes: 1b77e9dcc3da ("ANDROID: binder: remove proc waitqueue") Cc: stable@vger.kernel.org Cc: Martijn Coenen <maco@google.com> Cc: Arve Hjønnevåg <arve@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20240711201452.2017543-1-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-12	mmc: sdhci-esdhc-imx: obtain the 'per' clock rate after its enablement	Ciprian Costea
	The I.MX SDHCI driver assumes that the frequency of the 'per' clock can be obtained even on disabled clocks, which is not always the case. According to 'clk_get_rate' documentation, it is only valid once the clock source has been enabled. Signed-off-by: Ciprian Costea <ciprianmarian.costea@oss.nxp.com> Reviewed-by: Haibo Chen <haibo.chen@nxp.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20240708121018.246476-3-ciprianmarian.costea@oss.nxp.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2024-07-12	mmc: sdhci-esdhc-imx: disable card detect wake for S32G based platforms	Ciprian Costea
	In case of S32G based platforms, GPIO CD used for card detect wake mechanism is not available. For this scenario the newly introduced flag 'ESDHC_FLAG_SKIP_CD_WAKE' is used. Signed-off-by: Ciprian Costea <ciprianmarian.costea@oss.nxp.com> Reviewed-by: Haibo Chen <haibo.chen@nxp.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20240708121018.246476-2-ciprianmarian.costea@oss.nxp.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2024-07-12	sysfs/cpu: Make crash_hotplug attribute world-readable	Petr Tesarik
	There is no reason to restrict access to this attribute, as it merely reports whether crash elfcorehdr is automatically updated on CPU hot plug/unplug and/or online/offline events. Note that since commit 79365026f8694 ("crash: add a new kexec flag for hotplug support"), this maps to the same flag which is world-accessible through /sys/devices/system/memory/crash_hotplug. Signed-off-by: Petr Tesarik <ptesarik@suse.com> Acked-by: Baoquan He <bhe@redhat.com> Acked-by: Sourabh Jain <sourabhjain@linux.ibm.com> Link: https://lore.kernel.org/r/20240711103409.319673-1-petr.tesarik@suse.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-12	platform: arm64: EC_LENOVO_YOGA_C630 should depend on ARCH_QCOM	Geert Uytterhoeven
	The Lenovo Yoga C630 Embedded Controller is only present on the Qualcomm Snapdragon-based Lenovo Yoga C630 laptop. Hence add a dependency on ARCH_QCOM, to prevent asking the user about this driver when configuring a kernel without Qualcomm SoC support. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/0e4c9ffdc8a5caffcda2afb8d5480900f7adebf6.1720707932.git.geert+renesas@glider.be Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2024-07-12	platform: arm64: EC_ACER_ASPIRE1 should depend on ARCH_QCOM	Geert Uytterhoeven
	The Acer Aspire 1 Embedded Controller is only present on the Qualcomm Snapdragon-based Acer Aspire 1 laptop. Hence add a dependency on ARCH_QCOM, to prevent asking the user about this driver when configuring a kernel without Qualcomm SoC support. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Nikita Travkin <nikita@trvn.ru> Link: https://lore.kernel.org/r/f5f38709c01d369ed9e375ceb2a9a12986457a1a.1720707932.git.geert+renesas@glider.be Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2024-07-12	misc: Kconfig: add a new dependency for MARVELL_CN10K_DPI	Vamsi Attunuru
	DPI hardware is an on-chip PCIe device on Marvell's arm64 SoC platforms. As Arnd suggested, CN10K belongs to ARCH_THUNDER lineage. Patch makes mrvl_cn10k_dpi driver dependent on CONFIG_ARCH_THUNDER. Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com> Link: https://lore.kernel.org/r/20240711120115.4069401-1-vattunuru@marvell.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-12	virtio: add missing MODULE_DESCRIPTION() macro	Jeff Johnson
	make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/virtio/virtio_dma_buf.o Add the missing invocation of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20240602-md-virtio_dma_buf-v1-1-ce602d47e257@quicinc.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-12	agp: uninorth: add missing MODULE_DESCRIPTION() macro	Jeff Johnson
	With ARCH=powerpc, make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/char/agp/uninorth-agp.o Add the missing invocation of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Link: https://lore.kernel.org/r/20240615-md-powerpc-drivers-char-agp-v1-1-b79bfd07da42@quicinc.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-12	spmi: add missing MODULE_DESCRIPTION() macros	Jeff Johnson
	make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/spmi/hisi-spmi-controller.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/spmi/spmi-pmic-arb.o Add the missing invocations of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20240609-md-drivers-spmi-v1-1-f1d5b24e7a66@quicinc.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-12	media: raspberrypi: Switch to remove_new	Stephen Rothwell
	The remove callback's return value is about to change from int to void, this is done by commit 0edb555a65d1 ("platform: Make platform_driver::remove() return void"). Prepare for merging the patch by switching the PiSP driver from remove to remove_new callback. Fixes: 12187bd5d4f8 ("media: raspberrypi: Add support for PiSP BE") Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Naushir Patuck <naush@raspberrypi.com> Acked-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Reviewed-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-07-12	efi: Replace efi_memory_attributes_table_t 0-sized array with flexible array	Kees Cook
	While efi_memory_attributes_table_t::entry isn't used directly as an array, it is used as a base for pointer arithmetic. The type is wrong as it's not technically an array of efi_memory_desc_t's; they could be larger. Regardless, leave the type unchanged and remove the old style "0" array size. Additionally replace the open-coded entry offset code with the existing efi_memdesc_ptr() helper. Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-07-12	efi: Rename efi_early_memdesc_ptr() to efi_memdesc_ptr()	Kees Cook
	The "early" part of the helper's name isn't accurate[1]. Drop it in preparation for adding a new (not early) usage. Suggested-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/lkml/CAMj1kXEyDjH0uu3Z4eBesV3PEnKGi5ArXXMp7R-hn8HdRytiPg@mail.gmail.com [1] Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-07-12	leds: leds-lp5569: Enable chip after chip configuration	Christian Marangi
	Documentation say that clock internal config needs to be set BEFORE chip is enabled. Align code to this and move the CHIP enable after the CHIP is configured. While at it also make use of STATUS reg and check when STARTUP is completed instead of sleep for 1-2 ms. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Link: https://lore.kernel.org/r/20240712004556.15601-3-ansuelsmth@gmail.com Signed-off-by: Lee Jones <lee@kernel.org>
2024-07-12	leds: leds-lp5569: Better handle enabling clock internal setting	Christian Marangi
	Better handle enabling clock internal setting. In further testing it was notice that internal clock config MUST be set before clock output config or the LED CHIP might stop working. This wasn't documented and was actually found on devices that have 2 chip chained where one chip provide clock for the other. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Link: https://lore.kernel.org/r/20240712004556.15601-2-ansuelsmth@gmail.com Signed-off-by: Lee Jones <lee@kernel.org>
2024-07-12	leds: leds-lp5569: Fix typo in driver name	Christian Marangi
	Remove extra x from driver name as this was a typo from copy-paste error. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Link: https://lore.kernel.org/r/20240712004556.15601-1-ansuelsmth@gmail.com Signed-off-by: Lee Jones <lee@kernel.org>
2024-07-12	i2c: testunit: avoid re-issued work after read message	Wolfram Sang
	The to-be-fixed commit rightfully prevented that the registers will be cleared. However, the index must be cleared. Otherwise a read message will re-issue the last work. Fix it and add a comment describing the situation. Fixes: c422b6a63024 ("i2c: testunit: don't erase registers after STOP") Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Andi Shyti <andi.shyti@kernel.org> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2024-07-12	firewire: core: move copy_port_status() helper function to TP_fast_assign() ↵	Takashi Sakamoto
	block It would be possible to put any statement in TP_fast_assign(). This commit obsoletes the helper function and put its statements to TP_fast_assign() for the code simplicity. Link: https://lore.kernel.org/r/20240712003010.87341-1-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-07-12	Merge tag 'amd-drm-fixes-6.10-2024-07-11' of ↵	Dave Airlie
	https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.10-2024-07-11: amdgpu: - PSR-SU fix - Reseved VMID fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240712005534.803064-1-alexander.deucher@amd.com
2024-07-12	Merge tag 'drm-xe-fixes-2024-07-11' of ↵	Dave Airlie
	https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes UAPI Changes: - Use write-back caching mode for system memory on DGFX (Thomas) Driver Changes: - Do not leak object when finalizing hdcp gsc (Nirmoy) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/vgqz35btnxdddko3byrgww5ii36wig2tvondg2p3j3b3ourj4i@rqgolll3wwkh
2024-07-12	drm/omap: Restrict compile testing to PAGE_SIZE less than 64KB	Nathan Chancellor
	Prior to commit dc6fcaaba5a5 ("drm/omap: Allow build with COMPILE_TEST=y"), it was only possible to build the omapdrm driver with a 4KB page size. After that change, when the PAGE_SIZE is 64KB or larger, clang points out that the driver has some assumptions around the page size implicitly by passing PAGE_SIZE to a parameter with a type of u16: drivers/gpu/drm/omapdrm/omap_gem.c:758:7: error: implicit conversion from 'unsigned long' to 'u16' (aka 'unsigned short') changes value from 65536 to 0 [-Werror,-Wconstant-conversion] 757 \| block = tiler_reserve_2d(fmt, omap_obj->width, omap_obj->height, \| ~~~~~~~~~~~~~~~~ 758 \| PAGE_SIZE); \| ^~~~~~~~~ arch/powerpc/include/asm/page.h:25:34: note: expanded from macro 'PAGE_SIZE' 25 \| #define PAGE_SIZE (ASM_CONST(1) << PAGE_SHIFT) \| ~~~~~~~~~~~~~^~~~~~~~~~~~~ drivers/gpu/drm/omapdrm/omap_gem.c:1504:44: error: implicit conversion from 'unsigned long' to 'u16' (aka 'unsigned short') changes value from 65536 to 0 [-Werror,-Wconstant-conversion] 1504 \| block = tiler_reserve_2d(fmts[i], w, h, PAGE_SIZE); \| ~~~~~~~~~~~~~~~~ ^~~~~~~~~ arch/powerpc/include/asm/page.h:25:34: note: expanded from macro 'PAGE_SIZE' 25 \| #define PAGE_SIZE (ASM_CONST(1) << PAGE_SHIFT) \| ~~~~~~~~~~~~~^~~~~~~~~~~~~ 2 errors generated. As there is a lot of use of a u16 type throughout this driver and it will only ever be run on hardware that has a 4KB page size, just restrict compile testing to when the page size is less than 64KB (as no other issues have been discussed and it keeps compile testing relatively more available). Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240620-omapdrm-restrict-compile-test-to-sub-64kb-page-size-v1-1-5e56de71ffca@kernel.org
2024-07-12	Merge tag 'drm-xe-next-fixes-2024-07-11' of ↵	Dave Airlie
	https://gitlab.freedesktop.org/drm/xe/kernel into drm-next UAPI Changes: - Rename xe perf layer as xe observation layer (Ashutosh) Driver Changes: - Drop trace_xe_hw_fence_free (Brost) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Zo_3ustogPDVKZwu@intel.com
2024-07-12	Merge tag 'drm-misc-next-fixes-2024-07-11' of ↵	Dave Airlie
	https://gitlab.freedesktop.org/drm/misc/kernel into drm-next A fix for fbdev on big endian systems, a condition fix for a sharp panel at removal, and a fix for qxl to prevent unpinned buffer access under certain conditions. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240711-benign-rich-mouflon-2eeafe@houat
2024-07-11	net: netconsole: Eliminate redundant setting of enabled field	Breno Leitao
	When disabling a netconsole target, enabled_store() is called with enabled=false. Currently, this results in updating the nt->enabled field twice: 1. Inside the if/else block, with the target_list_lock spinlock held 2. Later, without the target_list_lock This patch eliminates the redundancy by setting the field only once, improving efficiency and reducing potential race conditions. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20240709144403.544099-3-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-11	net: netconsole: Remove unnecessary cast from bool	Breno Leitao
	The 'enabled' variable is already a bool, so casting it to its value is redundant. Remove the superfluous cast, improving code clarity without changing functionality. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20240709144403.544099-2-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-12	md/raid1: set max_sectors during early return from choose_slow_rdev()	Mateusz Jończyk
	Linux 6.9+ is unable to start a degraded RAID1 array with one drive, when that drive has a write-mostly flag set. During such an attempt, the following assertion in bio_split() is hit: BUG_ON(sectors <= 0); Call Trace: ? bio_split+0x96/0xb0 ? exc_invalid_op+0x53/0x70 ? bio_split+0x96/0xb0 ? asm_exc_invalid_op+0x1b/0x20 ? bio_split+0x96/0xb0 ? raid1_read_request+0x890/0xd20 ? __call_rcu_common.constprop.0+0x97/0x260 raid1_make_request+0x81/0xce0 ? __get_random_u32_below+0x17/0x70 ? new_slab+0x2b3/0x580 md_handle_request+0x77/0x210 md_submit_bio+0x62/0xa0 __submit_bio+0x17b/0x230 submit_bio_noacct_nocheck+0x18e/0x3c0 submit_bio_noacct+0x244/0x670 After investigation, it turned out that choose_slow_rdev() does not set the value of max_sectors in some cases and because of it, raid1_read_request calls bio_split with sectors == 0. Fix it by filling in this variable. This bug was introduced in commit dfa8ecd167c1 ("md/raid1: factor out choose_slow_rdev() from read_balance()") but apparently hidden until commit 0091c5a269ec ("md/raid1: factor out helpers to choose the best rdev from read_balance()") shortly thereafter. Cc: stable@vger.kernel.org # 6.9.x+ Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl> Fixes: dfa8ecd167c1 ("md/raid1: factor out choose_slow_rdev() from read_balance()") Cc: Song Liu <song@kernel.org> Cc: Yu Kuai <yukuai3@huawei.com> Cc: Paul Luse <paul.e.luse@linux.intel.com> Cc: Xiao Ni <xni@redhat.com> Cc: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> Link: https://lore.kernel.org/linux-raid/20240706143038.7253-1-mat.jonczyk@o2.pl/ -- Tested on both Linux 6.10 and 6.9.8. Inside a VM, mdadm testsuite for RAID1 on 6.10 did not find any problems: ./test --dev=loop --no-error --raidtype=raid1 (on 6.9.8 there was one failure, caused by external bitmap support not compiled in). Notes: - I was reliably getting deadlocks when adding / removing devices on such an array - while the array was loaded with fsstress with 20 concurrent processes. When the array was idle or loaded with fsstress with 8 processes, no such deadlocks happened in my tests. This occurred also on unpatched Linux 6.8.0 though, but not on 6.1.97-rc1, so this is likely an independent regression (to be investigated). - I was also getting deadlocks when adding / removing the bitmap on the array in similar conditions - this happened on Linux 6.1.97-rc1 also though. fsstress with 8 concurrent processes did cause it only once during many tests. - in my testing, there was once a problem with hot adding an internal bitmap to the array: mdadm: Cannot add bitmap while array is resyncing or reshaping etc. mdadm: failed to set internal bitmap. even though no such reshaping was happening according to /proc/mdstat. This seems unrelated, though. Reviewed-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240711202316.10775-1-mat.jonczyk@o2.pl
2024-07-12	md-cluster: fix no recovery job when adding/re-adding a disk	Heming Zhao
	The commit db5e653d7c9f ("md: delay choosing sync action to md_start_sync()") delays the start of the sync action. In a clustered environment, this will cause another node to first activate the spare disk and skip recovery. As a result, no nodes will perform recovery when a disk is added or re-added. Before db5e653d7c9f: ``` node1 node2 ---------------------------------------------------------------- md_check_recovery + md_update_sb \| sendmsg: METADATA_UPDATED + md_choose_sync_action process_metadata_update \| remove_and_add_spares //node1 has not finished adding + call mddev->sync_work //the spare disk:do nothing md_start_sync starts md_do_sync md_do_sync + grabbed resync_lockres:DLM_LOCK_EX + do syncing job md_check_recovery sendmsg: METADATA_UPDATED process_metadata_update //activate spare disk ... ... md_do_sync waiting to grab resync_lockres:EX ``` After db5e653d7c9f: (note: if 'cmd:idle' sets MD_RECOVERY_INTR after md_check_recovery starts md_start_sync, setting the INTR action will exacerbate the delay in node1 calling the md_do_sync function.) ``` node1 node2 ---------------------------------------------------------------- md_check_recovery + md_update_sb \| sendmsg: METADATA_UPDATED + calls mddev->sync_work process_metadata_update //node1 has not finished adding //the spare disk:do nothing md_start_sync + md_choose_sync_action \| remove_and_add_spares + calls md_do_sync md_check_recovery md_update_sb sendmsg: METADATA_UPDATED process_metadata_update //activate spare disk ... ... ... ... md_do_sync + grabbed resync_lockres:EX + raid1_sync_request skip sync under conf->fullsync:0 md_do_sync 1. waiting to grab resync_lockres:EX 2. when node1 could grab EX lock, node1 will skip resync under recovery_offset:MaxSector ``` How to trigger: ```(commands @node1) # to easily watch the recovery status echo 2000 > /proc/sys/dev/raid/speed_limit_max ssh root@node2 "echo 2000 > /proc/sys/dev/raid/speed_limit_max" mdadm -CR /dev/md0 -l1 -b clustered -n 2 /dev/sda /dev/sdb --assume-clean ssh root@node2 mdadm -A /dev/md0 /dev/sda /dev/sdb mdadm --manage /dev/md0 --fail /dev/sda --remove /dev/sda mdadm --manage /dev/md0 --add /dev/sdc === "cat /proc/mdstat" on both node, there are no recovery action. === ``` How to fix: because md layer code logic is hard to restore for speeding up sync job on local node, we add new cluster msg to pending the another node to active disk. Signed-off-by: Heming Zhao <heming.zhao@suse.com> Reviewed-by: Su Yue <glass.su@suse.com> Acked-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240709104120.22243-2-heming.zhao@suse.com
2024-07-12	md-cluster: fix hanging issue while a new disk adding	Heming Zhao
	The commit 1bbe254e4336 ("md-cluster: check for timeout while a new disk adding") is correct in terms of code syntax but not suite real clustered code logic. When a timeout occurs while adding a new disk, if recv_daemon() bypasses the unlock for ack_lockres:CR, another node will be waiting to grab EX lock. This will cause the cluster to hang indefinitely. How to fix: 1. In dlm_lock_sync(), change the wait behaviour from forever to a timeout, This could avoid the hanging issue when another node fails to handle cluster msg. Another result of this change is that if another node receives an unknown msg (e.g. a new msg_type), the old code will hang, whereas the new code will timeout and fail. This could help cluster_md handle new msg_type from different nodes with different kernel/module versions (e.g. The user only updates one leg's kernel and monitors the stability of the new kernel). 2. The old code for __sendmsg() always returns 0 (success) under the design (must successfully unlock ->message_lockres). This commit makes this function return an error number when an error occurs. Fixes: 1bbe254e4336 ("md-cluster: check for timeout while a new disk adding") Signed-off-by: Heming Zhao <heming.zhao@suse.com> Reviewed-by: Su Yue <glass.su@suse.com> Acked-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240709104120.22243-1-heming.zhao@suse.com
2024-07-12	bna: adjust 'name' buf size of bna_tcb and bna_ccb structures	Alexey Kodanev
	To have enough space to write all possible sprintf() args. Currently 'name' size is 16, but the first '%s' specifier may already need at least 16 characters, since 'bnad->netdev->name' is used there. For '%d' specifiers, assume that they require: * 1 char for 'tx_id + tx_info->tcb[i]->id' sum, BNAD_MAX_TXQ_PER_TX is 8 * 2 chars for 'rx_id + rx_info->rx_ctrl[i].ccb->id', BNAD_MAX_RXP_PER_RX is 16 And replace sprintf with snprintf. Detected using the static analysis tool - Svace. Fixes: 8b230ed8ec96 ("bna: Brocade 10Gb Ethernet device driver") Signed-off-by: Alexey Kodanev <aleksei.kodanev@bell-sw.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-11	Revert "drm/amd/display: Reset freesync config before update new state"	Leo Li
	This change caused PSR SU panels to not read from their remote fb, preventing us from entering self-refresh. It is a regression. This reverts commit 6b8487cdf9fc7bae707519ac5b5daeca18d1e85b. Signed-off-by: Leo Li <sunpeng.li@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-11	i40e: fix: remove needless retries of NVM update	Aleksandr Loktionov
	Remove wrong EIO to EGAIN conversion and pass all errors as is. After commit 230f3d53a547 ("i40e: remove i40e_status"), which should only replace F/W specific error codes with Linux kernel generic, all EIO errors suddenly started to be converted into EAGAIN which leads nvmupdate to retry until it timeouts and sometimes fails after more than 20 minutes in the middle of NVM update, so NVM becomes corrupted. The bug affects users only at the time when they try to update NVM, and only F/W versions that generate errors while nvmupdate. For example, X710DA2 with 0x8000ECB7 F/W is affected, but there are probably more... Command for reproduction is just NVM update: ./nvmupdate64 In the log instead of: i40e_nvmupd_exec_aq err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_ENOMEM) appears: i40e_nvmupd_exec_aq err -EIO aq_err I40E_AQ_RC_ENOMEM i40e: eeprom check failed (-5), Tx/Rx traffic disabled The problematic code did silently convert EIO into EAGAIN which forced nvmupdate to ignore EAGAIN error and retry the same operation until timeout. That's why NVM update takes 20+ minutes to finish with the fail in the end. Fixes: 230f3d53a547 ("i40e: remove i40e_status") Co-developed-by: Kelvin Kang <kelvin.kang@intel.com> Signed-off-by: Kelvin Kang <kelvin.kang@intel.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20240710224455.188502-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-11	Merge tag 'wireless-next-2024-07-11' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.11 Most likely the last "new features" pull request for v6.11 with changes both in stack and in drivers. The big thing is the multiple radios for wiphy feature which makes it possible to better advertise radio capabilities to user space. mt76 enabled MLO and iwlwifi re-enabled MLO, ath12k and rtw89 Wi-Fi 6 devices got WoWLAN support. Major changes: cfg80211/mac80211 * remove DEAUTH_NEED_MGD_TX_PREP flag * multiple radios per wiphy support mac80211_hwsim * multi-radio wiphy support ath12k * DebugFS support for datapath statistics * WCN7850: support for WoW (Wake on WLAN) * WCN7850: device-tree bindings ath11k * QCA6390: device-tree bindings iwlwifi * mvm: re-enable Multi-Link Operation (MLO) * aggregation (A-MSDU) optimisations rtw89 * preparation for RTL8852BE-VT support * WoWLAN support for WiFi 6 chips * 36-bit PCI DMA support mt76 * mt7925 Multi-Link Operation (MLO) support * tag 'wireless-next-2024-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (204 commits) wifi: mac80211: fix AP chandef capturing in CSA wifi: iwlwifi: correctly reference TSO page information wifi: mt76: mt792x: fix scheduler interference in drv own process wifi: mt76: mt7925: enabling MLO when the firmware supports it wifi: mt76: mt7925: remove the unused mt7925_mcu_set_chan_info wifi: mt76: mt7925: update mt7925_mac_link_bss_add for MLO wifi: mt76: mt7925: update mt7925_mcu_bss_basic_tlv for MLO wifi: mt76: mt7925: update mt7925_mcu_set_timing for MLO wifi: mt76: mt7925: update mt7925_mcu_sta_phy_tlv for MLO wifi: mt76: mt7925: update mt7925_mcu_sta_rate_ctrl_tlv for MLO wifi: mt76: mt7925: add mt7925_mcu_sta_eht_mld_tlv for MLO wifi: mt76: mt7925: update mt7925_mcu_sta_update for MLO wifi: mt76: mt7925: update mt7925_mcu_add_bss_info for MLO wifi: mt76: mt7925: update mt7925_mcu_bss_mld_tlv for MLO wifi: mt76: mt7925: update mt7925_mcu_sta_mld_tlv for MLO wifi: mt76: mt7925: add mt7925_[assign,unassign]_vif_chanctx wifi: mt76: add def_wcid to struct mt76_wcid wifi: mt76: mt7925: report link information in rx status wifi: mt76: mt7925: update rate index according to link id wifi: mt76: mt7925: add link handling in the mt7925_ipv6_addr_change ... ==================== Link: https://patch.msgid.link/20240711102353.0C849C116B1@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-11	Merge branch 'for-6.11/xor_fixes' into cxl-for-next	Dave Jiang
	Series to fix XOR math for DPA to SPA translation - Refactor and fold cxl_trace_hpa() into cxl_dpa_to_hpa() - Complete DPA->HPA->SPA translation and correct XOR translation issue - Add new method to verify a CXL target position - Remove old method of CXL target position verifiation
2024-07-12	i2c: rcar: ensure Gen3+ reset does not disturb local targets	Wolfram Sang
	R-Car Gen3+ needs a reset before every controller transfer. That erases configuration of a potentially in parallel running local target instance. To avoid this disruption, avoid controller transfers if a local target is running. Also, disable SMBusHostNotify because it requires being a controller and local target at the same time. Fixes: 3b770017b03a ("i2c: rcar: handle RXDMA HW behaviour on Gen3") Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-07-11	cxl: Remove defunct code calculating host bridge target positions	Alison Schofield
	The CXL Spec 3.1 Table 9-22 requires that the BIOS populate the CFMWS target list in interleave target order. This means the calculations the CXL driver added to determine positions when XOR math is in use, along with the entire XOR vs Modulo call back setup is not needed. A prior patch added a common method to verify positions. Remove the now unused code related to the cxl_calc_hb_fn. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/2e2c32a2d0f1007e920b58712d15edad2e48d857.1719980933.git.alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-07-11	cxl/region: Verify target positions using the ordered target list	Alison Schofield
	When a root decoder is configured the interleave target list is read from the BIOS populated CFMWS structure. Per the CXL spec 3.1 Table 9-22 the target list is in interleave order. The CXL driver populates its decoder target list in the same order and stores it in 'struct cxl_switch_decoder' field "@target: active ordered target list in current decoder configuration" Given the promise of an ordered list, the driver can stop duplicating the work of BIOS and simply check target positions against the ordered list during region configuration. The simplified check against the ordered list is presented here. A follow-on patch will remove the unused code. For Modulo arithmetic this is not a fix, only a simplification. For XOR arithmetic this is a fix for HB IW of 3,6,12. Fixes: f9db85bfec0d ("cxl/acpi: Support CXL XOR Interleave Math (CXIMS)") Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/35d08d3aba08fee0f9b86ab1cef0c25116ca8a55.1719980933.git.alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-07-11	cxl: Restore XOR'd position bits during address translation	Alison Schofield
	When a device reports a DPA in events like poison, general_media, and dram, the driver translates that DPA back to an HPA. Presently, the CXL driver translation only considers the Modulo position and will report the wrong HPA for XOR configured root decoders. Add a helper function that restores the XOR'd bits during DPA->HPA address translation. Plumb a root decoder callback to the new helper when XOR interleave arithmetic is in use. For Modulo arithmetic, just let the callback be NULL - as in no extra work required. Upon completion of a DPA->HPA translation a couple of checks are performed on the result. One simply confirms that the calculated HPA is within the address range of the region. That test is useful for both Modulo and XOR interleave arithmetic decodes. A second check confirms that the HPA is within an expected chunk based on the endpoints position in the region and the region granularity. An XOR decode disrupts the Modulo pattern making the chunk check useless. To align the checks with the proper decode, pull the region range check inline and use the helper to do the chunk check for Modulo decodes only. A cxl-test unit test is posted for upstream review here: https://lore.kernel.org/20240624210644.495563-1-alison.schofield@intel.com/ Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events") Signed-off-by: Alison Schofield <alison.schofield@intel.com> Tested-by: Diego Garcia Rodriguez <diego.garcia.rodriguez@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/1a1ac880d9f889bd6384e657e810431b9a0a72e5.1719980933.git.alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-07-11	cxl/core: Fold cxl_trace_hpa() into cxl_dpa_to_hpa()	Alison Schofield
	Although cxl_trace_hpa() is used to populate TRACE EVENTs with HPA addresses the work it performs is a DPA to HPA translation not a trace. Tidy up this naming by moving the minimal work done in cxl_trace_hpa() into cxl_dpa_to_hpa() and use cxl_dpa_to_hpa() for trace event callbacks. Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Robert Richter <rrichter@amd.com> Link: https://patch.msgid.link/452a9b0c525b774c72d9d5851515ffa928750132.1719980933.git.alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-07-11	Merge tag 'for-6.10/dm-fixes-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fix from Mikulas Patocka: - Fix broken discard for device mapper VDO target * tag 'for-6.10/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm vdo: replace max_discard_sectors with max_hw_discard_sectors
2024-07-11	drm/vboxvideo: fix mapping leaks	Philipp Stanner
	When the PCI devres API was introduced to this driver, it was wrongly assumed that initializing the device with pcim_enable_device() instead of pci_enable_device() will make all PCI functions managed. This is wrong and was caused by the quite confusing PCI devres API in which some, but not all, functions become managed that way. The function pci_iomap_range() is never managed. Replace pci_iomap_range() with the managed function pcim_iomap_range(). Fixes: 8558de401b5f ("drm/vboxvideo: use managed pci functions") Link: https://lore.kernel.org/r/20240613115032.29098-14-pstanner@redhat.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com>
2024-07-11	PCI: Add managed pcim_iomap_range()	Philipp Stanner
	The only managed mapping function currently is pcim_iomap() which doesn't allow for mapping an area starting at a certain offset, which many drivers want. Add pcim_iomap_range() as an exported function. Link: https://lore.kernel.org/r/20240613115032.29098-13-pstanner@redhat.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-07-11	PCI: Remove legacy pcim_release()	Philipp Stanner
	Thanks to preceding cleanup steps, pcim_release() is now not needed anymore and can be replaced by pcim_disable_device(), which is the exact counterpart to pcim_enable_device(). This permits removing further parts of the old PCI devres implementation. Replace pcim_release() with pcim_disable_device(). Remove the now unused function get_pci_dr(). Remove the struct pci_devres from pci.h. Link: https://lore.kernel.org/r/20240613115032.29098-12-pstanner@redhat.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>