linux.git - Linus' kernel tree

Age	Commit message	Author
2025-03-14	cxl/pci: Support Global Persistent Flush (GPF)	Davidlohr Bueso
	Add support for GPF flows. The CXL specification around this was found to be a bit too involved from the driver side. And while this should really all be handled by the hardware, this patch takes things with a grain of salt. Upon respective port enumeration, both phase timeouts are set to a max of 20 seconds, which is the NMI watchdog default for lockup detection. The premise is that the kernel does not have enough information to set anything better than a max across the board and hope devices finish their GPF flows within the platform energy budget. Timeout detection is based on Dirty Shutdown semantics. The driver will mark the shutdown state as dirty, expecting that the device clears it upon a successful GPF event. The admin may consult the device Health and check the dirty shutdown counter to see if there was a problem with data integrity. [ davej: Explicitly set return to 0 in update_gpf_port_dvsec() ] [ davej: Add spec reference for 'struct cxl_mbox_set_shutdown_state_in' ] [ davej: Fix 0-day reported issue ] Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/20250124233533.910535-1-dave@stgolabs.net Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-03-14	cxl/pmem: debug invalid serial number data	Yuquan Wang
	In an nvdimm interleave set, a device with an invalid or zero serial number may cause pmem region initialization to fail, but in the CXL case such a device could still set the cookies of nd_interleave_set and create an nvdimm pmem region. Add validation of the serial number to CXL pmem region creation. A missing serial number now causes setting the cookie, and therefore creating the pmem region, to fail. For cxl-test to work properly, always add 1 to the mock device's serial number. Signed-off-by: Yuquan Wang <wangyuquan1236@phytium.com.cn> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://patch.msgid.link/20250219040029.515451-2-wangyuquan1236@phytium.com.cn Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-03-14	cxl/cdat: Remove redundant gp_port initialization	Li Ming
	gp_port already points to the grandparent port at its definition. Remove the redundant code that makes gp_port point to the grandparent port again. Signed-off-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Link: https://patch.msgid.link/20250211062054.300108-1-ming.li@zohomail.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-03-14	cxl/memdev: Remove unused partition values	Ira Weiny
	The next volatile and next persistent values are unused and are cluttering the cxl_memdev_state. Remove these values. Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://patch.msgid.link/20250206-cxl-cleanup-v1-1-9ddf26dd8433@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-03-14	cxl/region: Drop goto pattern of construct_region()	Li Ming
	Some operations need to be protected by the cxl_region_rwsem in construct_region(). Currently, construct_region() uses down_write() and up_write() for the cxl_region_rwsem locking, so there is a goto pattern after down_write() is invoked to release cxl_region_rwsem. construct_region() can be reworked to remove the goto pattern: create a new function called __construct_region() which includes all checks and operations protected by the cxl_region_rwsem, and use guard(rwsem_write) to replace down_write() and up_write() in __construct_region(). Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Li Ming <ming.li@zohomail.com> Link: https://patch.msgid.link/20250221013205.126419-1-ming.li@zohomail.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
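The goto-versus-guard() difference described in this series can be illustrated outside the kernel. The following is a minimal userspace sketch, assuming GCC or Clang, of the scope-based cleanup idea that the kernel's guard() builds on via the compiler's cleanup attribute. Every name here (toy_rwsem, TOY_GUARD_WRITE, construct_with_goto, construct_with_guard) is hypothetical and is not kernel code; the toy lock only records whether it is held.

```c
#include <stdbool.h>

/* Toy stand-in for a rwsem: it only tracks whether the write side is held. */
struct toy_rwsem {
	bool write_held;
};

static void toy_down_write(struct toy_rwsem *sem) { sem->write_held = true; }
static void toy_up_write(struct toy_rwsem *sem)   { sem->write_held = false; }

/* Cleanup handler: the compiler calls it when the guard variable leaves scope. */
static void toy_guard_release(struct toy_rwsem **semp)
{
	toy_up_write(*semp);
}

/* Hypothetical macro modelled on the idea behind guard(rwsem_write)(&sem). */
#define TOY_GUARD_WRITE(sem)                                        \
	struct toy_rwsem *toy_guard_                                \
		__attribute__((cleanup(toy_guard_release), unused)) \
		= (toy_down_write(sem), (sem))

/* Before: every exit taken after locking must funnel through a label. */
static int construct_with_goto(struct toy_rwsem *sem, int input)
{
	int rc;

	toy_down_write(sem);
	if (input < 0) {
		rc = -1;
		goto out;
	}
	rc = input * 2;
out:
	toy_up_write(sem);
	return rc;
}

/* After: early returns are safe because the guard unlocks on scope exit. */
static int construct_with_guard(struct toy_rwsem *sem, int input)
{
	TOY_GUARD_WRITE(sem);

	if (input < 0)
		return -1;
	return input * 2;
}
```

Both functions leave the lock released on every path; the guarded version simply has no unlock call to forget.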
2025-03-14	cxl/region: Drop goto pattern in cxl_dax_region_alloc()	Li Ming
	In cxl_dax_region_alloc(), there is a goto pattern to release the rwsem cxl_region_rwsem when the function returns. The down_read() and up_read() can be replaced by a guard(rwsem_read), after which the goto pattern can be removed. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Li Ming <ming.li@zohomail.com> Link: https://patch.msgid.link/20250221012453.126366-7-ming.li@zohomail.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-03-14	cxl/core: Use guard() to drop goto pattern of cxl_dpa_alloc()	Li Ming
	In cxl_dpa_alloc(), some checks and operations need to be protected by a rwsem called cxl_dpa_rwsem, so there is a goto pattern in cxl_dpa_alloc() to release the rwsem. The goto pattern can be removed by using guard() to hold the rwsem. Create a new function called __cxl_dpa_alloc() to include all checks and operations that need to be protected by cxl_dpa_rwsem, and use guard(rwsem_write) to hold cxl_dpa_rwsem at the beginning of the new function. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Li Ming <ming.li@zohomail.com> Link: https://patch.msgid.link/20250221012453.126366-6-ming.li@zohomail.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-03-14	cxl/core: Use guard() to drop the goto pattern of cxl_dpa_free()	Li Ming
	cxl_dpa_free() has a goto pattern to call up_write() for cxl_dpa_rwsem. It can be removed by using a guard() to replace the down_write() and up_write(). Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Li Ming <ming.li@zohomail.com> Link: https://patch.msgid.link/20250221012453.126366-5-ming.li@zohomail.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-03-14	cxl/memdev: cxl_memdev_ioctl() cleanup	Li Ming
	In cxl_memdev_ioctl(), the down_read(&cxl_memdev_rwsem) and up_read(&cxl_memdev_rwsem) can be replaced by a guard(rwsem_read)(&cxl_memdev_rwsem), which removes the open-coded up_read(&cxl_memdev_rwsem). The local variable 'rc' can also be removed to make the code cleaner. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Li Ming <ming.li@zohomail.com> Link: https://patch.msgid.link/20250221012453.126366-4-ming.li@zohomail.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-03-14	cxl/core: cxl_mem_sanitize() cleanup	Li Ming
	In cxl_mem_sanitize(), the down_read() and up_read() for cxl_region_rwsem can be simply replaced by a guard(rwsem_read), and the local variable 'rc' can be removed. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Li Ming <ming.li@zohomail.com> Link: https://patch.msgid.link/20250221012453.126366-3-ming.li@zohomail.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-03-14	cxl/core: Use guard() to replace open-coded down_read/write()	Li Ming
	Some down/up_read() and down/up_write() cases can simply be replaced by a guard() to drop the explicit unlock. This helps align the coding style with the rest of the CXL subsystem. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Li Ming <ming.li@zohomail.com> Link: https://patch.msgid.link/20250221012453.126366-2-ming.li@zohomail.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-03-14	Merge tag 'for-6.14/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm	Linus Torvalds
	Pull device mapper fix from Mikulas Patocka: - dm-flakey: fix memory corruption in optional corrupt_bio_byte feature * tag 'for-6.14/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm-flakey: Fix memory corruption in optional corrupt_bio_byte feature
2025-03-14	Merge branch 'for-6.15/fw-first-error-logging' into cxl-for-next2	Dave Jiang
	Add logging support for CXL CPER endpoint and port protocol errors, including the 2 patches that were completed later. Link: https://lore.kernel.org/linux-cxl/20250123084421.127697-1-Smita.KoralahalliChannabasappa@amd.com/ Link: https://lore.kernel.org/linux-cxl/20250310223839.31342-1-Smita.KoralahalliChannabasappa@amd.com/
2025-03-14	cxl/pci: Add trace logging for CXL PCIe Port RAS errors	Smita Koralahalli
	The CXL drivers use kernel trace functions for logging endpoint and Restricted CXL host (RCH) Downstream Port RAS errors. Similar functionality is required for CXL Root Ports, CXL Downstream Switch Ports, and CXL Upstream Switch Ports. Introduce trace logging functions for both RAS correctable and uncorrectable errors specific to CXL PCIe Ports. Use them to trace FW-First Protocol errors. Co-developed-by: Terry Bowman <terry.bowman@amd.com> Signed-off-by: Terry Bowman <terry.bowman@amd.com> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://patch.msgid.link/20250310223839.31342-3-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-03-14	Merge tag 'block-6.14-20250313' of git://git.kernel.dk/linux	Linus Torvalds
	Pull block fixes from Jens Axboe: - NVMe pull request via Keith: - Concurrent pci error and hotplug handling fix (Keith) - Endpoint function fixes (Damien) - Fix for a regression introduced in this cycle with error checking for batched request completions (Shin'ichiro) * tag 'block-6.14-20250313' of git://git.kernel.dk/linux: block: change blk_mq_add_to_batch() third argument type to bool nvme: move error logging from nvme_end_req() to __nvme_end_req() nvmet: pci-epf: Do not add an IRQ vector if not needed nvmet: pci-epf: Set NVMET_PCI_EPF_Q_LIVE when a queue is fully created nvme-pci: fix stuck reset on concurrent DPC and HP
2025-03-14	acpi/ghes, cxl/pci: Process CXL CPER Protocol Errors	Smita Koralahalli
	When PCIe AER is in FW-First, OS should process CXL Protocol errors from CPER records. Introduce support for handling and logging CXL Protocol errors. The defined trace events cxl_aer_uncorrectable_error and cxl_aer_correctable_error trace native CXL AER endpoint errors. Reuse them to trace FW-First Protocol errors. Since the CXL code is required to be called from process context and GHES is in interrupt context, use workqueues for processing. Similar to CXL CPER event handling, use kfifo to handle errors as it simplifies queue processing by providing lock free fifo operations. Add the ability for the CXL sub-system to register a workqueue to process CXL CPER protocol errors. [DJ: return cxl_cper_register_prot_err_work() directly in cxl_ras_init()] Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://patch.msgid.link/20250310223839.31342-2-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
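The hand-off this commit describes, where an interrupt-context producer queues records and a work item drains them later in process context, relies on a FIFO that needs no lock when there is exactly one producer and one consumer. Below is a minimal userspace sketch of that single-producer/single-consumer ring using C11 atomics. It is not the kernel's kfifo implementation; the names (toy_fifo, toy_err_record, toy_drain) and the ring size are assumptions made for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 8u	/* hypothetical capacity; must be a power of two */

struct toy_err_record {	/* hypothetical payload */
	unsigned int severity;
};

struct toy_fifo {
	struct toy_err_record slots[RING_SIZE];
	atomic_uint head;	/* written only by the producer */
	atomic_uint tail;	/* written only by the consumer */
};

/* Producer side (models the interrupt-context path). False when full. */
static bool toy_fifo_put(struct toy_fifo *f, struct toy_err_record rec)
{
	unsigned int head = atomic_load_explicit(&f->head, memory_order_relaxed);
	unsigned int tail = atomic_load_explicit(&f->tail, memory_order_acquire);

	if (head - tail == RING_SIZE)
		return false;
	f->slots[head & (RING_SIZE - 1)] = rec;
	/* Publish the slot only after its contents are written. */
	atomic_store_explicit(&f->head, head + 1, memory_order_release);
	return true;
}

/* Consumer side (models the work item in process context). False when empty. */
static bool toy_fifo_get(struct toy_fifo *f, struct toy_err_record *out)
{
	unsigned int tail = atomic_load_explicit(&f->tail, memory_order_relaxed);
	unsigned int head = atomic_load_explicit(&f->head, memory_order_acquire);

	if (head == tail)
		return false;
	*out = f->slots[tail & (RING_SIZE - 1)];
	/* Hand the slot back to the producer only after copying it out. */
	atomic_store_explicit(&f->tail, tail + 1, memory_order_release);
	return true;
}

/* Drain everything queued so far; returns how many records were handled. */
static unsigned int toy_drain(struct toy_fifo *f)
{
	struct toy_err_record rec;
	unsigned int handled = 0;

	while (toy_fifo_get(f, &rec))
		handled++;
	return handled;
}
```

The design point is that each index has a single writer, so the two sides never contend for the same variable and no lock is required.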
2025-03-14	Merge tag 'platform-drivers-x86-v6.14-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86	Linus Torvalds
	Pull x86 platform driver fixes from Ilpo Järvinen: "Fixes and new HW support. The diff is a bit larger than I'd prefer at this point due to unwinding the amd/pmf driver's error handling properly instead of calling a deinit function that was a can full of worms. Summary: - amd/pmf: - Fix error handling in amd_pmf_init_smart_pc() - Fix missing hidden options for Smart PC - surface: aggregator_registry: Add Support for Surface Pro 11" * tag 'platform-drivers-x86-v6.14-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: MAINTAINERS: Update Ike Panhc's email address platform/x86/amd: pmf: Fix missing hidden options for Smart PC platform/surface: aggregator_registry: Add Support for Surface Pro 11 platform/x86/amd/pmf: fix cleanup in amd_pmf_init_smart_pc()
2025-03-14	Merge tag 'gpio-fixes-for-v6.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux	Linus Torvalds
	Pull gpio fixes from Bartosz Golaszewski: "The first fix is a backport from my v6.15-rc1 queue that turned out to be needed in v6.14 as well but as the former diverged from my fixes branch I had to adjust the patch a bit. The second one fixes a regression observed in user-space where closing a file descriptor associated with a GPIO device results in a ~10ms delay due to the atomic notifier calling synchronize_rcu() when unregistering. Summary: - don't check the return value of gpio_chip::get_direction() when registering a GPIO chip - use raw notifier for line state events" * tag 'gpio-fixes-for-v6.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpio: cdev: use raw notifier for line state events gpiolib: don't check the retval of get_direction() when registering a chip
2025-03-14	remoteproc: imx_dsp_rproc: Document run_stall struct member	Daniel Baluta
	Add documentation for 'run_stall' imx_dsp_rproc struct member. This also fixes the following warning: warning: Function parameter or struct member 'run_stall' not described in 'imx_dsp_rproc' Fixes: 0184b4fdbad1 ("imx_dsp_rproc: Use reset controller API to control the DSP") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202503142125.IE33sCto-lkp@intel.com/ Signed-off-by: Daniel Baluta <daniel.baluta@nxp.com> Link: https://lore.kernel.org/r/20250314151720.1793719-1-daniel.baluta@nxp.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2025-03-14	arch_topology: Support SMT control for OF based system	Yicong Yang
	When building the topology from the devicetree, we already get the SMT thread number of each core. Update the largest SMT thread number and enable the SMT control by the end of topology parsing. The framework's SMT control provides two interfaces to the users through /sys/devices/system/cpu/smt/control (Documentation/ABI/testing/sysfs-devices-system-cpu): 1) enable SMT by writing "on" and disable by "off" 2) enable SMT by writing max_thread_number or disable by writing 1 Both methods support completely disabling/enabling the SMT cores, so both work correctly for symmetric SMT platforms and asymmetric platforms with non-SMT and one type of SMT cores like: core A: 1 thread core B: X (X!=1) threads Note that for a theoretically possible multiple SMT-X (X>1) core platform the SMT control is also supported as expected, but only by the "on/off" method. Reviewed-by: Pierre Gondois <pierre.gondois@arm.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/20250311075143.61078-3-yangyicong@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
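The two ways of writing to the SMT control file can be modelled as a small parser. This is a userspace sketch only, not the kernel's sysfs handler: the function name smt_requested_threads() is hypothetical, and the choice to accept only 1 or the maximum as numeric values is an assumption taken from the description above.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Map a string written to the control file to a target thread count per
 * core. Returns max_threads for "on", 1 for "off", the number itself for
 * "1" or the maximum, and -1 for anything else.
 */
static int smt_requested_threads(const char *input, int max_threads)
{
	char *end;
	long n;

	if (strcmp(input, "on") == 0)
		return max_threads;
	if (strcmp(input, "off") == 0)
		return 1;

	errno = 0;
	n = strtol(input, &end, 10);
	if (errno || end == input || *end != '\0')
		return -1;	/* not a plain decimal number */
	if (n != 1 && n != max_threads)
		return -1;	/* assumption: only 1 or the maximum is accepted */
	return (int)n;
}
```

With a maximum of 4 threads per core, "on" and "4" both request 4, while "off" and "1" both request 1.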
2025-03-14	Merge tag 'reset-fixes-for-v6.14' of git://git.pengutronix.de/pza/linux into arm/fixes	Arnd Bergmann
	Reset controller fixes for v6.14 * Fix lan966x boot with internal CPU by stopping reset-microchip-sparx5 from indirectly calling devm_request_mem_region() on a memory region shared with other devices. * tag 'reset-fixes-for-v6.14' of git://git.pengutronix.de/pza/linux: reset: mchp: sparx5: Fix for lan966x Link: https://lore.kernel.org/r/20250314164401.743984-1-p.zabel@pengutronix.de Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-03-14	soc: hisilicon: kunpeng_hccs: Fix incorrect string assembly	Huisong Li
	String assembly should use sysfs_emit_at() instead of sysfs_emit(). Fixes: 23fe8112a231 ("soc: hisilicon: kunpeng_hccs: Add used HCCS types sysfs") Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Huisong Li <lihuisong@huawei.com> Link: https://lore.kernel.org/r/20250314100143.3377268-1-lihuisong@huawei.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
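The difference between the two calls matters when output is assembled piece by piece: formatting at the start of the buffer on every iteration overwrites what was emitted before, while formatting at a running offset appends. Here is a minimal userspace sketch of that behaviour, with snprintf() standing in for the sysfs helpers. The names toy_emit(), toy_emit_at(), TOY_PAGE_SIZE and the sample strings are hypothetical, not the driver's actual code or values.

```c
#include <stdio.h>

#define TOY_PAGE_SIZE 64	/* hypothetical small buffer standing in for the sysfs page */

/* Models sysfs_emit_at(): format at offset 'at', keeping earlier output. */
static int toy_emit_at(char *buf, int at, const char *s)
{
	int room, n;

	if (at < 0 || at >= TOY_PAGE_SIZE)
		return 0;
	room = TOY_PAGE_SIZE - at;
	n = snprintf(buf + at, (size_t)room, "%s", s);
	if (n < 0)
		return 0;
	return n >= room ? room - 1 : n;	/* report what was actually stored */
}

/* Models sysfs_emit(): always format at the start of the buffer. */
static int toy_emit(char *buf, const char *s)
{
	return toy_emit_at(buf, 0, s);
}

/* Broken assembly: each piece overwrites the previous one. */
static int build_list_broken(char *buf, const char *const *names, int count)
{
	int len = 0;

	for (int i = 0; i < count; i++)
		len += toy_emit(buf, names[i]);
	return len;	/* claims more bytes than the buffer really holds */
}

/* Fixed assembly: each piece is appended at the running length. */
static int build_list_fixed(char *buf, const char *const *names, int count)
{
	int len = 0;

	for (int i = 0; i < count; i++)
		len += toy_emit_at(buf, len, names[i]);
	return len;
}
```

In the broken variant the returned length and the buffer contents disagree, which is the kind of mismatch a reader of the attribute would see as truncated or stale output.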
2025-03-14	Merge tag 'qcom-drivers-fixes-for-6.14' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes	Arnd Bergmann
	Qualcomm driver fixes for v6.14 Fixes a locking issue in the PDR implementation, which manifests itself as transaction timeouts during the startup procedure for some remoteprocs. A registration race is fixed in the custom efivars implementation, resolving reported NULL pointer dereferences. Error handling related to tzmem allocation is corrected, to ensure that the allocation error is propagated. Lastly a trivial merge mistake in pmic_glink is addressed. * tag 'qcom-drivers-fixes-for-6.14' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: soc: qcom: pdr: Fix the potential deadlock firmware: qcom: uefisecapp: fix efivars registration race firmware: qcom: scm: Fix error code in probe() soc: qcom: pmic_glink: Drop redundant pg assignment before taking lock Link: https://lore.kernel.org/r/20250311022509.1232678-1-andersson@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-03-14	memory: omap-gpmc: drop no compatible check	Roger Quadros
	We are no longer depending on legacy device trees so drop the no compatible check for NAND and OneNAND nodes. Suggested-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://lore.kernel.org/r/20250114-omap-gpmc-drop-no-compatible-check-v1-1-262c8d549732@kernel.org Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-03-14	accel/qaic: Fix integer overflow in qaic_validate_req()	Dan Carpenter
	These are u64 variables that come from the user via qaic_attach_slice_bo_ioctl(). Use check_add_overflow() to ensure that the math doesn't have an integer wrapping bug. Cc: stable@vger.kernel.org Fixes: ff13be830333 ("accel/qaic: Add datapath") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Link: https://patchwork.freedesktop.org/patch/msgid/176388fa-40fe-4cb4-9aeb-2c91c22130bd@stanley.mountain
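The wrapping bug class addressed here can be shown in a few lines of userspace C. This sketch uses the GCC/Clang builtin __builtin_add_overflow(), on which the kernel's check_add_overflow() helper is built. The function names and the range check itself are hypothetical and do not reproduce qaic_validate_req().

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when [offset, offset + size) lies within total_size. */
static bool slice_range_ok(uint64_t offset, uint64_t size, uint64_t total_size)
{
	uint64_t end;

	/* The builtin returns true if offset + size wrapped around. */
	if (__builtin_add_overflow(offset, size, &end))
		return false;
	return end <= total_size;
}

/* Naive version for contrast: the sum can wrap and slip past the check. */
static bool slice_range_ok_naive(uint64_t offset, uint64_t size,
				 uint64_t total_size)
{
	return offset + size <= total_size;
}
```

With offset set to UINT64_MAX and size 2, the naive sum wraps to 1 and passes a 64-byte limit, while the checked version rejects the request.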
2025-03-14	accel/qaic: Fix possible data corruption in BOs > 2G	Jeffrey Hugo
	When slicing a BO, we need to iterate through the BO's sgt to find the right pieces to construct the slice. Some of the data types chosen for this process are too small, and can overflow. This can result in incorrect slice construction, which can lead to data corruption in workload execution. The device can only handle 32-bit sized transfers, and the scatterlist struct only supports 32-bit buffer sizes, so our upper limit for an individual transfer is an unsigned int. Using an int is incorrect due to the reservation of the sign bit. Upgrade the length of a scatterlist entry and the offsets into a scatterlist entry to unsigned int for a correct representation. While each transfer may be limited to 32 bits, the overall BO may exceed that size. For counting the total length of the BO, we need a type that can represent the largest allocation possible on the system. That is the definition of size_t, so use it. Fixes: ff13be830333 ("accel/qaic: Add datapath") Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> Reviewed-by: Troy Hanson <quic_thanson@quicinc.com> Reviewed-by: Youssef Samir <quic_yabdulra@quicinc.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306171959.853466-1-jeff.hugo@oss.qualcomm.com
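The type reasoning above (per-entry lengths fit in 32 bits, but their sum may not) can be sketched as follows. This is a userspace illustration with hypothetical function names, assuming a 32-bit unsigned int; the wider total only differs from the narrow one on platforms where size_t is 64 bits.

```c
#include <stddef.h>

/* Narrow accumulator: wraps once the total reaches 4 GiB. */
static unsigned int total_len_u32(const unsigned int *lens, size_t n)
{
	unsigned int total = 0;

	for (size_t i = 0; i < n; i++)
		total += lens[i];
	return total;
}

/* size_t accumulator: can represent the largest object the system allows. */
static size_t total_len(const unsigned int *lens, size_t n)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += lens[i];
	return total;
}
```

Two 3 GiB entries sum to 6 GiB; the narrow accumulator reports 2 GiB instead, which is the kind of silent miscount that leads to a wrongly constructed slice.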
2025-03-14	PCI: dw-rockchip: Hide broken ATS capability for RK3588 running in EP mode	Niklas Cassel
	When running the RK3588 in Endpoint mode, with an Intel host with IOMMU enabled, the host side prints: DMAR: VT-d detected Invalidation Time-out Error: SID 0 When running the RK3588 in Endpoint mode, with an AMD host with IOMMU enabled, the host side prints: iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=63:00.0 address=0x42b5b01a0] Rockchip has confirmed that the ATS support for RK3588 only works when running the PCIe controller in Root Complex (RC) mode, see: https://lore.kernel.org/linux-pci/93cdce39-1ae6-4939-a3fc-db10be7564e5@rock-chips.com Usually, to handle these issues, we add a quirk for the PCI vendor and device ID in drivers/pci/quirks.c with quirk_no_ats(). That is because we cannot usually modify the capabilities on the EP side. In this case, we can modify the capabilities on the EP side. Thus, hide the broken ATS capability on RK3588 when running in EP mode. That way, we don't need any quirk on the host side, and we see no errors on the host side, and we can run pci_endpoint_test successfully, with the IOMMU enabled on the host side. Acked-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Niklas Cassel <cassel@kernel.org> [kwilczynski: commit log, tidy up code comments and error message] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20250310094826.842681-6-cassel@kernel.org
2025-03-14	PCI: dwc: ep: Add dw_pcie_ep_hide_ext_capability()	Niklas Cassel
	Add dw_pcie_ep_hide_ext_capability() which can be used by an endpoint controller driver to hide a capability. This can be useful to hide a capability that is buggy, such that the host side does not try to enable the buggy capability. Suggested-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Link: https://lore.kernel.org/r/20250310094826.842681-5-cassel@kernel.org
2025-03-14	PCI: dwc: ep: Return -ENOMEM for allocation failures	Dan Carpenter
	If the bitmap or memory allocations fail, then dw_pcie_ep_init_registers() will incorrectly return a success. Return -ENOMEM instead. Fixes: 869bc5253406 ("PCI: dwc: ep: Fix DBI access failure for drivers requiring refclk from host") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> [kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Krzysztof Wilczyński <kw@linux.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/36dcb6fc-f292-4dd5-bd45-a8c6f9dc3df7@stanley.mountain
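The bug shape fixed here is common: an error label returns a status variable that still holds 0 from an earlier successful step, so an allocation failure is reported as success. The userspace sketch below shows that pattern under assumed names (toy_ep, toy_init_buggy, toy_init_fixed) with a flag to simulate the failure. It does not reproduce dw_pcie_ep_init_registers().

```c
#include <errno.h>
#include <stdlib.h>

struct toy_ep {
	unsigned long *bitmap;
};

/* Buggy shape: 'ret' is still 0 when the allocation fails. */
static int toy_init_buggy(struct toy_ep *ep, size_t nwords, int simulate_failure)
{
	int ret = 0;	/* result of an earlier, successful step */

	ep->bitmap = simulate_failure ? NULL : calloc(nwords, sizeof(*ep->bitmap));
	if (!ep->bitmap)
		goto err;	/* forgets to set an error code */
	return 0;
err:
	return ret;	/* returns 0, so the caller believes init succeeded */
}

/* Fixed shape: the failure path sets -ENOMEM before jumping. */
static int toy_init_fixed(struct toy_ep *ep, size_t nwords, int simulate_failure)
{
	int ret = 0;

	ep->bitmap = simulate_failure ? NULL : calloc(nwords, sizeof(*ep->bitmap));
	if (!ep->bitmap) {
		ret = -ENOMEM;
		goto err;
	}
	return 0;
err:
	return ret;
}
```

A caller that only checks the return value would go on to dereference the NULL bitmap in the buggy variant.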
2025-03-14	misc: pci_endpoint_test: Use IRQ_TYPE_* defines from UAPI header	Niklas Cassel
	Use the IRQ_TYPE_* defines from the UAPI header rather than duplicating these defines in the driver itself. No functional change. Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Link: https://lore.kernel.org/r/20250310111016.859445-11-cassel@kernel.org
2025-03-14	misc: pci_endpoint_test: Do not use managed IRQ functions	Kunihiko Hayashi
	The pci_endpoint_test_request_irq() and pci_endpoint_test_release_irq() are called repeatedly by the users through pci_endpoint_test_set_irq(). So using the managed version of IRQ functions within these functions has no effect. Suggested-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20250225110252.28866-7-hayashi.kunihiko@socionext.com Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-14	misc: pci_endpoint_test: Remove global 'irq_type' and 'no_msi'	Kunihiko Hayashi
	The global variable "irq_type" preserves the current value of ioctl(GET_IRQTYPE). However, all tests that use interrupts first call ioctl(SET_IRQTYPE) to set "test->irq_type", then write the value of test->irq_type into the register pointed to by test_reg_bar, and request the interrupt from the endpoint. The endpoint function driver, pci-epf-test, refers to the register and determines which type of interrupt to raise. The global variable "irq_type" is never used in the actual test, so remove the variable and replace it with "test->irq_type". Also, for the same reason, the variable "no_msi" can be removed. Initially, "test->irq_type" has IRQ_TYPE_UNDEFINED, and the ioctl(GET_IRQTYPE) before calling ioctl(SET_IRQTYPE) will return an error. Suggested-by: Niklas Cassel <cassel@kernel.org> Suggested-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> [kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20250225110252.28866-6-hayashi.kunihiko@socionext.com
2025-03-14	misc: pci_endpoint_test: Fix 'irq_type' to convey the correct type	Kunihiko Hayashi
	There are two variables that indicate the interrupt type to be used in the next test execution, "irq_type" as global and "test->irq_type". The global is referenced from pci_endpoint_test_get_irq() to preserve the current type for ioctl(PCITEST_GET_IRQTYPE). The type set in this function isn't reflected in the global "irq_type", so ioctl(PCITEST_GET_IRQTYPE) returns the previous type. As a result, the wrong type is displayed in old version of "pcitest" as follows: - Result of running "pcitest -i 0" SET IRQ TYPE TO LEGACY: OKAY - Result of running "pcitest -I" GET IRQ TYPE: MSI Whereas running the new version of "pcitest" in kselftest results in an error as follows: # RUN pci_ep_basic.LEGACY_IRQ_TEST ... # pci_endpoint_test.c:104:LEGACY_IRQ_TEST:Expected 0 (0) == ret (1) # pci_endpoint_test.c:104:LEGACY_IRQ_TEST:Can't get Legacy IRQ type Fix this issue by propagating the current type to the global "irq_type". Fixes: b2ba9225e031 ("misc: pci_endpoint_test: Avoid using module parameter to determine irqtype") Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> [kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250225110252.28866-5-hayashi.kunihiko@socionext.com
2025-03-14	reset: imx: fix incorrect module device table	Arnd Bergmann
	The ID table is for of_device_id, not platform_device_id: ERROR: modpost: drivers/reset/reset-imx-scu: type mismatch between imx_scu_reset_ids[] and MODULE_DEVICE_TABLE(platform, ...) Fixes: 6b64fde5c183 ("reset: imx: Add SCU reset driver for i.MX8QXP and i.MX8QM") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20250314153541.3555813-1-arnd@kernel.org Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2025-03-14	PCI: Check BAR index for validity	Philipp Stanner
	Many functions in PCI use accessor macros such as pci_resource_len(), which take a BAR index. That index, however, is never checked for validity, potentially resulting in undefined behavior by overflowing the array pci_dev.resource in the macro pci_resource_n(). Since many users of those macros directly assign the accessed value to an unsigned integer, the macros cannot be changed easily anymore to return -EINVAL for invalid indexes. Consequently, the problem has to be mitigated in higher layers. Add pci_bar_index_is_valid(). Use it where appropriate. Link: https://lore.kernel.org/r/20250312080634.13731-4-phasta@kernel.org Closes: https://lore.kernel.org/all/adb53b1f-29e1-3d14-0e61-351fd2d3ff0d@linux.intel.com/ Reported-by: Bingbu Cao <bingbu.cao@linux.intel.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> [kwilczynski: correct if-statement condition the pci_bar_index_is_valid() helper function uses, tidy up code comments] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> [bhelgaas: fix typo] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
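The mitigation described here, validating the index in the calling layer because the accessor itself cannot report an error through its unsigned return value, can be sketched in userspace. All names below (toy_pci_dev, toy_bar_index_is_valid, toy_get_bar_len) are hypothetical stand-ins rather than the kernel's definitions, and the length calculation is deliberately simplified.

```c
#include <errno.h>
#include <stdbool.h>

#define TOY_NUM_BARS 6	/* a PCI type 0 header provides six BARs */

struct toy_resource {
	unsigned long start;
	unsigned long end;
};

struct toy_pci_dev {
	struct toy_resource resource[TOY_NUM_BARS];
};

/* Accessor in the style of the macros: no room to signal a bad index. */
static unsigned long toy_resource_len(const struct toy_pci_dev *dev, int bar)
{
	const struct toy_resource *res = &dev->resource[bar];

	if (res->start == 0 && res->end == 0)
		return 0;	/* simplification: treat an all-zero BAR as unset */
	return res->end - res->start + 1;
}

/* Validity helper modelled on the idea of the new check. */
static bool toy_bar_index_is_valid(int bar)
{
	return bar >= 0 && bar < TOY_NUM_BARS;
}

/* Higher layer: reject bad indexes before touching the array. */
static int toy_get_bar_len(const struct toy_pci_dev *dev, int bar,
			   unsigned long *len)
{
	if (!toy_bar_index_is_valid(bar))
		return -EINVAL;
	*len = toy_resource_len(dev, bar);
	return 0;
}
```

Keeping the check in the caller leaves the accessor's unsigned return type untouched, which is the constraint the commit message points out.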
2025-03-14	clk: amlogic: a1: fix a typo	Jian Hu
	Fix a typo in MODULE_DESCRIPTION for a1 PLL driver, S4 should be A1. Signed-off-by: Jian Hu <jian.hu@amlogic.com> Reviewed-by: Dmitry Rokosov <ddrokosov@salutedevices.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20241231062552.2982266-1-jian.hu@amlogic.com Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
2025-03-14	clk: amlogic: gxbb: drop non existing 32k clock parent	Jerome Brunet
	The 32k clock references a parent 'cts_slow_oscin' with a fixme note saying that this clock should be provided by the AO controller. The HW probably has this clock but it does not exist at the moment in any controller implementation. Furthermore, referencing a clock by its global name should be avoided whenever possible. There is no reason to keep this hack around, at least for now. Fixes: 14c735c8e308 ("clk: meson-gxbb: Add EE 32K Clock for CEC") Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20241220-amlogic-clk-gxbb-32k-fixes-v1-2-baca56ecf2db@baylibre.com Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
2025-03-14	clk: amlogic: gxbb: drop incorrect flag on 32k clock	Jerome Brunet
	gxbb_32k_clk_div sets CLK_DIVIDER_ROUND_CLOSEST in the init_data flag which is incorrect. This field is not where the divider flags belong. Thankfully, CLK_DIVIDER_ROUND_CLOSEST maps to bit 4 which is an unused clock flag, so there is no unintended consequence to this error. Effectively, the clock has been used without CLK_DIVIDER_ROUND_CLOSEST so far, so just drop it. Fixes: 14c735c8e308 ("clk: meson-gxbb: Add EE 32K Clock for CEC") Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20241220-amlogic-clk-gxbb-32k-fixes-v1-1-baca56ecf2db@baylibre.com Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
2025-03-14	clk: amlogic: g12b: fix cluster A parent data	Jerome Brunet
	Several clocks used by both g12a and g12b use the g12a cpu A clock hw pointer as clock parent. This is incorrect on g12b since the parents of the cluster A cpu clock are different. Also, the hw clock provided as parent to these children is not even a registered clock on g12b. Fix the problem by reverting to the global namespace and letting CCF pick the appropriate parent, as is already done for other clocks, such as cpu_clk_trace_div. Fixes: 25e682a02d91 ("clk: meson: g12a: migrate to the new parent description method") Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20241213-amlogic-clk-g12a-cpua-parent-fix-v1-1-d8c0f41865fe@baylibre.com Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
2025-03-14	clk: amlogic: g12a: fix mmc A peripheral clock	Jerome Brunet
	The bit index of the peripheral clock for mmc A is wrong. This was probably not a problem for mmc A as the peripheral is likely left enabled by the bootloader. No issue has been reported so far but it could be a problem, most likely some form of conflict between the ethernet and mmc A clock, breaking ethernet on init. Use the value provided by the documentation for mmc A before this becomes an actual problem. Fixes: 085a4ea93d54 ("clk: meson: g12a: add peripheral clock controller") Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20241213-amlogic-clk-g12a-mmca-fix-v1-1-5af421f58b64@baylibre.com Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
2025-03-14	PCI: pciehp: Avoid unnecessary device replacement check	Lukas Wunner
	Hot-removal of nested PCI hotplug ports suffers from a long-standing race condition which can lead to a deadlock: A parent hotplug port acquires pci_lock_rescan_remove(), then waits for pciehp to unbind from a child hotplug port. Meanwhile that child hotplug port tries to acquire pci_lock_rescan_remove() as well in order to remove its own children. The deadlock only occurs if the parent acquires pci_lock_rescan_remove() first, not if the child happens to acquire it first. Several workarounds to avoid the issue have been proposed and discarded over the years, e.g.: https://lore.kernel.org/r/4c882e25194ba8282b78fe963fec8faae7cf23eb.1529173804.git.lukas@wunner.de/ A proper fix is being worked on, but needs more time as it is nontrivial and necessarily intrusive. Recent commit 9d573d19547b ("PCI: pciehp: Detect device replacement during system sleep") provokes more frequent occurrence of the deadlock when removing more than one Thunderbolt device during system sleep. The commit sought to detect device replacement, but also triggered on device removal. Differentiating reliably between replacement and removal is impossible because pci_get_dsn() returns 0 both if the device was removed, as well as if it was replaced with one lacking a Device Serial Number. Avoid the more frequent occurrence of the deadlock by checking whether the hotplug port itself was hot-removed. If so, there's no sense in checking whether its child device was replaced. This works because the ->resume_noirq() callback is invoked in top-down order for the entire hierarchy: A parent hotplug port detecting device replacement (or removal) marks all children as removed using pci_dev_set_disconnected() and a child hotplug port can then reliably detect being removed. 
Link: https://lore.kernel.org/r/02f166e24c87d6cde4085865cce9adfdfd969688.1741674172.git.lukas@wunner.de Fixes: 9d573d19547b ("PCI: pciehp: Detect device replacement during system sleep") Reported-by: Kenneth Crudup <kenny@panix.com> Closes: https://lore.kernel.org/r/83d9302a-f743-43e4-9de2-2dd66d91ab5b@panix.com/ Reported-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com> Closes: https://lore.kernel.org/r/20240926125909.2362244-1-acelan.kao@canonical.com/ Tested-by: Kenneth Crudup <kenny@panix.com> Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Cc: stable@vger.kernel.org # v6.11+
2025-03-14	drm: adp: Fix NULL vs IS_ERR() check in adp_plane_new()	Dan Carpenter
	The __drmm_universal_plane_alloc() function doesn't return NULL, it returns error pointers. Update the check to match. Fixes: 332122eba628 ("drm: adp: Add Apple Display Pipe driver") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Acked-by: Sasha Finkelstein <fnkl.kernel@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/14a845e8-54d0-45f8-b8b9-927609d92ede@stanley.mountain Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
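The class of bug fixed here can be illustrated outside the kernel. The following is a minimal userspace sketch, assuming simplified stand-ins for the ERR_PTR()/PTR_ERR()/IS_ERR() helpers from <linux/err.h>; alloc_plane() and the two plane_new_*() functions are hypothetical and are not the driver's code.

```c
#include <errno.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the helpers in <linux/err.h>. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator: on failure it returns an error pointer, never NULL. */
static void *alloc_plane(int fail)
{
	static int plane;

	return fail ? ERR_PTR(-ENOMEM) : (void *)&plane;
}

/* Buggy pattern: the NULL check can never fire, so the failure is missed. */
static int plane_new_buggy(int fail)
{
	void *p = alloc_plane(fail);

	if (!p)
		return -ENOMEM;
	return 0; /* an error pointer is wrongly treated as a valid object */
}

/* Fixed pattern: test with IS_ERR() and propagate the encoded errno. */
static int plane_new_fixed(int fail)
{
	void *p = alloc_plane(fail);

	if (IS_ERR(p))
		return (int)PTR_ERR(p);
	return 0;
}
```

With a failing allocation, plane_new_buggy() reports success while plane_new_fixed() returns -ENOMEM.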
2025-03-14	PM: sleep: Fix handling devices with direct_complete set on errors	Rafael J. Wysocki
	When dpm_suspend() fails, some devices with power.direct_complete set may not have been handled by device_suspend() yet, so runtime PM has not been disabled for them yet even though power.direct_complete is set. Since device_resume() expects that runtime PM has been disabled for all devices with power.direct_complete set, it will attempt to reenable runtime PM for the devices that have not been processed by device_suspend(), which does not make sense. Had those devices had runtime PM disabled before device_suspend() had run, device_resume() would have inadvertently enabled runtime PM for them, but this is not expected to happen because it would require ->prepare() callbacks to return positive values for devices with runtime PM disabled, which would be invalid. In practice, this issue is most likely benign because pm_runtime_enable() will not allow the "disable depth" counter to underflow, but it causes a warning message to be printed for each affected device. To allow device_resume() to distinguish the "direct complete" devices that have been processed by device_suspend() from those which have not been handled by it, make device_suspend() set power.is_suspended for "direct complete" devices. Next, move the power.is_suspended check in device_resume() before the power.direct_complete check in it to make it skip the "direct complete" devices that have not been handled by device_suspend(). This change is based on a preliminary patch from Saravana Kannan. Fixes: aae4518b3124 ("PM / sleep: Mechanism to avoid resuming runtime-suspended devices unnecessarily") Link: https://lore.kernel.org/linux-pm/20241114220921.2529905-2-saravanak@google.com/ Reported-by: Saravana Kannan <saravanak@google.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Saravana Kannan <saravanak@google.com> Link: https://patch.msgid.link/12627587.O9o76ZdvQC@rjwysocki.net
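The ordering change described above can be modelled with a toy state machine. This is only a sketch under simplifying assumptions: struct toy_dev and every toy_*() function are hypothetical, and the real device_suspend()/device_resume() paths do considerably more. It shows why checking is_suspended first avoids the unbalanced runtime PM enable.

```c
#include <stdbool.h>

/* Toy model of the per-device PM state discussed in the commit message. */
struct toy_dev {
	bool direct_complete; /* models power.direct_complete */
	bool is_suspended;    /* models power.is_suspended */
	int disable_depth;    /* > 0 means runtime PM is disabled */
	int warnings;         /* counts unbalanced enable attempts */
};

static void toy_rpm_disable(struct toy_dev *d)
{
	d->disable_depth++;
}

static void toy_rpm_enable(struct toy_dev *d)
{
	if (d->disable_depth > 0)
		d->disable_depth--;
	else
		d->warnings++; /* no underflow, but a warning is emitted */
}

/* Suspend with the fix: "direct complete" devices are marked as handled. */
static void toy_suspend(struct toy_dev *d)
{
	if (d->direct_complete) {
		toy_rpm_disable(d);
		d->is_suspended = true;
	}
}

/* Old ordering: direct_complete is checked first. */
static void toy_resume_old(struct toy_dev *d)
{
	if (d->direct_complete) {
		toy_rpm_enable(d); /* wrong if suspend never ran for d */
		d->is_suspended = false;
		return;
	}
	if (!d->is_suspended)
		return;
	d->is_suspended = false;
}

/* New ordering: devices never handled by suspend are skipped entirely. */
static void toy_resume_fixed(struct toy_dev *d)
{
	if (!d->is_suspended)
		return;
	if (d->direct_complete)
		toy_rpm_enable(d);
	d->is_suspended = false;
}
```

For a device with direct_complete set that the suspend path never reached, toy_resume_old() records a warning, while toy_resume_fixed() leaves the device untouched.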
2025-03-14	dm: restrict dm device size to 2^63-512 bytes	Mikulas Patocka
	Devices with size >= 2^63 bytes can't be used reliably by userspace because the type off_t is a signed 64-bit integer. Therefore, we limit the maximum size of a device mapper device to 2^63-512 bytes. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
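The arithmetic behind the limit can be checked with a short sketch. It assumes 512-byte sectors (a shift of 9); MAX_DEV_SECTORS, dev_size_ok() and dev_size_bytes() are hypothetical names chosen for illustration and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SHIFT 9 /* 512-byte sectors */

/*
 * Largest size, in sectors, whose byte size still fits in a signed
 * 64-bit offset: (2^63 - 512) / 512 = 2^54 - 1 sectors.
 */
#define MAX_DEV_SECTORS ((UINT64_C(1) << (63 - SECTOR_SHIFT)) - 1)

/* Returns true if a device of this many sectors is within the limit. */
static bool dev_size_ok(uint64_t sectors)
{
	return sectors <= MAX_DEV_SECTORS;
}

/* Converts a sector count to bytes. */
static uint64_t dev_size_bytes(uint64_t sectors)
{
	return sectors << SECTOR_SHIFT;
}
```

The largest accepted size is 2^63-512 bytes, the last whole sector below the point where a signed 64-bit offset would overflow.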
2025-03-14	Coresight: Fix a NULL vs IS_ERR() bug in probe	Dan Carpenter
	The devm_platform_get_and_ioremap_resource() function doesn't return NULL, it returns error pointers. Update the checking to match. Fixes: f78d206f3d73 ("Coresight: Add Coresight TMC Control Unit driver") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/dab039b9-d58a-41be-92f0-ff209cfabfe2@stanley.mountain
2025-03-14	can: flexcan: disable transceiver during system PM	Haibo Chen
	During system PM, if there is no wakeup requirement, disable the transceiver to save power. Fixes: 4de349e786a3 ("can: flexcan: fix resume function") Cc: stable@vger.kernel.org Reviewed-by: Frank Li <frank.li@nxp.com> Signed-off-by: Haibo Chen <haibo.chen@nxp.com> Link: https://patch.msgid.link/20250314110145.899179-2-haibo.chen@nxp.com [mkl: add newlines] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-03-14	can: flexcan: only change CAN state when link up in system PM	Haibo Chen
	After a suspend/resume cycle on a down interface, it will come up as ERROR-ACTIVE. $ ip -details -s -s a s dev flexcan0 3: flexcan0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN group default qlen 10 link/can promiscuity 0 allmulti 0 minmtu 0 maxmtu 0 can state STOPPED (berr-counter tx 0 rx 0) restart-ms 1000 $ sudo systemctl suspend $ ip -details -s -s a s dev flexcan0 3: flexcan0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN group default qlen 10 link/can promiscuity 0 allmulti 0 minmtu 0 maxmtu 0 can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 1000 Also, only set the CAN state to CAN_STATE_ERROR_ACTIVE when the resume process has no issue; otherwise keep it in CAN_STATE_SLEEPING, as suspend did. Fixes: 4de349e786a3 ("can: flexcan: fix resume function") Cc: stable@vger.kernel.org Signed-off-by: Haibo Chen <haibo.chen@nxp.com> Link: https://patch.msgid.link/20250314110145.899179-1-haibo.chen@nxp.com Reported-by: Marc Kleine-Budde <mkl@pengutronix.de> Closes: https://lore.kernel.org/all/20250314-married-polar-elephant-b15594-mkl@pengutronix.de [mkl: add newlines] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-03-14	mfd: cgbc-core: Cleanup signedness in cgbc_session_request()	Dan Carpenter
	This doesn't affect how the code works because there are some implicit casts, but the "ret" variable is used to hold negative error codes so it should be type int. Declare it as "int" instead of "unsigned int". Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/9e812dfa-e216-4e40-bbf0-d0b63b110bb0@stanley.mountain Signed-off-by: Lee Jones <lee@kernel.org>
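Why the unsigned declaration happened to work, and why it is still worth cleaning up, can be seen in a small sketch. The names do_request(), session_unsigned(), session_int() and unsigned_sees_negative() are hypothetical and are not the driver's functions. The round trip through unsigned int relies on the conversion back to int, which is implementation-defined before C23 but wraps as expected on mainstream compilers.

```c
#include <errno.h>

/* Hypothetical helper returning 0 or a negative errno. */
static int do_request(int fail)
{
	return fail ? -EIO : 0;
}

/* Works only through implicit conversions: -EIO becomes a huge unsigned value. */
static int session_unsigned(int fail)
{
	unsigned int ret = do_request(fail);

	if (ret)
		return ret; /* converted back to int on return */
	return 0;
}

/* The pitfall: a "negative" test on an unsigned variable is never true. */
static int unsigned_sees_negative(int fail)
{
	unsigned int ret = do_request(fail);

	return ret < 0; /* always 0; compilers typically warn about this */
}

/* Cleaned-up form: the variable's type matches the values it holds. */
static int session_int(int fail)
{
	int ret = do_request(fail);

	if (ret)
		return ret;
	return 0;
}
```

Both session functions return the same error code, but only the int version stays correct if a later change adds a ret < 0 check.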
2025-03-14	mfd: pcf50633: Remove remaining PCF50633 support	Dr. David Alan Gilbert
	Remove the remaining parts of the 50633, the core, headers and glue. The pcf50633 was used as part of the OpenMoko devices but the support for its main chip was recently removed in: commit 61b7f8920b17 ("ARM: s3c: remove all s3c24xx support") See https://lore.kernel.org/all/Z8z236h4B5A6Ki3D@gallifrey/ Remove it. Signed-off-by: "Dr. David Alan Gilbert" <linux@treblig.org> Link: https://lore.kernel.org/r/20250311014959.743322-10-linux@treblig.org Signed-off-by: Lee Jones <lee@kernel.org>
2025-03-14	mfd: pcf50633: Remove unused platform IRQ code	Dr. David Alan Gilbert
	As part of the pcf50633 removal, take out its IRQ code (which includes one bit still called from the core, but it'll go soon). Signed-off-by: "Dr. David Alan Gilbert" <linux@treblig.org> Link: https://lore.kernel.org/r/20250311014959.743322-9-linux@treblig.org Signed-off-by: Lee Jones <lee@kernel.org>