summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-07-12octeontx2-af: fix issue with IPv6 ext match for RSSKiran Kumar K
While performing RSS based on IPv6, extension ltype is not being considered. This will be problem for fragmented packets or packets with extension header. Adding changes to match IPv6 ext header along with IPv6 ltype. Fixes: 41a7aa7b800d ("octeontx2-af: NIX Rx flowkey configuration for RSS") Signed-off-by: Kiran Kumar K <kirankumark@marvell.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-12octeontx2-af: fix detection of IP layerMichal Mazur
Checksum and length checks are not enabled for IPv4 header with options and IPv6 with extension headers. To fix this a change in enum npc_kpu_lc_ltype is required which will allow adjustment of LTYPE_MASK to detect all types of IP headers. Fixes: 21e6699e5cd6 ("octeontx2-af: Add NPC KPU profile") Signed-off-by: Michal Mazur <mmazur2@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-12octeontx2-af: fix a issue with cpt_lf_alloc mailboxSrujana Challa
This patch fixes CPT_LF_ALLOC mailbox error due to incompatible mailbox message format. Specifically, it corrects the `blkaddr` field type from `int` to `u8`. Fixes: de2854c87c64 ("octeontx2-af: Mailbox changes for 98xx CPT block") Signed-off-by: Srujana Challa <schalla@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-12octeontx2-af: replace cpt slot with lf id on reg writeNithin Dabilpuram
Replace slot id with global CPT lf id on reg read/write as CPTPF/VF driver would send slot number instead of global lf id in the reg offset. And also update the mailbox response with the global lf's register offset. Fixes: ae454086e3c2 ("octeontx2-af: add mailbox interface for CPT") Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-12MAINTAINERS: Update FREESCALE SOC DRIVERS and QUICC ENGINE LIBRARYChristophe Leroy
FREESCALE SOC DRIVERS has been orphaned since commit eaac25d026a1 ("MAINTAINERS: Drop Li Yang as their email address stopped working") QUICC ENGINE LIBRARY has Qiang Zhao as maintainer but he hasn't responded for years and when Li Yang was still maintaining FREESCALE SOC DRIVERS he was also handling QUICC ENGINE LIBRARY directly. As a maintainer of LINUX FOR POWERPC EMBEDDED PPC8XX AND PPC83XX, I also need FREESCALE SOC DRIVERS to be actively maintained, so add myself as maintainer of FREESCALE SOC DRIVERS and QUICC ENGINE LIBRARY. See below link for more context. Link: https://lore.kernel.org/linuxppc-dev/20240219153016.ntltc76bphwrv6hn@skbuf/T/#mf6d4a5eef79e8eae7ae0456a2794c01e630a6756 Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-12MAINTAINERS: Add more maintainers for omapsTony Lindgren
There are many generations of omaps to maintain, and I will be only active as a hobbyist with time permitting. Let's add more maintainers to ensure continued Linux support. TI is interested in maintaining the active SoCs such as am3, am4 and dra7. And the hobbyists are interested in maintaining some of the older devices, mainly based on omap3 and 4 SoCs. Kevin and Roger have agreed to maintain the active TI parts. Both Kevin and Roger have been working on the omap variants for a long time, and have a good understanding of the hardware. Aaro and Andreas have agreed to maintain the community devices. Both Aaro and Andreas have long experience on working with the earlier TI SoCs. While at it, let's also change me to be a reviewer for the omap1, and drop the link to my old omap web page. Signed-off-by: Tony Lindgren <tony@atomide.com> Acked-by: Kevin Hilman <khilman@baylibre.com> Acked-by: Aaro Koskinen <aaro.koskinen@iki.fi> Acked-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-12MAINTAINERS: add 's32@nxp.com' as relevant mailing list for ↵Ciprian Costea
'sdhci-esdhc-imx' driver Since NXP S32G2 and S32G3 SoCs share the SDHCI controller with I.MX platforms it would be valuable to add 's32@nxp.com' as a relevant mailing list in this area. Signed-off-by: Ciprian Costea <ciprianmarian.costea@oss.nxp.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Haibo Chen <haibo.chen@nxp.com> Link: https://lore.kernel.org/r/20240708121018.246476-4-ciprianmarian.costea@oss.nxp.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2024-07-12mmc: sdhci-esdhc-imx: obtain the 'per' clock rate after its enablementCiprian Costea
The I.MX SDHCI driver assumes that the frequency of the 'per' clock can be obtained even on disabled clocks, which is not always the case. According to 'clk_get_rate' documentation, it is only valid once the clock source has been enabled. Signed-off-by: Ciprian Costea <ciprianmarian.costea@oss.nxp.com> Reviewed-by: Haibo Chen <haibo.chen@nxp.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20240708121018.246476-3-ciprianmarian.costea@oss.nxp.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2024-07-12mmc: sdhci-esdhc-imx: disable card detect wake for S32G based platformsCiprian Costea
In case of S32G based platforms, GPIO CD used for card detect wake mechanism is not available. For this scenario the newly introduced flag 'ESDHC_FLAG_SKIP_CD_WAKE' is used. Signed-off-by: Ciprian Costea <ciprianmarian.costea@oss.nxp.com> Reviewed-by: Haibo Chen <haibo.chen@nxp.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20240708121018.246476-2-ciprianmarian.costea@oss.nxp.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2024-07-12i2c: testunit: avoid re-issued work after read messageWolfram Sang
The to-be-fixed commit rightfully prevented that the registers will be cleared. However, the index must be cleared. Otherwise a read message will re-issue the last work. Fix it and add a comment describing the situation. Fixes: c422b6a63024 ("i2c: testunit: don't erase registers after STOP") Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Andi Shyti <andi.shyti@kernel.org> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2024-07-12Merge tag 'amd-drm-fixes-6.10-2024-07-11' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.10-2024-07-11: amdgpu: - PSR-SU fix - Reseved VMID fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240712005534.803064-1-alexander.deucher@amd.com
2024-07-12Merge tag 'drm-xe-fixes-2024-07-11' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes UAPI Changes: - Use write-back caching mode for system memory on DGFX (Thomas) Driver Changes: - Do not leak object when finalizing hdcp gsc (Nirmoy) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/vgqz35btnxdddko3byrgww5ii36wig2tvondg2p3j3b3ourj4i@rqgolll3wwkh
2024-07-11Revert "drm/amd/display: Reset freesync config before update new state"Leo Li
This change caused PSR SU panels to not read from their remote fb, preventing us from entering self-refresh. It is a regression. This reverts commit 6b8487cdf9fc7bae707519ac5b5daeca18d1e85b. Signed-off-by: Leo Li <sunpeng.li@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-11i40e: fix: remove needless retries of NVM updateAleksandr Loktionov
Remove wrong EIO to EGAIN conversion and pass all errors as is. After commit 230f3d53a547 ("i40e: remove i40e_status"), which should only replace F/W specific error codes with Linux kernel generic, all EIO errors suddenly started to be converted into EAGAIN which leads nvmupdate to retry until it timeouts and sometimes fails after more than 20 minutes in the middle of NVM update, so NVM becomes corrupted. The bug affects users only at the time when they try to update NVM, and only F/W versions that generate errors while nvmupdate. For example, X710DA2 with 0x8000ECB7 F/W is affected, but there are probably more... Command for reproduction is just NVM update: ./nvmupdate64 In the log instead of: i40e_nvmupd_exec_aq err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_ENOMEM) appears: i40e_nvmupd_exec_aq err -EIO aq_err I40E_AQ_RC_ENOMEM i40e: eeprom check failed (-5), Tx/Rx traffic disabled The problematic code did silently convert EIO into EAGAIN which forced nvmupdate to ignore EAGAIN error and retry the same operation until timeout. That's why NVM update takes 20+ minutes to finish with the fail in the end. Fixes: 230f3d53a547 ("i40e: remove i40e_status") Co-developed-by: Kelvin Kang <kelvin.kang@intel.com> Signed-off-by: Kelvin Kang <kelvin.kang@intel.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20240710224455.188502-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-11net: ethtool: Fix RSS settingSaeed Mahameed
When user submits a rxfh set command without touching XFRM_SYM_XOR, rxfh.input_xfrm is set to RXH_XFRM_NO_CHANGE, which is equal to 0xff. Testing if (rxfh.input_xfrm & RXH_XFRM_SYM_XOR && !ops->cap_rss_sym_xor_supported) return -EOPNOTSUPP; Will always be true on devices that don't set cap_rss_sym_xor_supported, since rxfh.input_xfrm & RXH_XFRM_SYM_XOR is always true, if input_xfrm was not set, i.e RXH_XFRM_NO_CHANGE=0xff, which will result in failure of any command that doesn't require any change of XFRM, e.g RSS context or hash function changes. To avoid this breakage, test if rxfh.input_xfrm != RXH_XFRM_NO_CHANGE before testing other conditions. Note that the problem will only trigger with XFRM-aware userspace, old ethtool CLI would continue to work. Fixes: 0dd415d15505 ("net: ethtool: add a NO_CHANGE uAPI for new RXFH's input_xfrm") Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com> Link: https://patch.msgid.link/20240710225538.43368-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-11bcachefs: bch2_gc_btree() should not use btree_root_lockKent Overstreet
btree_root_lock is for the root keys in btree_root, not the pointers to the nodes themselves; this fixes a lock ordering issue between btree_root_lock and btree node locks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-11bcachefs: Set PF_MEMALLOC_NOFS when trans->lockedKent Overstreet
proper lock ordering is: fs_reclaim -> btree node locks Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-11bcachefs; Use trans_unlock_long() when waiting on allocatorKent Overstreet
not using unlock_long() blocks key cache reclaim, and the allocator may take awhile Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-11Revert "bcachefs: Mark bch_inode_info as SLAB_ACCOUNT"Kent Overstreet
This reverts commit 86d81ec5f5f05846c7c6e48ffb964b24cba2e669. This wasn't tested with memcg enabled, it immediately hits a null ptr deref in list_lru_add(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-12i2c: rcar: ensure Gen3+ reset does not disturb local targetsWolfram Sang
R-Car Gen3+ needs a reset before every controller transfer. That erases configuration of a potentially in parallel running local target instance. To avoid this disruption, avoid controller transfers if a local target is running. Also, disable SMBusHostNotify because it requires being a controller and local target at the same time. Fixes: 3b770017b03a ("i2c: rcar: handle RXDMA HW behaviour on Gen3") Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-07-12spi: dt-bindings: at91: Add sama7d65 compatible stringNicolas Ferre
Add compatible string for sama7d65. Like sam9x60 and sam9x7, it requires to bind to "atmel,at91rm9200-spi". Group these three under the same enum, sorted alphanumerically, and remove previously added item. Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com> Acked-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20240711165402.373634-1-nicolas.ferre@microchip.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-07-11workqueue: Rename wq_update_pod() to unbound_wq_update_pwq()Lai Jiangshan
What wq_update_pod() does is just to update the pwq of the specific cpu. Rename it and update the comments. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-11workqueue: Remove the arguments @hotplug_cpu and @online from wq_update_pod()Lai Jiangshan
The arguments @hotplug_cpu and @online are not used in wq_update_pod() since the functions called by wq_update_pod() don't need them. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-11workqueue: Remove the argument @cpu_going_down from wq_calc_pod_cpumask()Lai Jiangshan
wq_calc_pod_cpumask() uses wq_online_cpumask, which excludes the cpu going down, so the argument cpu_going_down is unused and can be removed. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-11workqueue: Remove the unneeded cpumask empty check in wq_calc_pod_cpumask()Lai Jiangshan
The cpumask empty check in wq_calc_pod_cpumask() has long been useless. It just works purely as documents which states that the cpumask is not possible empty after the function returns. Now the code above is even more explicit that the cpumask is not empty, so the document-only empty check can be removed. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-11workqueue: Remove cpus_read_lock() from apply_wqattrs_lock()Lai Jiangshan
1726a1713590 ("workqueue: Put PWQ allocation and WQ enlistment in the same lock C.S.") led to the following possible deadlock: WARNING: possible recursive locking detected 6.10.0-rc5-00004-g1d4c6111406c #1 Not tainted -------------------------------------------- swapper/0/1 is trying to acquire lock: c27760f4 (cpu_hotplug_lock){++++}-{0:0}, at: alloc_workqueue (kernel/workqueue.c:5152 kernel/workqueue.c:5730) but task is already holding lock: c27760f4 (cpu_hotplug_lock){++++}-{0:0}, at: padata_alloc (kernel/padata.c:1007) ... stack backtrace: ... cpus_read_lock (include/linux/percpu-rwsem.h:53 kernel/cpu.c:488) alloc_workqueue (kernel/workqueue.c:5152 kernel/workqueue.c:5730) padata_alloc (kernel/padata.c:1007 (discriminator 1)) pcrypt_init_padata (crypto/pcrypt.c:327 (discriminator 1)) pcrypt_init (crypto/pcrypt.c:353) do_one_initcall (init/main.c:1267) do_initcalls (init/main.c:1328 (discriminator 1) init/main.c:1345 (discriminator 1)) kernel_init_freeable (init/main.c:1364) kernel_init (init/main.c:1469) ret_from_fork (arch/x86/kernel/process.c:153) ret_from_fork_asm (arch/x86/entry/entry_32.S:737) entry_INT80_32 (arch/x86/entry/entry_32.S:944) This is caused by pcrypt allocating a workqueue while holding cpus_read_lock(), so workqueue code can't do it again as that can lead to deadlocks if down_write starts after the first down_read. The pwq creations and installations have been reworked based on wq_online_cpumask rather than cpu_online_mask making cpus_read_lock() is unneeded during wqattrs changes. Fix the deadlock by removing cpus_read_lock() from apply_wqattrs_lock(). tj: Updated changelog. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Fixes: 1726a1713590 ("workqueue: Put PWQ allocation and WQ enlistment in the same lock C.S.") Link: http://lkml.kernel.org/r/202407081521.83b627c1-lkp@intel.com Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-11workqueue: Simplify wq_calc_pod_cpumask() with wq_online_cpumaskLai Jiangshan
Avoid relying on cpu_online_mask for wqattrs changes so that cpus_read_lock() can be removed from apply_wqattrs_lock(). And with wq_online_cpumask, attrs->__pod_cpumask doesn't need to be reused as a temporary storage to calculate if the pod have any online CPUs @attrs wants since @cpu_going_down is not in the wq_online_cpumask. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-11workqueue: Add wq_online_cpumaskLai Jiangshan
The new wq_online_mask mirrors the cpu_online_mask except during hotplugging; specifically, it differs between the hotplugging stages of workqueue_offline_cpu() and workqueue_online_cpu(), during which the transitioning CPU is not represented in the mask. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-11Merge tag 'for-6.10/dm-fixes-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fix from Mikulas Patocka: - Fix broken discard for device mapper VDO target * tag 'for-6.10/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm vdo: replace max_discard_sectors with max_hw_discard_sectors
2024-07-12Merge tag 'drm-misc-fixes-2024-07-11' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes drm-misc-fixes for v6.10: - EDID irq fix for bridge/adv7511. - gma500 null mode fixes. - Cleanup meson binding. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/8abff46f-eae6-4521-8434-7c6240f9091c@linux.intel.com
2024-07-11dm vdo: replace max_discard_sectors with max_hw_discard_sectorsBruce Johnston
Commit 4f563a64732d ("block: add a max_user_discard_sectors queue limit") changed block core to set max_discard_sectors to: min(lim->max_hw_discard_sectors, lim->max_user_discard_sectors) Commit 825d8bbd2f32 ("dm: always manage discard support in terms of max_hw_discard_sectors") fixed most dm targetss to deal with this, by replacing max_discard_sectors with max_hw_discard_sectors. Unfortunately, dm-vdo did not get fixed at that time. Fixes: 825d8bbd2f32 ("dm: always manage discard support in terms of max_hw_discard_sectors") Signed-off-by: Bruce Johnston <bjohnsto@redhat.com> Signed-off-by: Matthew Sakai <msakai@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2024-07-11Merge tag 'spi-fix-v6.10-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "This fixes two regressions that have been bubbling along for a large part of this release. One is a revert of the multi mode support for the OMAP SPI controller, this introduced regressions on a number of systems and while there has been progress on fixing those we've not got something that works for everyone yet so let's just drop the change for now. The other is a series of fixes from David Lechner for his recent message optimisation work, this interacted badly with spi-mux which is altogether too clever with recursive use of the bus and creates situations that hadn't been considered. There are also a couple of small driver specific fixes, including one more patch from David for sleep duration calculations in the AXI driver" * tag 'spi-fix-v6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: mux: set ctlr->bits_per_word_mask spi: add defer_optimize_message controller flag spi: don't unoptimize message in spi_async() spi: omap2-mcspi: Revert multi mode support spi: davinci: Unset POWERDOWN bit when releasing resources spi: axi-spi-engine: fix sleep calculation spi: imx: Don't expect DMA for i.MX{25,35,50,51,53} cspi devices
2024-07-11Merge branch 'for-next/vcpu-hotplug' into for-next/coreCatalin Marinas
* for-next/vcpu-hotplug: (21 commits) : arm64 support for virtual CPU hotplug (ACPI) irqchip/gic-v3: Fix 'broken_rdists' unused warning when !SMP and !ACPI arm64: Kconfig: Fix dependencies to enable ACPI_HOTPLUG_CPU cpumask: Add enabled cpumask for present CPUs that can be brought online arm64: document virtual CPU hotplug's expectations arm64: Kconfig: Enable hotplug CPU on arm64 if ACPI_PROCESSOR is enabled. arm64: arch_register_cpu() variant to check if an ACPI handle is now available. arm64: psci: Ignore DENIED CPUs irqchip/gic-v3: Add support for ACPI's disabled but 'online capable' CPUs irqchip/gic-v3: Don't return errors from gic_acpi_match_gicc() arm64: acpi: Harden get_cpu_for_acpi_id() against missing CPU entry arm64: acpi: Move get_cpu_for_acpi_id() to a header ACPI: Add post_eject to struct acpi_scan_handler for cpu hotplug ACPI: scan: switch to flags for acpi_scan_check_and_detach() ACPI: processor: Register deferred CPUs from acpi_processor_get_info() ACPI: processor: Add acpi_get_processor_handle() helper ACPI: processor: Move checks and availability of acpi_processor earlier ACPI: processor: Fix memory leaks in error paths of processor_add() ACPI: processor: Return an error if acpi_processor_get_info() fails in processor_add() ACPI: processor: Drop duplicated check on _STA (enabled + present) cpu: Do not warn on arch_register_cpu() returning -EPROBE_DEFER ...
2024-07-11Merge branches 'for-next/cpufeature', 'for-next/misc', 'for-next/kselftest', ↵Catalin Marinas
'for-next/mte', 'for-next/errata', 'for-next/acpi', 'for-next/gic-v3-pmr' and 'for-next/doc', remote-tracking branch 'arm64/for-next/perf' into for-next/core * arm64/for-next/perf: perf: add missing MODULE_DESCRIPTION() macros perf: arm_pmuv3: Include asm/arm_pmuv3.h from linux/perf/arm_pmuv3.h perf: arm_v6/7_pmu: Drop non-DT probe support perf/arm: Move 32-bit PMU drivers to drivers/perf/ perf: arm_pmuv3: Drop unnecessary IS_ENABLED(CONFIG_ARM64) check perf: arm_pmuv3: Avoid assigning fixed cycle counter with threshold perf: imx_perf: add support for i.MX95 platform perf: imx_perf: fix counter start and config sequence perf: imx_perf: refactor driver for imx93 perf: imx_perf: let the driver manage the counter usage rather the user perf: imx_perf: add macro definitions for parsing config attr dt-bindings: perf: fsl-imx-ddr: Add i.MX95 compatible perf: pmuv3: Add new Cortex and Neoverse PMUs dt-bindings: arm: pmu: Add new Cortex and Neoverse cores perf/arm-cmn: Enable support for tertiary match group perf/arm-cmn: Decouple wp_config registers from filter group number * for-next/cpufeature: : Various cpufeature infrastructure patches arm64/cpufeature: Replace custom macros with fields from ID_AA64PFR0_EL1 KVM: arm64: Replace custom macros with fields from ID_AA64PFR0_EL1 arm64/cpufeatures/kvm: Add ARMv8.9 FEAT_ECBHB bits in ID_AA64MMFR1 register * for-next/misc: : Miscellaneous patches arm64: smp: Fix missing IPI statistics arm64: Cleanup __cpu_set_tcr_t0sz() arm64/mm: Stop using ESR_ELx_FSC_TYPE during fault arm64: Kconfig: fix typo in __builtin_return_adddress ARM64: reloc_test: add missing MODULE_DESCRIPTION() macro arm64: implement raw_smp_processor_id() using thread_info arm64/arch_timer: include <linux/percpu.h> * for-next/kselftest: : arm64 kselftest updates selftests: arm64: tags: remove the result script selftests: arm64: tags_test: conform test to TAP output kselftest/arm64: Fix a couple of spelling mistakes kselftest/arm64: Fix redundancy of a testcase kselftest/arm64: Include kernel mode NEON in fp-stress * for-next/mte: : MTE updates arm64: mte: Make mte_check_tfsr_*() conditional on KASAN instead of MTE * for-next/errata: : Arm CPU errata workarounds arm64: errata: Expand speculative SSBS workaround arm64: errata: Unify speculative SSBS errata logic arm64: cputype: Add Cortex-X925 definitions arm64: cputype: Add Cortex-A720 definitions arm64: cputype: Add Cortex-X3 definitions * for-next/acpi: : arm64 ACPI patches ACPI: Add acpi=nospcr to disable ACPI SPCR as default console on ARM64 ACPI / amba: Drop unnecessary check for registered amba_dummy_clk arm64: FFH: Move ACPI specific code into drivers/acpi/arm64/ arm64: cpuidle: Move ACPI specific code into drivers/acpi/arm64/ ACPI: arm64: Sort entries alphabetically * for-next/gic-v3-pmr: : arm64: irqchip/gic-v3: Use compiletime constant PMR values arm64: irqchip/gic-v3: Select priorities at boot time irqchip/gic-v3: Detect GICD_CTRL.DS and SCR_EL3.FIQ earlier irqchip/gic-v3: Make distributor priorities variables irqchip/gic-common: Remove sync_access callback wordpart.h: Add REPEAT_BYTE_U32() * for-next/doc: : arm64 documentation updates Documentation: arm64: Update memory.rst for TBI
2024-07-11selftests: arm64: tags: remove the result scriptMuhammad Usama Anjum
The run_tags_test.sh script is used to run tags_test and print out if the test succeeded or failed. As tags_test has been TAP conformed, this script is unneeded and hence can be removed. Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Link: https://lore.kernel.org/r/20240602132502.4186771-2-usama.anjum@collabora.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-07-11selftests: arm64: tags_test: conform test to TAP outputMuhammad Usama Anjum
Conform the layout, informational and status messages to TAP. No functional change is intended other than the layout of output messages. Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Link: https://lore.kernel.org/r/20240602132502.4186771-1-usama.anjum@collabora.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-07-11Merge tag 'net-6.10-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf and netfilter. Current release - regressions: - core: fix rc7's __skb_datagram_iter() regression Current release - new code bugs: - eth: bnxt: fix crashes when reducing ring count with active RSS contexts Previous releases - regressions: - sched: fix UAF when resolving a clash - skmsg: skip zero length skb in sk_msg_recvmsg2 - sunrpc: fix kernel free on connection failure in xs_tcp_setup_socket - tcp: avoid too many retransmit packets - tcp: fix incorrect undo caused by DSACK of TLP retransmit - udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port(). - eth: ks8851: fix deadlock with the SPI chip variant - eth: i40e: fix XDP program unloading while removing the driver Previous releases - always broken: - bpf: - fix too early release of tcx_entry - fail bpf_timer_cancel when callback is being cancelled - bpf: fix order of args in call to bpf_map_kvcalloc - netfilter: nf_tables: prefer nft_chain_validate - ppp: reject claimed-as-LCP but actually malformed packets - wireguard: avoid unaligned 64-bit memory accesses" * tag 'net-6.10-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (33 commits) net, sunrpc: Remap EPERM in case of connection failure in xs_tcp_setup_socket net/sched: Fix UAF when resolving a clash net: ks8851: Fix potential TX stall after interface reopen udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port(). netfilter: nf_tables: prefer nft_chain_validate netfilter: nfnetlink_queue: drop bogus WARN_ON ethtool: netlink: do not return SQI value if link is down ppp: reject claimed-as-LCP but actually malformed packets selftests/bpf: Add timer lockup selftest net: ethernet: mtk-star-emac: set mac_managed_pm when probing e1000e: fix force smbus during suspend flow tcp: avoid too many retransmit packets bpf: Defer work in bpf_timer_cancel_and_free bpf: Fail bpf_timer_cancel when callback is being cancelled bpf: fix order of args in call to bpf_map_kvcalloc net: ethernet: lantiq_etop: fix double free in detach i40e: Fix XDP program unloading while removing the driver net: fix rc7's __skb_datagram_iter() net: ks8851: Fix deadlock with the SPI chip variant octeontx2-af: Fix incorrect value output on error path in rvu_check_rsrc_availability() ...
2024-07-11Merge tag 'vfs-6.10-rc8.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "cachefiles: - Export an existing and add a new cachefile helper to be used in filesystems to fix reference count bugs - Use the newly added fscache_ty_get_volume() helper to get a reference count on an fscache_volume to handle volumes that are about to be removed cleanly - After withdrawing a fscache_cache via FSCACHE_CACHE_IS_WITHDRAWN wait for all ongoing cookie lookups to complete and for the object count to reach zero - Propagate errors from vfs_getxattr() to avoid an infinite loop in cachefiles_check_volume_xattr() because it keeps seeing ESTALE - Don't send new requests when an object is dropped by raising CACHEFILES_ONDEMAND_OJBSTATE_DROPPING - Cancel all requests for an object that is about to be dropped - Wait for the ondemand_boject_worker to finish before dropping a cachefiles object to prevent use-after-free - Use cyclic allocation for message ids to better handle id recycling - Add missing lock protection when iterating through the xarray when polling netfs: - Use standard logging helpers for debug logging VFS: - Fix potential use-after-free in file locks during trace_posix_lock_inode(). The tracepoint could fire while another task raced it and freed the lock that was requested to be traced - Only increment the nr_dentry_negative counter for dentries that are present on the superblock LRU. Currently, DCACHE_LRU_LIST list is used to detect this case. However, the flag is also raised in combination with DCACHE_SHRINK_LIST to indicate that dentry->d_lru is used. So checking only DCACHE_LRU_LIST will lead to wrong nr_dentry_negative count. Fix the check to not count dentries that are on a shrink related list Misc: - hfsplus: fix an uninitialized value issue in copy_name - minix: fix minixfs_rename with HIGHMEM. It still uses kunmap() even though we switched it to kmap_local_page() a while ago" * tag 'vfs-6.10-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: minixfs: Fix minixfs_rename with HIGHMEM hfsplus: fix uninit-value in copy_name vfs: don't mod negative dentry count when on shrinker list filelock: fix potential use-after-free in posix_lock_inode cachefiles: add missing lock protection when polling cachefiles: cyclic allocation of msg_id to avoid reuse cachefiles: wait for ondemand_object_worker to finish when dropping object cachefiles: cancel all requests for the object that is being dropped cachefiles: stop sending new request when dropping object cachefiles: propagate errors from vfs_getxattr() to avoid infinite loop cachefiles: fix slab-use-after-free in cachefiles_withdraw_cookie() cachefiles: fix slab-use-after-free in fscache_withdraw_volume() netfs, fscache: export fscache_put_volume() and add fscache_try_get_volume() netfs: Switch debug logging to pr_debug()
2024-07-11tick/broadcast: Make takeover of broadcast hrtimer reliableYu Liao
Running the LTP hotplug stress test on a aarch64 machine results in rcu_sched stall warnings when the broadcast hrtimer was owned by the un-plugged CPU. The issue is the following: CPU1 (owns the broadcast hrtimer) CPU2 tick_broadcast_enter() // shutdown local timer device broadcast_shutdown_local() ... tick_broadcast_exit() clockevents_switch_state(dev, CLOCK_EVT_STATE_ONESHOT) // timer device is not programmed cpumask_set_cpu(cpu, tick_broadcast_force_mask) initiates offlining of CPU1 take_cpu_down() /* * CPU1 shuts down and does not * send broadcast IPI anymore */ takedown_cpu() hotplug_cpu__broadcast_tick_pull() // move broadcast hrtimer to this CPU clockevents_program_event() bc_set_next() hrtimer_start() /* * timer device is not programmed * because only the first expiring * timer will trigger clockevent * device reprogramming */ What happens is that CPU2 exits broadcast mode with force bit set, then the local timer device is not reprogrammed and CPU2 expects to receive the expired event by the broadcast IPI. But this does not happen because CPU1 is offlined by CPU2. CPU switches the clockevent device to ONESHOT state, but does not reprogram the device. The subsequent reprogramming of the hrtimer broadcast device does not program the clockevent device of CPU2 either because the pending expiry time is already in the past and the CPU expects the event to be delivered. As a consequence all CPUs which wait for a broadcast event to be delivered are stuck forever. Fix this issue by reprogramming the local timer device if the broadcast force bit of the CPU is set so that the broadcast hrtimer is delivered. [ tglx: Massage comment and change log. Add Fixes tag ] Fixes: 989dcb645ca7 ("tick: Handle broadcast wakeup of multiple cpus") Signed-off-by: Yu Liao <liaoyu15@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240711124843.64167-1-liaoyu15@huawei.com
2024-07-11dt-bindings: mmc: sdhci-sprd: convert to YAMLStanislav Jakubek
Covert the Spreadtrum SDHCI controller bindings to DT schema. Rename the file to match compatible. Drop assigned-* properties as these should not be needed. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Stanislav Jakubek <stano.jakubek@gmail.com> Link: https://lore.kernel.org/r/ZozY+tOkzK9yfjbo@standask-GA-A55M-S2HP Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2024-07-11mmc: davinci_mmc: report all possible bus widthsBastien Curutchet
A dev_info() at probe's end() report the supported bus width. It never reports 8-bits width while the driver can handle it. Update the info message at then end of the probe to report the use of 8-bits data when needed. Signed-off-by: Bastien Curutchet <bastien.curutchet@bootlin.com> Link: https://lore.kernel.org/r/20240711081838.47256-3-bastien.curutchet@bootlin.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2024-07-11mmc: Merge branch fixes into nextUlf Hansson
Merge the mmc fixes for v6.10-rc[n] into the next branch, to allow them to get tested together with the new mmc changes that are targeted for v6.11. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2024-07-11mmc: davinci_mmc: Prevent transmitted data size from exceeding sgm's lengthBastien Curutchet
No check is done on the size of the data to be transmiited. This causes a kernel panic when this size exceeds the sg_miter's length. Limit the number of transmitted bytes to sgm->length. Cc: stable@vger.kernel.org Fixes: ed01d210fd91 ("mmc: davinci_mmc: Use sg_miter for PIO") Signed-off-by: Bastien Curutchet <bastien.curutchet@bootlin.com> Link: https://lore.kernel.org/r/20240711081838.47256-2-bastien.curutchet@bootlin.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2024-07-11mmc: sdhci: Fix max_seg_size for 64KiB PAGE_SIZEAdrian Hunter
blk_queue_max_segment_size() ensured: if (max_size < PAGE_SIZE) max_size = PAGE_SIZE; whereas: blk_validate_limits() makes it an error: if (WARN_ON_ONCE(lim->max_segment_size < PAGE_SIZE)) return -EINVAL; The change from one to the other, exposed sdhci which was setting maximum segment size too low in some circumstances. Fix the maximum segment size when it is too low. Fixes: 616f87661792 ("mmc: pass queue_limits to blk_mq_alloc_disk") Cc: stable@vger.kernel.org Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Link: https://lore.kernel.org/r/20240710180737.142504-1-adrian.hunter@intel.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2024-07-11drm/xe/display/xe_hdcp_gsc: Free arbiter on driver removalNirmoy Das
Free arbiter allocated in intel_hdcp_gsc_init(). Fixes: 152f2df954d8 ("drm/xe/hdcp: Enable HDCP for XE") Cc: Suraj Kandpal <suraj.kandpal@intel.com> Cc: Arun R Murthy <arun.r.murthy@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240708125918.23573-1-nirmoy.das@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> (cherry picked from commit 33891539f9d6f245e93a76e3fb5791338180374f) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-07-11drm/xe: Use write-back caching mode for system memory on DGFXThomas Hellström
The caching mode for buffer objects with VRAM as a possible placement was forced to write-combined, regardless of placement. However, write-combined system memory is expensive to allocate and even though it is pooled, the pool is expensive to shrink, since it involves global CPU TLB flushes. Moreover write-combined system memory from TTM is only reliably available on x86 and DGFX doesn't have an x86 restriction. So regardless of the cpu caching mode selected for a bo, internally use write-back caching mode for system memory on DGFX. Coherency is maintained, but user-space clients may perceive a difference in cpu access speeds. v2: - Update RB- and Ack tags. - Rephrase wording in xe_drm.h (Matt Roper) v3: - Really rephrase wording. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Fixes: 622f709ca629 ("drm/xe/uapi: Add support for CPU caching mode") Cc: Pallavi Mishra <pallavi.mishra@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: dri-devel@lists.freedesktop.org Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Effie Yu <effie.yu@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Jose Souza <jose.souza@intel.com> Cc: Michal Mrozek <michal.mrozek@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Acked-by: Matthew Auld <matthew.auld@intel.com> Acked-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Fixes: 622f709ca629 ("drm/xe/uapi: Add support for CPU caching mode") Acked-by: Michal Mrozek <michal.mrozek@intel.com> Acked-by: Effie Yu <effie.yu@intel.com> #On chat Link: https://patchwork.freedesktop.org/patch/msgid/20240705132828.27714-1-thomas.hellstrom@linux.intel.com (cherry picked from commit 01e0cfc994be484ddcb9e121e353e51d8bb837c0) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-07-11Merge tag 'asoc-fix-v6.10-rc7' of ↵Takashi Iwai
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v6.10 A few fairly small fixes for ASoC, there's a relatively large set of hardening changes for the cs_dsp firmware file parsing and a couple of other small device specific fixes.
2024-07-11btrfs: avoid races when tracking progress for extent map shrinkingFilipe Manana
We store the progress (root and inode numbers) of the extent map shrinker in fs_info without any synchronization but we can have multiple tasks calling into the shrinker during memory allocations when there's enough memory pressure for example. This can result in a task A reading fs_info->extent_map_shrinker_last_ino after another task B updates it, and task A reading fs_info->extent_map_shrinker_last_root before task B updates it, making task A see an odd state that isn't necessarily harmful but may make it skip certain inode ranges or do more work than necessary by going over the same inodes again. These unprotected accesses would also trigger warnings from tools like KCSAN. So add a lock to protect access to these progress fields. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-11btrfs: stop extent map shrinker if reschedule is neededFilipe Manana
The extent map shrinker can be called in a variety of contexts where we are under memory pressure, and of them is when a task is trying to allocate memory. For this reason the shrinker is typically called with a value of struct shrink_control::nr_to_scan that is much smaller than what we return in the nr_cached_objects callback of struct super_operations (fs/btrfs/super.c:btrfs_nr_cached_objects()), so that the shrinker does not take a long time and cause high latencies. However we can still take a lot of time in the shrinker even for a limited amount of nr_to_scan: 1) When traversing the red black tree that tracks open inodes in a root, as for example with millions of open inodes we get a deep tree which takes time searching for an inode; 2) Iterating over the extent map tree, which is a red black tree, of an inode when doing the rb_next() calls and when removing an extent map from the tree, since often that requires rebalancing the red black tree; 3) When trying to write lock an inode's extent map tree we may wait for a significant amount of time, because there's either another task about to do IO and searching for an extent map in the tree or inserting an extent map in the tree, and we can have thousands or even millions of extent maps for an inode. Furthermore, there can be concurrent calls to the shrinker so the lock might be busy simply because there is already another task shrinking extent maps for the same inode; 4) We often reschedule if we need to, which further increases latency. So improve on this by stopping the extent map shrinking code whenever we need to reschedule and make it skip an inode if we can't immediately lock its extent map tree. Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Reported-by: Andrea Gelmini <andrea.gelmini@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CABXGCsMmmb36ym8hVNGTiU8yfUS_cGvoUmGCcBrGWq9OxTrs+A@mail.gmail.com/ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-11btrfs: use delayed iput during extent map shrinkingFilipe Manana
When putting an inode during extent map shrinking we're doing a standard iput() but that may take a long time in case the inode is dirty and we are doing the final iput that triggers eviction - the VFS will have to wait for writeback before calling the btrfs evict callback (see fs/inode.c:evict()). This slows down the task running the shrinker which may have been triggered while updating some tree for example, meaning locks are held as well as an open transaction handle. Also if the iput() ends up triggering eviction and the inode has no links anymore, then we trigger item truncation which requires flushing delayed items, space reservation to start a transaction and that may trigger the space reclaim task and wait for it, resulting in deadlocks in case the reclaim task needs for example to commit a transaction and the shrinker is being triggered from a path holding a transaction handle. Syzbot reported such a case with the following stack traces: ====================================================== WARNING: possible circular locking dependency detected 6.10.0-rc2-syzkaller-00010-g2ab795141095 #0 Not tainted ------------------------------------------------------ kswapd0/111 is trying to acquire lock: ffff88801eae4610 (sb_internal#3){.+.+}-{0:0}, at: btrfs_commit_inode_delayed_inode+0x110/0x330 fs/btrfs/delayed-inode.c:1275 but task is already holding lock: ffffffff8dd3a9a0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xa88/0x1970 mm/vmscan.c:6924 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (fs_reclaim){+.+.}-{0:0}: __fs_reclaim_acquire mm/page_alloc.c:3783 [inline] fs_reclaim_acquire+0x102/0x160 mm/page_alloc.c:3797 might_alloc include/linux/sched/mm.h:334 [inline] slab_pre_alloc_hook mm/slub.c:3890 [inline] slab_alloc_node mm/slub.c:3980 [inline] kmem_cache_alloc_lru_noprof+0x58/0x2f0 mm/slub.c:4019 btrfs_alloc_inode+0x118/0xb20 fs/btrfs/inode.c:8411 alloc_inode+0x5d/0x230 fs/inode.c:261 iget5_locked fs/inode.c:1235 [inline] iget5_locked+0x1c9/0x2c0 fs/inode.c:1228 btrfs_iget_locked fs/btrfs/inode.c:5590 [inline] btrfs_iget_path fs/btrfs/inode.c:5607 [inline] btrfs_iget+0xfb/0x230 fs/btrfs/inode.c:5636 create_reloc_inode+0x403/0x820 fs/btrfs/relocation.c:3911 btrfs_relocate_block_group+0x471/0xe60 fs/btrfs/relocation.c:4114 btrfs_relocate_chunk+0x143/0x450 fs/btrfs/volumes.c:3373 __btrfs_balance fs/btrfs/volumes.c:4157 [inline] btrfs_balance+0x211a/0x3f00 fs/btrfs/volumes.c:4534 btrfs_ioctl_balance fs/btrfs/ioctl.c:3675 [inline] btrfs_ioctl+0x12ed/0x8290 fs/btrfs/ioctl.c:4742 __do_compat_sys_ioctl+0x2c3/0x330 fs/ioctl.c:1007 do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline] __do_fast_syscall_32+0x73/0x120 arch/x86/entry/common.c:386 do_fast_syscall_32+0x32/0x80 arch/x86/entry/common.c:411 entry_SYSENTER_compat_after_hwframe+0x84/0x8e -> #2 (btrfs_trans_num_extwriters){++++}-{0:0}: join_transaction+0x164/0xf40 fs/btrfs/transaction.c:315 start_transaction+0x427/0x1a70 fs/btrfs/transaction.c:700 btrfs_rebuild_free_space_tree+0xaa/0x480 fs/btrfs/free-space-tree.c:1323 btrfs_start_pre_rw_mount+0x218/0xf60 fs/btrfs/disk-io.c:2999 open_ctree+0x41ab/0x52e0 fs/btrfs/disk-io.c:3554 btrfs_fill_super fs/btrfs/super.c:946 [inline] btrfs_get_tree_super fs/btrfs/super.c:1863 [inline] btrfs_get_tree+0x11e9/0x1b90 fs/btrfs/super.c:2089 vfs_get_tree+0x8f/0x380 fs/super.c:1780 fc_mount+0x16/0xc0 fs/namespace.c:1125 btrfs_get_tree_subvol fs/btrfs/super.c:2052 [inline] btrfs_get_tree+0xa53/0x1b90 fs/btrfs/super.c:2090 vfs_get_tree+0x8f/0x380 fs/super.c:1780 do_new_mount fs/namespace.c:3352 [inline] path_mount+0x6e1/0x1f10 fs/namespace.c:3679 do_mount fs/namespace.c:3692 [inline] __do_sys_mount fs/namespace.c:3898 [inline] __se_sys_mount fs/namespace.c:3875 [inline] __ia32_sys_mount+0x295/0x320 fs/namespace.c:3875 do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline] __do_fast_syscall_32+0x73/0x120 arch/x86/entry/common.c:386 do_fast_syscall_32+0x32/0x80 arch/x86/entry/common.c:411 entry_SYSENTER_compat_after_hwframe+0x84/0x8e -> #1 (btrfs_trans_num_writers){++++}-{0:0}: join_transaction+0x148/0xf40 fs/btrfs/transaction.c:314 start_transaction+0x427/0x1a70 fs/btrfs/transaction.c:700 btrfs_rebuild_free_space_tree+0xaa/0x480 fs/btrfs/free-space-tree.c:1323 btrfs_start_pre_rw_mount+0x218/0xf60 fs/btrfs/disk-io.c:2999 open_ctree+0x41ab/0x52e0 fs/btrfs/disk-io.c:3554 btrfs_fill_super fs/btrfs/super.c:946 [inline] btrfs_get_tree_super fs/btrfs/super.c:1863 [inline] btrfs_get_tree+0x11e9/0x1b90 fs/btrfs/super.c:2089 vfs_get_tree+0x8f/0x380 fs/super.c:1780 fc_mount+0x16/0xc0 fs/namespace.c:1125 btrfs_get_tree_subvol fs/btrfs/super.c:2052 [inline] btrfs_get_tree+0xa53/0x1b90 fs/btrfs/super.c:2090 vfs_get_tree+0x8f/0x380 fs/super.c:1780 do_new_mount fs/namespace.c:3352 [inline] path_mount+0x6e1/0x1f10 fs/namespace.c:3679 do_mount fs/namespace.c:3692 [inline] __do_sys_mount fs/namespace.c:3898 [inline] __se_sys_mount fs/namespace.c:3875 [inline] __ia32_sys_mount+0x295/0x320 fs/namespace.c:3875 do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline] __do_fast_syscall_32+0x73/0x120 arch/x86/entry/common.c:386 do_fast_syscall_32+0x32/0x80 arch/x86/entry/common.c:411 entry_SYSENTER_compat_after_hwframe+0x84/0x8e -> #0 (sb_internal#3){.+.+}-{0:0}: check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain kernel/locking/lockdep.c:3869 [inline] __lock_acquire+0x2478/0x3b30 kernel/locking/lockdep.c:5137 lock_acquire kernel/locking/lockdep.c:5754 [inline] lock_acquire+0x1b1/0x560 kernel/locking/lockdep.c:5719 percpu_down_read include/linux/percpu-rwsem.h:51 [inline] __sb_start_write include/linux/fs.h:1655 [inline] sb_start_intwrite include/linux/fs.h:1838 [inline] start_transaction+0xbc1/0x1a70 fs/btrfs/transaction.c:694 btrfs_commit_inode_delayed_inode+0x110/0x330 fs/btrfs/delayed-inode.c:1275 btrfs_evict_inode+0x960/0xe80 fs/btrfs/inode.c:5291 evict+0x2ed/0x6c0 fs/inode.c:667 iput_final fs/inode.c:1741 [inline] iput.part.0+0x5a8/0x7f0 fs/inode.c:1767 iput+0x5c/0x80 fs/inode.c:1757 btrfs_scan_root fs/btrfs/extent_map.c:1118 [inline] btrfs_free_extent_maps+0xbd3/0x1320 fs/btrfs/extent_map.c:1189 super_cache_scan+0x409/0x550 fs/super.c:227 do_shrink_slab+0x44f/0x11c0 mm/shrinker.c:435 shrink_slab+0x18a/0x1310 mm/shrinker.c:662 shrink_one+0x493/0x7c0 mm/vmscan.c:4790 shrink_many mm/vmscan.c:4851 [inline] lru_gen_shrink_node+0x89f/0x1750 mm/vmscan.c:4951 shrink_node mm/vmscan.c:5910 [inline] kswapd_shrink_node mm/vmscan.c:6720 [inline] balance_pgdat+0x1105/0x1970 mm/vmscan.c:6911 kswapd+0x5ea/0xbf0 mm/vmscan.c:7180 kthread+0x2c1/0x3a0 kernel/kthread.c:389 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 other info that might help us debug this: Chain exists of: sb_internal#3 --> btrfs_trans_num_extwriters --> fs_reclaim Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(fs_reclaim); lock(btrfs_trans_num_extwriters); lock(fs_reclaim); rlock(sb_internal#3); *** DEADLOCK *** 2 locks held by kswapd0/111: #0: ffffffff8dd3a9a0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xa88/0x1970 mm/vmscan.c:6924 #1: ffff88801eae40e0 (&type->s_umount_key#62){++++}-{3:3}, at: super_trylock_shared fs/super.c:562 [inline] #1: ffff88801eae40e0 (&type->s_umount_key#62){++++}-{3:3}, at: super_cache_scan+0x96/0x550 fs/super.c:196 stack backtrace: CPU: 0 PID: 111 Comm: kswapd0 Not tainted 6.10.0-rc2-syzkaller-00010-g2ab795141095 #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:114 check_noncircular+0x31a/0x400 kernel/locking/lockdep.c:2187 check_prev_add kernel/locking/lockdep.c:3134 [inline] check_prevs_add kernel/locking/lockdep.c:3253 [inline] validate_chain kernel/locking/lockdep.c:3869 [inline] __lock_acquire+0x2478/0x3b30 kernel/locking/lockdep.c:5137 lock_acquire kernel/locking/lockdep.c:5754 [inline] lock_acquire+0x1b1/0x560 kernel/locking/lockdep.c:5719 percpu_down_read include/linux/percpu-rwsem.h:51 [inline] __sb_start_write include/linux/fs.h:1655 [inline] sb_start_intwrite include/linux/fs.h:1838 [inline] start_transaction+0xbc1/0x1a70 fs/btrfs/transaction.c:694 btrfs_commit_inode_delayed_inode+0x110/0x330 fs/btrfs/delayed-inode.c:1275 btrfs_evict_inode+0x960/0xe80 fs/btrfs/inode.c:5291 evict+0x2ed/0x6c0 fs/inode.c:667 iput_final fs/inode.c:1741 [inline] iput.part.0+0x5a8/0x7f0 fs/inode.c:1767 iput+0x5c/0x80 fs/inode.c:1757 btrfs_scan_root fs/btrfs/extent_map.c:1118 [inline] btrfs_free_extent_maps+0xbd3/0x1320 fs/btrfs/extent_map.c:1189 super_cache_scan+0x409/0x550 fs/super.c:227 do_shrink_slab+0x44f/0x11c0 mm/shrinker.c:435 shrink_slab+0x18a/0x1310 mm/shrinker.c:662 shrink_one+0x493/0x7c0 mm/vmscan.c:4790 shrink_many mm/vmscan.c:4851 [inline] lru_gen_shrink_node+0x89f/0x1750 mm/vmscan.c:4951 shrink_node mm/vmscan.c:5910 [inline] kswapd_shrink_node mm/vmscan.c:6720 [inline] balance_pgdat+0x1105/0x1970 mm/vmscan.c:6911 kswapd+0x5ea/0xbf0 mm/vmscan.c:7180 kthread+0x2c1/0x3a0 kernel/kthread.c:389 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> So fix this by using btrfs_add_delayed_iput() so that the final iput is delegated to the cleaner kthread. Link: https://lore.kernel.org/linux-btrfs/000000000000892280061a344581@google.com/ Reported-by: syzbot+3dad89b3993a4b275e72@syzkaller.appspotmail.com Fixes: 956a17d9d050 ("btrfs: add a shrinker for extent maps") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>