summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-10-28tools headers: Update the linux/unaligned.h copy with the kernel sourcesArnaldo Carvalho de Melo
To pick up the changes in: 7f053812dab3946c ("random: vDSO: minimize and simplify header includes") That required adding a copy of include/vdso/unaligned.h and its checking in tools/perf/check-headers.h. Addressing this perf tools build warning: Warning: Kernel ABI header differences: diff -u tools/include/linux/unaligned.h include/linux/unaligned.h Please see tools/include/uapi/README for further details. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Ian Rogers <irogers@google.com> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/Zx-uHvAbPAESofEN@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-28tools headers arm64: Sync arm64's cputype.h with the kernel sourcesArnaldo Carvalho de Melo
To get the changes in: 924725707d80bc25 ("arm64: cputype: Add Neoverse-N3 definitions") That makes this perf source code to be rebuilt: CC /tmp/build/perf-tools/util/arm-spe.o The changes in the above patch add MIDR_NEOVERSE_N3, that probably need changes in arm-spe.c, so probably we need to add it to that array? Or maybe we need to leave this for later when this is all tested on those machines? static const struct midr_range neoverse_spe[] = { MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N1), MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2), MIDR_ALL_VERSIONS(MIDR_NEOVERSE_V1), {}, }; Mark Rutland recommended about arm-spe.c in a previous update to this file: "I would not touch this for now -- someone would have to go audit the TRMs to check that those other cores have the same encoding, and I think it'd be better to do that as a follow-up." That addresses this perf build warning: Warning: Kernel ABI header differences: diff -u tools/arch/arm64/include/asm/cputype.h arch/arm64/include/asm/cputype.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/Zx-dffKdGsgkhG96@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-28tools headers: Synchronize {uapi/}linux/bits.h with the kernel sourcesArnaldo Carvalho de Melo
To pick up the changes in this cset: 947697c6f0f75f98 ("uapi: Define GENMASK_U128") This addresses these perf build warnings: Warning: Kernel ABI header differences: diff -u tools/include/uapi/linux/bits.h include/uapi/linux/bits.h diff -u tools/include/linux/bits.h include/linux/bits.h Please see tools/include/uapi/README for further details. Acked-by: Yury Norov <yury.norov@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/Zx-ZVH7bHqtFn8Dv@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-28tpm: Rollback tpm2_load_null()Jarkko Sakkinen
Do not continue on tpm2_create_primary() failure in tpm2_load_null(). Cc: stable@vger.kernel.org # v6.10+ Fixes: eb24c9788cd9 ("tpm: disable the TPM if NULL name changes") Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-10-28tpm: Return tpm2_sessions_init() when null key creation failsJarkko Sakkinen
Do not continue tpm2_sessions_init() further if the null key pair creation fails. Cc: stable@vger.kernel.org # v6.10+ Fixes: d2add27cf2b8 ("tpm: Add NULL primary creation") Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-10-28iov_iter: fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAPHugh Dickins
generic/077 on x86_32 CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP=y with highmem, on huge=always tmpfs, issues a warning and then hangs (interruptibly): WARNING: CPU: 5 PID: 3517 at mm/highmem.c:622 kunmap_local_indexed+0x62/0xc9 CPU: 5 UID: 0 PID: 3517 Comm: cp Not tainted 6.12.0-rc4 #2 ... copy_page_from_iter_atomic+0xa6/0x5ec generic_perform_write+0xf6/0x1b4 shmem_file_write_iter+0x54/0x67 Fix copy_page_from_iter_atomic() by limiting it in that case (include/linux/skbuff.h skb_frag_must_loop() does similar). But going forward, perhaps CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP is too surprising, has outlived its usefulness, and should just be removed? Fixes: 908a1ad89466 ("iov_iter: Handle compound highmem pages in copy_page_from_iter_atomic()") Signed-off-by: Hugh Dickins <hughd@google.com> Link: https://lore.kernel.org/r/dd5f0c89-186e-18e1-4f43-19a60f5a9774@google.com Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-28ASoC: cs42l51: Fix some error handling paths in cs42l51_probe()Christophe JAILLET
If devm_gpiod_get_optional() fails, we need to disable previously enabled regulators, as done in the other error handling path of the function. Also, gpiod_set_value_cansleep(, 1) needs to be called to undo a potential gpiod_set_value_cansleep(, 0). If the "reset" gpio is not defined, this additional call is just a no-op. This behavior is the same as the one already in the .remove() function. Fixes: 11b9cd748e31 ("ASoC: cs42l51: add reset management") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com> Link: https://patch.msgid.link/a5e5f4b9fb03f46abd2c93ed94b5c395972ce0d1.1729975570.git.christophe.jaillet@wanadoo.fr Signed-off-by: Mark Brown <broonie@kernel.org>
2024-10-28autofs: fix thinko in validate_dev_ioctl()Ian Kent
I was so sure the per-dentry expire timeout patch worked ok but my testing was flawed. In validate_dev_ioctl() the check for ioctl AUTOFS_DEV_IOCTL_TIMEOUT_CMD should use the ioctl number not the passed in ioctl command. Fixes: 433f9d76a010 ("autofs: add per dentry expire timeout") Cc: <stable@vger.kernel.org> # mainline only Signed-off-by: Ian Kent <raven@themaw.net> Link: https://lore.kernel.org/r/20241027224732.5507-1-raven@themaw.net Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-28RISC-V: KVM: Use NACL HFENCEs for KVM request based HFENCEsAnup Patel
When running under some other hypervisor, use SBI NACL based HFENCEs for TLB shoot-down via KVM requests. This makes HFENCEs faster whenever SBI nested acceleration is available. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241020194734.58686-14-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-10-28RISC-V: KVM: Save trap CSRs in kvm_riscv_vcpu_enter_exit()Anup Patel
Save trap CSRs in the kvm_riscv_vcpu_enter_exit() function instead of the kvm_arch_vcpu_ioctl_run() function so that HTVAL and HTINST CSRs are accessed in more optimized manner while running under some other hypervisor. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241020194734.58686-13-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-10-28RISC-V: KVM: Use SBI sync SRET call when availableAnup Patel
Implement an optimized KVM world-switch using SBI sync SRET call when SBI nested acceleration extension is available. This improves KVM world-switch when KVM RISC-V is running as a Guest under some other hypervisor. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241020194734.58686-12-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-10-28RISC-V: KVM: Use nacl_csr_xyz() for accessing AIA CSRsAnup Patel
When running under some other hypervisor, prefer nacl_csr_xyz() for accessing AIA CSRs in the run-loop. This makes CSR access faster whenever SBI nested acceleration is available. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241020194734.58686-11-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-10-28RISC-V: KVM: Use nacl_csr_xyz() for accessing H-extension CSRsAnup Patel
When running under some other hypervisor, prefer nacl_csr_xyz() for accessing H-extension CSRs in the run-loop. This makes CSR access faster whenever SBI nested acceleration is available. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241020194734.58686-10-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-10-28RISC-V: KVM: Add common nested acceleration supportAnup Patel
Add a common nested acceleration support which will be shared by all parts of KVM RISC-V. This nested acceleration support detects and enables SBI NACL extension usage based on static keys which ensures minimum impact on the non-nested scenario. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241020194734.58686-9-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-10-28RISC-V: Add defines for the SBI nested acceleration extensionAnup Patel
Add defines for the new SBI nested acceleration extension which was ratified as part of the SBI v2.0 specification. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241020194734.58686-8-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-10-28RISC-V: KVM: Don't setup SGEI for zero guest external interruptsAnup Patel
No need to setup SGEI local interrupt when there are zero guest external interrupts (i.e. zero HW IMSIC guest files). Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241020194734.58686-7-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-10-28RISC-V: KVM: Replace aia_set_hvictl() with aia_hvictl_value()Anup Patel
The aia_set_hvictl() internally writes the HVICTL CSR which makes it difficult to optimize the CSR write using SBI NACL extension for kvm_riscv_vcpu_aia_update_hvip() function so replace aia_set_hvictl() with new aia_hvictl_value() which only computes the HVICTL value. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241020194734.58686-6-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-10-28RISC-V: KVM: Break down the __kvm_riscv_switch_to() into macrosAnup Patel
Break down the __kvm_riscv_switch_to() function into macros so that these macros can be later re-used by SBI NACL extension based low-level switch function. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241020194734.58686-5-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-10-28RISC-V: KVM: Save/restore SCOUNTEREN in C sourceAnup Patel
The SCOUNTEREN CSR need not be saved/restored in the low-level __kvm_riscv_switch_to() function hence move the SCOUNTEREN CSR save/restore to the kvm_riscv_vcpu_swap_in_guest_state() and kvm_riscv_vcpu_swap_in_host_state() functions in C sources. Also, re-arrange the CSR save/restore and related GPR usage in the low-level __kvm_riscv_switch_to() low-level function. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241020194734.58686-4-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-10-28RISC-V: KVM: Save/restore HSTATUS in C sourceAnup Patel
We will be optimizing HSTATUS CSR access via shared memory setup using the SBI nested acceleration extension. To facilitate this, we first move HSTATUS save/restore in kvm_riscv_vcpu_enter_exit(). Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241020194734.58686-3-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-10-28RISC-V: KVM: Order the object files alphabeticallyAnup Patel
Order the object files alphabetically in the Makefile so that it is very predictable inserting new object files in the future. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241020194734.58686-2-apatel@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-10-28riscv: KVM: add basic support for host vs guest profilingQuan Zhou
For the information collected on the host side, we need to identify which data originates from the guest and record these events separately, this can be achieved by having KVM register perf callbacks. Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/00342d535311eb0629b9ba4f1e457a48e2abee33.1728957131.git.zhouquan@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
2024-10-28riscv: perf: add guest vs host distinctionQuan Zhou
Introduce basic guest support in perf, enabling it to distinguish between PMU interrupts in the host or guest, and collect fundamental information. Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/a67d527dc1b11493fe11f7f53584772fdd983744.1728957131.git.zhouquan@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
2024-10-28kbuild: rust: avoid errors with old `rustc`s without LLVM patch versionMiguel Ojeda
Some old versions of `rustc` did not report the LLVM version without the patch version, e.g.: $ rustc --version --verbose rustc 1.48.0 (7eac88abb 2020-11-16) binary: rustc commit-hash: 7eac88abb2e57e752f3302f02be5f3ce3d7adfb4 commit-date: 2020-11-16 host: x86_64-unknown-linux-gnu release: 1.48.0 LLVM version: 11.0 Which would make the new `scripts/rustc-llvm-version.sh` fail and, in turn, the build: $ make LLVM=1 SYNC include/config/auto.conf.cmd ./scripts/rustc-llvm-version.sh: 13: arithmetic expression: expecting primary: "10000 * 10 + 100 * 0 + " init/Kconfig:83: syntax error init/Kconfig:83: invalid statement make[3]: *** [scripts/kconfig/Makefile:85: syncconfig] Error 1 make[2]: *** [Makefile:679: syncconfig] Error 2 make[1]: *** [/home/cam/linux/Makefile:780: include/config/auto.conf.cmd] Error 2 make: *** [Makefile:224: __sub-make] Error 2 Since we do not need to support such binaries, we can avoid adding logic for computing `rustc`'s LLVM version for those old binaries. Thus, instead, just make the match stricter. Other `rustc` binaries (even newer) did not report the LLVM version at all, but that was fine, since it would not match "LLVM", e.g.: $ rustc --version --verbose rustc 1.49.0 (e1884a8e3 2020-12-29) binary: rustc commit-hash: e1884a8e3c3e813aada8254edfa120e85bf5ffca commit-date: 2020-12-29 host: x86_64-unknown-linux-gnu release: 1.49.0 Cc: Thorsten Leemhuis <regressions@leemhuis.info> Cc: Gary Guo <gary@garyguo.net> Reported-by: Cameron MacPherson <cameron.macpherson@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219423 Fixes: af0121c2d303 ("kbuild: rust: add `CONFIG_RUSTC_LLVM_VERSION`") Tested-by: Cameron MacPherson <cameron.macpherson@gmail.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20241027145636.416030-1-ojeda@kernel.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-10-27Linux 6.12-rc5Linus Torvalds
2024-10-27Merge tag 'x86_urgent_for_v6.12_rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Prevent a certain range of pages which get marked as hypervisor-only, to get allocated to a CoCo (SNP) guest which cannot use them and thus fail booting - Fix the microcode loader on AMD to pay attention to the stepping of a patch and to handle the case where a BIOS config option splits the machine into logical NUMA nodes per L3 cache slice - Disable LAM from being built by default due to security concerns * tag 'x86_urgent_for_v6.12_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sev: Ensure that RMP table fixups are reserved x86/microcode/AMD: Split load_microcode_amd() x86/microcode/AMD: Pay attention to the stepping dynamically x86/lam: Disable ADDRESS_MASKING in most cases
2024-10-27Merge tag 'ftrace-v6.12-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ftrace fixes from Steven Rostedt: - Fix missing mutex unlock in error path of register_ftrace_graph() A previous fix added a return on an error path and forgot to unlock the mutex. Instead of dealing with error paths, use guard(mutex) as the mutex is just released at the exit of the function anyway. Other functions in this file should be updated with this, but that's a cleanup and not a fix. - Change cpuhp setup name to be consistent with other cpuhp states The same fix that the above patch fixes added a cpuhp_setup_state() call with the name of "fgraph_idle_init". I was informed that it should instead be something like: "fgraph:online". Update that too. * tag 'ftrace-v6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: fgraph: Change the name of cpuhp state to "fgraph:online" fgraph: Fix missing unlock in register_ftrace_graph()
2024-10-27Merge tag 'platform-drivers-x86-v6.12-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Hans de Goede: - Asus thermal profile fix, fixing performance issues on Lunar Lake - Intel PMC: one revert for a lockdep issue and one bugfix - Dell WMI: Ignore some WMI events on suspend/resume to silence warnings * tag 'platform-drivers-x86-v6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86: asus-wmi: Fix thermal profile initialization platform/x86: dell-wmi: Ignore suspend notifications platform/x86/intel/pmc: Fix pmc_core_iounmap to call iounmap for valid addresses platform/x86:intel/pmc: Revert "Enable the ACPI PM Timer to be turned off when suspended"
2024-10-27Merge tag 'firewire-fixes-6.12-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394 Pull firewire fix from Takashi Sakamoto: "A single commit to resolve a regression existing in v6.11 or later. The change in 1394 OHCI driver in v6.11 kernel could cause general protection faults when rediscovering nodes in IEEE 1394 bus while holding a spin lock. Consequently, watchdog checks can report a hard lockup. Currently, this issue is observed primarily during the system resume phase when using an extra node with three ports or more is used. However, it could potentially occur in the other cases as well" * tag 'firewire-fixes-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394: firewire: core: fix invalid port index for parent device
2024-10-27Merge tag 'block-6.12-20241026' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - Pull request for MD via Song fixing a few issues - Fix a wrong check in blk_rq_map_user_bvec(), causing IO errors on passthrough IO (Xinyu) * tag 'block-6.12-20241026' of git://git.kernel.dk/linux: block: fix sanity checks in blk_rq_map_user_bvec md/raid10: fix null ptr dereference in raid10_size() md: ensure child flush IO does not affect origin bio->bi_status
2024-10-27Merge tag 'xfs-6.12-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Carlos Maiolino: - Fix recovery of allocator ops after a growfs - Do not fail repairs on metadata files with no attr fork * tag 'xfs-6.12-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: update the pag for the last AG at recovery time xfs: don't use __GFP_RETRY_MAYFAIL in xfs_initialize_perag xfs: error out when a superblock buffer update reduces the agcount xfs: update the file system geometry after recoverying superblock buffers xfs: merge the perag freeing helpers xfs: pass the exact range to initialize to xfs_initialize_perag xfs: don't fail repairs on metadata files with no attr fork
2024-10-27irqchip/gic-v4: Correctly deal with set_affinity on lazily-mapped VPEsMarc Zyngier
Zenghui points out that a recent change to the way set_affinity is handled for VPEs has the potential to return an error if the VPE hasn't been mapped yet (because the guest hasn't emited a MAPTI command yet), affecting GICv4.0 implementations that rely on the ITSList feature. Fix this by making the set_affinity succeed in this case, and return early, without trying to touch the HW. Fixes: 1442ee0011983 ("irqchip/gic-v4: Don't allow a VMOVP on a dying VPE") Reported-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/all/20241027102220.1858558-1-maz@kernel.org Link: https://lore.kernel.org/r/aab45cd3-e5ca-58cf-e081-e32a17f5b4e7@huawei.com
2024-10-27genirq/msi: Fix off-by-one error in msi_domain_alloc()Jinjie Ruan
The error path in msi_domain_alloc(), frees the already allocated MSI interrupts in a loop, but the loop condition terminates when the index reaches zero, which fails to free the first allocated MSI interrupt at index zero. Check for >= 0 so that msi[0] is freed as well. Fixes: f3cf8bb0d6c3 ("genirq: Add generic msi irq domain support") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241026063639.10711-1-ruanjinjie@huawei.com
2024-10-27posix-cpu-timers: Clear TICK_DEP_BIT_POSIX_TIMER on cloneBenjamin Segall
When cloning a new thread, its posix_cputimers are not inherited, and are cleared by posix_cputimers_init(). However, this does not clear the tick dependency it creates in tsk->tick_dep_mask, and the handler does not reach the code to clear the dependency if there were no timers to begin with. Thus if a thread has a cputimer running before clone/fork, all descendants will prevent nohz_full unless they create a cputimer of their own. Fix this by entirely clearing the tick_dep_mask in copy_process(). (There is currently no inherited state that needs a tick dependency) Process-wide timers do not have this problem because fork does not copy signal_struct as a baseline, it creates one from scratch. Fixes: b78783000d5c ("posix-cpu-timers: Migrate to use new tick dependency mask model") Signed-off-by: Ben Segall <bsegall@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/xm26o737bq8o.fsf@google.com
2024-10-27firewire: core: fix invalid port index for parent deviceTakashi Sakamoto
In a commit 24b7f8e5cd65 ("firewire: core: use helper functions for self ID sequence"), the enumeration over self ID sequence was refactored with some helper functions with KUnit tests. These helper functions are guaranteed to work expectedly by the KUnit tests, however their application includes a mistake to assign invalid value to the index of port connected to parent device. This bug affects the case that any extra node devices which has three or more ports are connected to 1394 OHCI controller. In the case, the path to update the tree cache could hits WARN_ON(), and gets general protection fault due to the access to invalid address computed by the invalid value. This commit fixes the bug to assign correct port index. Cc: stable@vger.kernel.org Reported-by: Edmund Raile <edmund.raile@proton.me> Closes: https://lore.kernel.org/lkml/8a9902a4ece9329af1e1e42f5fea76861f0bf0e8.camel@proton.me/ Fixes: 24b7f8e5cd65 ("firewire: core: use helper functions for self ID sequence") Link: https://lore.kernel.org/r/20241025034137.99317-1-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-10-26platform/x86: asus-wmi: Fix thermal profile initializationArmin Wolf
When support for vivobook fan profiles was added, the initial call to throttle_thermal_policy_set_default() was removed, which however is necessary for full initialization. Fix this by calling throttle_thermal_policy_set_default() again when setting up the platform profile. Fixes: bcbfcebda2cb ("platform/x86: asus-wmi: add support for vivobook fan profiles") Reported-by: Michael Larabel <Michael@phoronix.com> Closes: https://www.phoronix.com/review/lunar-lake-xe2/5 Signed-off-by: Armin Wolf <W_Armin@gmx.de> Link: https://lore.kernel.org/r/20241025191514.15032-2-W_Armin@gmx.de Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-10-26sched/numa: Fix the potential null pointer dereference in task_numa_work()Shawn Wang
When running stress-ng-vm-segv test, we found a null pointer dereference error in task_numa_work(). Here is the backtrace: [323676.066985] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020 ...... [323676.067108] CPU: 35 PID: 2694524 Comm: stress-ng-vm-se ...... [323676.067113] pstate: 23401009 (nzCv daif +PAN -UAO +TCO +DIT +SSBS BTYPE=--) [323676.067115] pc : vma_migratable+0x1c/0xd0 [323676.067122] lr : task_numa_work+0x1ec/0x4e0 [323676.067127] sp : ffff8000ada73d20 [323676.067128] x29: ffff8000ada73d20 x28: 0000000000000000 x27: 000000003e89f010 [323676.067130] x26: 0000000000080000 x25: ffff800081b5c0d8 x24: ffff800081b27000 [323676.067133] x23: 0000000000010000 x22: 0000000104d18cc0 x21: ffff0009f7158000 [323676.067135] x20: 0000000000000000 x19: 0000000000000000 x18: ffff8000ada73db8 [323676.067138] x17: 0001400000000000 x16: ffff800080df40b0 x15: 0000000000000035 [323676.067140] x14: ffff8000ada73cc8 x13: 1fffe0017cc72001 x12: ffff8000ada73cc8 [323676.067142] x11: ffff80008001160c x10: ffff000be639000c x9 : ffff8000800f4ba4 [323676.067145] x8 : ffff000810375000 x7 : ffff8000ada73974 x6 : 0000000000000001 [323676.067147] x5 : 0068000b33e26707 x4 : 0000000000000001 x3 : ffff0009f7158000 [323676.067149] x2 : 0000000000000041 x1 : 0000000000004400 x0 : 0000000000000000 [323676.067152] Call trace: [323676.067153] vma_migratable+0x1c/0xd0 [323676.067155] task_numa_work+0x1ec/0x4e0 [323676.067157] task_work_run+0x78/0xd8 [323676.067161] do_notify_resume+0x1ec/0x290 [323676.067163] el0_svc+0x150/0x160 [323676.067167] el0t_64_sync_handler+0xf8/0x128 [323676.067170] el0t_64_sync+0x17c/0x180 [323676.067173] Code: d2888001 910003fd f9000bf3 aa0003f3 (f9401000) [323676.067177] SMP: stopping secondary CPUs [323676.070184] Starting crashdump kernel... stress-ng-vm-segv in stress-ng is used to stress test the SIGSEGV error handling function of the system, which tries to cause a SIGSEGV error on return from unmapping the whole address space of the child process. Normally this program will not cause kernel crashes. But before the munmap system call returns to user mode, a potential task_numa_work() for numa balancing could be added and executed. In this scenario, since the child process has no vma after munmap, the vma_next() in task_numa_work() will return a null pointer even if the vma iterator restarts from 0. Recheck the vma pointer before dereferencing it in task_numa_work(). Fixes: 214dbc428137 ("sched: convert to vma iterator") Signed-off-by: Shawn Wang <shawnwang@linux.alibaba.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org # v6.2+ Link: https://lkml.kernel.org/r/20241025022208.125527-1-shawnwang@linux.alibaba.com
2024-10-25Input: adp5588-keys - do not try to disable interrupt 0Dmitry Torokhov
Commit dc748812fca0 ("Input: adp5588-keys - add support for pure gpio") made having interrupt line optional for the device, however it neglected to update suspend and resume handlers that try to disable interrupts for the duration of suspend. Fix this by checking if interrupt number assigned to the i2c device is not 0 before trying to disable or reenable it. Fixes: dc748812fca0 ("Input: adp5588-keys - add support for pure gpio") Link: https://lore.kernel.org/r/Zv_2jEMYSWDw2gKs@google.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2024-10-26wifi: mac80211: ieee80211_i: Fix memory corruption bug in struct ↵Gustavo A. R. Silva
ieee80211_chanctx Move the `struct ieee80211_chanctx_conf conf` to the end of `struct ieee80211_chanctx` and fix a memory corruption bug triggered e.g. in `hwsim_set_chanctx_magic()`: `radar_detected` is being overwritten when `cp->magic = HWSIM_CHANCTX_MAGIC;` See the function call sequence below: drv_add_chanctx(... struct ieee80211_chanctx *ctx) -> local->ops->add_chanctx(&local->hw, &ctx->conf) -> mac80211_hwsim_add_chanctx(... struct ieee80211_chanctx_conf *ctx) -> hwsim_set_chanctx_magic(ctx) This also happens in a number of other drivers. Also, add a code comment to try to prevent people from introducing new members after `struct ieee80211_chanctx_conf conf`. Notice that `struct ieee80211_chanctx_conf` is a flexible structure --a structure that contains a flexible-array member, so it should always be at the end of any other containing structures. This change also fixes 50 of the following warnings: net/mac80211/ieee80211_i.h:895:39: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] -Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. Fixes: bca8bc0399ac ("wifi: mac80211: handle ieee80211_radar_detected() for MLO") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://patch.msgid.link/ZxwWPrncTeSi1UTq@kspp [also refer to other drivers in commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-25Merge tag '9p-for-6.12-rc5' of https://github.com/martinetd/linuxLinus Torvalds
Pull more 9p reverts from Dominique Martinet: "Revert patches causing inode collision problems. The code simplification introduced significant regressions on servers that do not remap inode numbers when exporting multiple underlying filesystems with colliding inodes. See the top-most revert (commit be2ca3825372) for details. This problem had been ignored for too long and the reverts will also head to stable (6.9+). I'm confident this set of patches gets us back to previous behaviour (another related patch had already been reverted back in April and we're almost back to square 1, and the rest didn't touch inode lifecycle)" * tag '9p-for-6.12-rc5' of https://github.com/martinetd/linux: Revert "fs/9p: simplify iget to remove unnecessary paths" Revert "fs/9p: fix uaf in in v9fs_stat2inode_dotl" Revert "fs/9p: remove redundant pointer v9ses" Revert " fs/9p: mitigate inode collisions"
2024-10-25sched_ext: Fix enq_last_no_enq_fails selftestTejun Heo
cc9877fb7677 ("sched_ext: Improve error reporting during loading") changed how load failures are reported so that more error context can be communicated. This breaks the enq_last_no_enq_fails test as attach no longer fails. The scheduler is guaranteed to be ejected on attach completion with full error information. Update enq_last_no_enq_fails so that it checks that the scheduler is ejected using ops.exit(). Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Vishal Chourasia <vishalc@linux.ibm.com> Link: http://lkml.kernel.org/r/Zxknp7RAVNjmdJSc@linux.ibm.com Fixes: cc9877fb7677 ("sched_ext: Improve error reporting during loading")
2024-10-25sched_ext: Make cast_mask() inlineTejun Heo
cast_mask() doesn't do any actual work and is defined in a header file. Force it to be inline. When it is not inlined and the function is not used, it can cause verificaiton failures like the following: # tools/testing/selftests/sched_ext/runner -t minimal ===== START ===== TEST: minimal DESCRIPTION: Verify we can load a fully minimal scheduler OUTPUT: libbpf: prog 'cast_mask': missing BPF prog type, check ELF section name '.text' libbpf: prog 'cast_mask': failed to load: -22 libbpf: failed to load object 'minimal' libbpf: failed to load BPF skeleton 'minimal': -22 ERR: minimal.c:20 Failed to open and load skel not ok 1 minimal # ===== END ===== Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: a748db0c8c6a ("tools/sched_ext: Receive misc updates from SCX repo")
2024-10-25scx: Fix raciness in scx_ops_bypass()David Vernet
scx_ops_bypass() can currently race on the ops enable / disable path as follows: 1. scx_ops_bypass(true) called on enable path, bypass depth is set to 1 2. An op on the init path exits, which schedules scx_ops_disable_workfn() 3. scx_ops_bypass(false) is called on the disable path, and bypass depth is decremented to 0 4. kthread is scheduled to execute scx_ops_disable_workfn() 5. scx_ops_bypass(true) called, bypass depth set to 1 6. scx_ops_bypass() races when iterating over CPUs While it's not safe to take any blocking locks on the bypass path, it is safe to take a raw spinlock which cannot be preempted. This patch therefore updates scx_ops_bypass() to use a raw spinlock to synchronize, and changes scx_ops_bypass_depth to be a regular int. Without this change, we observe the following warnings when running the 'exit' sched_ext selftest (sometimes requires a couple of runs): .[root@virtme-ng sched_ext]# ./runner -t exit ===== START ===== TEST: exit ... [ 14.935078] WARNING: CPU: 2 PID: 360 at kernel/sched/ext.c:4332 scx_ops_bypass+0x1ca/0x280 [ 14.935126] Modules linked in: [ 14.935150] CPU: 2 UID: 0 PID: 360 Comm: sched_ext_ops_h Not tainted 6.11.0-virtme #24 [ 14.935192] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 14.935242] Sched_ext: exit (enabling+all) [ 14.935244] RIP: 0010:scx_ops_bypass+0x1ca/0x280 [ 14.935300] Code: ff ff ff e8 48 96 10 00 fb e9 08 ff ff ff c6 05 7b 34 e8 01 01 90 48 c7 c7 89 86 88 87 e8 be 1d f8 ff 90 0f 0b 90 90 eb 95 90 <0f> 0b 90 41 8b 84 24 24 0a 00 00 eb 97 90 0f 0b 90 41 8b 84 24 24 [ 14.935394] RSP: 0018:ffffb706c0957ce0 EFLAGS: 00010002 [ 14.935424] RAX: 0000000000000009 RBX: 0000000000000001 RCX: 00000000e3fb8b2a [ 14.935465] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffffffff88a4c080 [ 14.935512] RBP: 0000000000009b56 R08: 0000000000000004 R09: 00000003f12e520a [ 14.935555] R10: ffffffff863a9795 R11: 0000000000000000 R12: ffff8fc5fec31300 [ 14.935598] R13: ffff8fc5fec31318 R14: 0000000000000286 R15: 0000000000000018 [ 14.935642] FS: 0000000000000000(0000) GS:ffff8fc5fe680000(0000) knlGS:0000000000000000 [ 14.935684] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 14.935721] CR2: 0000557d92890b88 CR3: 000000002464a000 CR4: 0000000000750ef0 [ 14.935765] PKRU: 55555554 [ 14.935782] Call Trace: [ 14.935802] <TASK> [ 14.935823] ? __warn+0xce/0x220 [ 14.935850] ? scx_ops_bypass+0x1ca/0x280 [ 14.935881] ? report_bug+0xc1/0x160 [ 14.935909] ? handle_bug+0x61/0x90 [ 14.935934] ? exc_invalid_op+0x1a/0x50 [ 14.935959] ? asm_exc_invalid_op+0x1a/0x20 [ 14.935984] ? raw_spin_rq_lock_nested+0x15/0x30 [ 14.936019] ? scx_ops_bypass+0x1ca/0x280 [ 14.936046] ? srso_alias_return_thunk+0x5/0xfbef5 [ 14.936081] ? __pfx_scx_ops_disable_workfn+0x10/0x10 [ 14.936111] scx_ops_disable_workfn+0x146/0xac0 [ 14.936142] ? finish_task_switch+0xa9/0x2c0 [ 14.936172] ? srso_alias_return_thunk+0x5/0xfbef5 [ 14.936211] ? __pfx_scx_ops_disable_workfn+0x10/0x10 [ 14.936244] kthread_worker_fn+0x101/0x2c0 [ 14.936268] ? __pfx_kthread_worker_fn+0x10/0x10 [ 14.936299] kthread+0xec/0x110 [ 14.936327] ? __pfx_kthread+0x10/0x10 [ 14.936351] ret_from_fork+0x37/0x50 [ 14.936374] ? __pfx_kthread+0x10/0x10 [ 14.936400] ret_from_fork_asm+0x1a/0x30 [ 14.936427] </TASK> [ 14.936443] irq event stamp: 21002 [ 14.936467] hardirqs last enabled at (21001): [<ffffffff863aa35f>] resched_cpu+0x9f/0xd0 [ 14.936521] hardirqs last disabled at (21002): [<ffffffff863dd0ba>] scx_ops_bypass+0x11a/0x280 [ 14.936571] softirqs last enabled at (20642): [<ffffffff863683d7>] __irq_exit_rcu+0x67/0xd0 [ 14.936622] softirqs last disabled at (20637): [<ffffffff863683d7>] __irq_exit_rcu+0x67/0xd0 [ 14.936672] ---[ end trace 0000000000000000 ]--- [ 14.953282] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF) [ 14.953352] ------------[ cut here ]------------ [ 14.953383] WARNING: CPU: 2 PID: 360 at kernel/sched/ext.c:4335 scx_ops_bypass+0x1d8/0x280 [ 14.953428] Modules linked in: [ 14.953453] CPU: 2 UID: 0 PID: 360 Comm: sched_ext_ops_h Tainted: G W 6.11.0-virtme #24 [ 14.953505] Tainted: [W]=WARN [ 14.953527] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 14.953574] RIP: 0010:scx_ops_bypass+0x1d8/0x280 [ 14.953603] Code: c6 05 7b 34 e8 01 01 90 48 c7 c7 89 86 88 87 e8 be 1d f8 ff 90 0f 0b 90 90 eb 95 90 0f 0b 90 41 8b 84 24 24 0a 00 00 eb 97 90 <0f> 0b 90 41 8b 84 24 24 0a 00 00 eb 92 f3 0f 1e fa 49 8d 84 24 f0 [ 14.953693] RSP: 0018:ffffb706c0957ce0 EFLAGS: 00010046 [ 14.953722] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000001 [ 14.953763] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8fc5fec31318 [ 14.953804] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 [ 14.953845] R10: ffffffff863a9795 R11: 0000000000000000 R12: ffff8fc5fec31300 [ 14.953888] R13: ffff8fc5fec31318 R14: 0000000000000286 R15: 0000000000000018 [ 14.953934] FS: 0000000000000000(0000) GS:ffff8fc5fe680000(0000) knlGS:0000000000000000 [ 14.953974] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 14.954009] CR2: 0000557d92890b88 CR3: 000000002464a000 CR4: 0000000000750ef0 [ 14.954052] PKRU: 55555554 [ 14.954068] Call Trace: [ 14.954085] <TASK> [ 14.954102] ? __warn+0xce/0x220 [ 14.954126] ? scx_ops_bypass+0x1d8/0x280 [ 14.954150] ? report_bug+0xc1/0x160 [ 14.954178] ? handle_bug+0x61/0x90 [ 14.954203] ? exc_invalid_op+0x1a/0x50 [ 14.954226] ? asm_exc_invalid_op+0x1a/0x20 [ 14.954250] ? raw_spin_rq_lock_nested+0x15/0x30 [ 14.954285] ? scx_ops_bypass+0x1d8/0x280 [ 14.954311] ? __mutex_unlock_slowpath+0x3a/0x260 [ 14.954343] scx_ops_disable_workfn+0xa3e/0xac0 [ 14.954381] ? __pfx_scx_ops_disable_workfn+0x10/0x10 [ 14.954413] kthread_worker_fn+0x101/0x2c0 [ 14.954442] ? __pfx_kthread_worker_fn+0x10/0x10 [ 14.954479] kthread+0xec/0x110 [ 14.954507] ? __pfx_kthread+0x10/0x10 [ 14.954530] ret_from_fork+0x37/0x50 [ 14.954553] ? __pfx_kthread+0x10/0x10 [ 14.954576] ret_from_fork_asm+0x1a/0x30 [ 14.954603] </TASK> [ 14.954621] irq event stamp: 21002 [ 14.954644] hardirqs last enabled at (21001): [<ffffffff863aa35f>] resched_cpu+0x9f/0xd0 [ 14.954686] hardirqs last disabled at (21002): [<ffffffff863dd0ba>] scx_ops_bypass+0x11a/0x280 [ 14.954735] softirqs last enabled at (20642): [<ffffffff863683d7>] __irq_exit_rcu+0x67/0xd0 [ 14.954782] softirqs last disabled at (20637): [<ffffffff863683d7>] __irq_exit_rcu+0x67/0xd0 [ 14.954829] ---[ end trace 0000000000000000 ]--- [ 15.022283] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF) [ 15.092282] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF) [ 15.149282] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF) ok 1 exit # ===== END ===== And with it, the test passes without issue after 1000s of runs: .[root@virtme-ng sched_ext]# ./runner -t exit ===== START ===== TEST: exit DESCRIPTION: Verify we can cleanly exit a scheduler in multiple places OUTPUT: [ 7.412856] sched_ext: BPF scheduler "exit" enabled [ 7.427924] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF) [ 7.466677] sched_ext: BPF scheduler "exit" enabled [ 7.475923] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF) [ 7.512803] sched_ext: BPF scheduler "exit" enabled [ 7.532924] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF) [ 7.586809] sched_ext: BPF scheduler "exit" enabled [ 7.595926] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF) [ 7.661923] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF) [ 7.723923] sched_ext: BPF scheduler "exit" disabled (unregistered from BPF) ok 1 exit # ===== END ===== ============================= RESULTS: PASSED: 1 SKIPPED: 0 FAILED: 0 Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class") Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-10-25cxl/test: Improve init-order fidelity relative to real-world systemsDan Williams
The investigation of an initialization failure [1] highlighted that cxl_test does not reflect the init-order of real world systems. The expected order is root/bus first then async probing of the memory devices. Fix up cxl_test to reflect that order. While it did not reproduce the initial bug report (since that is dependent on built-in vs modular builds), it did reveal a separate latent bug in the subsystem's decoder shutdown flow. Fix for that sent separately. Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1] Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/172964784521.81806.15791069994065969243.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2024-10-25cxl/port: Prevent out-of-order decoder allocationDan Williams
With the recent change to allow out-of-order decoder de-commit it highlights a need to strengthen the in-order decoder commit guarantees. As it stands match_free_decoder() ensures that if 2 regions are racing decoder allocations the one that wins the race will get the lower id decoder, but that still leaves the race to *commit* the decoder. Rather than have this complicated case of "reserved in-order, but may still commit out-of-order", just arrange for the reservation order to match the commit-order. In other words, prevent subsequent allocations until the last reservation is committed. This precludes overlapping region creation events and requires the previous regionN to either move forward to the decoder commit stage or drop its reservation before regionN+1 can move forward. That is, provided that regionN and regionN+1 decode through the same switch port. As a side effect this allows match_free_decoder() to drop its dependency on needing write access to the device_find_child() @data parameter [1]. Reported-by: Zijun Hu <quic_zijuhu@quicinc.com> Closes: http://lore.kernel.org/20240905-const_dfc_prepare-v4-0-4180e1d5a244@quicinc.com Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://patch.msgid.link/172964783668.81806.14962699553881333486.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2024-10-25cxl/port: Fix use-after-free, permit out-of-order decoder shutdownDan Williams
In support of investigating an initialization failure report [1], cxl_test was updated to register mock memory-devices after the mock root-port/bus device had been registered. That led to cxl_test crashing with a use-after-free bug with the following signature: cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem0:decoder7.0 @ 0 next: cxl_switch_uport.0 nr_eps: 1 nr_targets: 1 cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem4:decoder14.0 @ 1 next: cxl_switch_uport.0 nr_eps: 2 nr_targets: 1 cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[0] = cxl_switch_dport.0 for mem0:decoder7.0 @ 0 1) cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[1] = cxl_switch_dport.4 for mem4:decoder14.0 @ 1 [..] cxld_unregister: cxl decoder14.0: cxl_region_decode_reset: cxl_region region3: mock_decoder_reset: cxl_port port3: decoder3.0 reset 2) mock_decoder_reset: cxl_port port3: decoder3.0: out of order reset, expected decoder3.1 cxl_endpoint_decoder_release: cxl decoder14.0: [..] cxld_unregister: cxl decoder7.0: 3) cxl_region_decode_reset: cxl_region region3: Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bc3: 0000 [#1] PREEMPT SMP PTI [..] RIP: 0010:to_cxl_port+0x8/0x60 [cxl_core] [..] Call Trace: <TASK> cxl_region_decode_reset+0x69/0x190 [cxl_core] cxl_region_detach+0xe8/0x210 [cxl_core] cxl_decoder_kill_region+0x27/0x40 [cxl_core] cxld_unregister+0x5d/0x60 [cxl_core] At 1) a region has been established with 2 endpoint decoders (7.0 and 14.0). Those endpoints share a common switch-decoder in the topology (3.0). At teardown, 2), decoder14.0 is the first to be removed and hits the "out of order reset case" in the switch decoder. The effect though is that region3 cleanup is aborted leaving it in-tact and referencing decoder14.0. At 3) the second attempt to teardown region3 trips over the stale decoder14.0 object which has long since been deleted. The fix here is to recognize that the CXL specification places no mandate on in-order shutdown of switch-decoders, the driver enforces in-order allocation, and hardware enforces in-order commit. So, rather than fail and leave objects dangling, always remove them. In support of making cxl_region_decode_reset() always succeed, cxl_region_invalidate_memregion() failures are turned into warnings. Crashing the kernel is ok there since system integrity is at risk if caches cannot be managed around physical address mutation events like CXL region destruction. A new device_for_each_child_reverse_from() is added to cleanup port->commit_end after all dependent decoders have been disabled. In other words if decoders are allocated 0->1->2 and disabled 1->2->0 then port->commit_end only decrements from 2 after 2 has been disabled, and it decrements all the way to zero since 1 was disabled previously. Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1] Cc: stable@vger.kernel.org Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Zijun Hu <quic_zijuhu@quicinc.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://patch.msgid.link/172964782781.81806.17902885593105284330.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2024-10-25cxl/acpi: Ensure ports ready at cxl_acpi_probe() returnDan Williams
In order to ensure root CXL ports are enabled upon cxl_acpi_probe() when the 'cxl_port' driver is built as a module, arrange for the module to be pre-loaded or built-in. The "Fixes:" but no "Cc: stable" on this patch reflects that the issue is merely by inspection since the bug that triggered the discovery of this potential problem [1] is fixed by other means. However, a stable backport should do no harm. Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver") Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Gregory Price <gourry@gourry.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://patch.msgid.link/172964781969.81806.17276352414854540808.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2024-10-25cxl/port: Fix cxl_bus_rescan() vs bus_rescan_devices()Dan Williams
It turns out since its original introduction, pre-2.6.12, bus_rescan_devices() has skipped devices that might be in the process of attaching or detaching from their driver. For CXL this behavior is unwanted and expects that cxl_bus_rescan() is a probe barrier. That behavior is simple enough to achieve with bus_for_each_dev() paired with call to device_attach(), and it is unclear why bus_rescan_devices() took the position of lockless consumption of dev->driver which is racy. The "Fixes:" but no "Cc: stable" on this patch reflects that the issue is merely by inspection since the bug that triggered the discovery of this potential problem [1] is fixed by other means. However, a stable backport should do no harm. Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver") Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Gregory Price <gourry@gourry.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://patch.msgid.link/172964781104.81806.4277549800082443769.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2024-10-25cxl/port: Fix CXL port initialization order when the subsystem is built-inDan Williams
When the CXL subsystem is built-in the module init order is determined by Makefile order. That order violates expectations. The expectation is that cxl_acpi and cxl_mem can race to attach. If cxl_acpi wins the race, cxl_mem will find the enabled CXL root ports it needs. If cxl_acpi loses the race it will retrigger cxl_mem to attach via cxl_bus_rescan(). That flow only works if cxl_acpi can assume ports are enabled immediately upon cxl_acpi_probe() return. That in turn can only happen in the CONFIG_CXL_ACPI=y case if the cxl_port driver is registered before cxl_acpi_probe() runs. Fix up the order to prevent initialization failures. Ensure that cxl_port is built-in when cxl_acpi is also built-in, arrange for Makefile order to resolve the subsys_initcall() order of cxl_port and cxl_acpi, and arrange for Makefile order to resolve the device_initcall() (module_init()) order of the remaining objects. As for what contributed to this not being found earlier, the CXL regression environment, cxl_test, builds all CXL functionality as a module to allow to symbol mocking and other dynamic reload tests. As a result there is no regression coverage for the built-in case. Reported-by: Gregory Price <gourry@gourry.net> Closes: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net Tested-by: Gregory Price <gourry@gourry.net> Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver") Cc: stable@vger.kernel.org Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Alejandro Lucero <alucerop@amd.com> Reviewed-by: Alejandro Lucero <alucerop@amd.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/172988474904.476062.7961350937442459266.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2024-10-25scsi: ufs: core: Fix another deadlock during RTC updatePeter Wang
If ufshcd_rtc_work calls ufshcd_rpm_put_sync() and the pm's usage_count is 0, we will enter the runtime suspend callback. However, the runtime suspend callback will wait to flush ufshcd_rtc_work, causing a deadlock. Replace ufshcd_rpm_put_sync() with ufshcd_rpm_put() to avoid the deadlock. Fixes: 6bf999e0eb41 ("scsi: ufs: core: Add UFS RTC support") Cc: stable@vger.kernel.org #6.11.x Signed-off-by: Peter Wang <peter.wang@mediatek.com> Link: https://lore.kernel.org/r/20241024015453.21684-1-peter.wang@mediatek.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>