linux.git - Linus' kernel tree

Age	Commit message	Author
2023-04-21	KVM: arm64: Clarify host SME state management	Mark Brown
	Normally when running a guest we do not touch the floating point register state until first use of floating point by the guest, saving the current state and loading the guest state at that point. This has been found to offer a performance benefit in common cases. However currently if SME is active when switching to a guest then we exit streaming mode, disable ZA and invalidate the floating point register state prior to starting the guest. The exit from streaming mode is required for correct guest operation, if we leave streaming mode enabled then many non-SME operations can generate SME traps (eg, SVE operations will become streaming SVE operations). If EL1 leaves CPACR_EL1.SMEN disabled then the host is unable to intercept these traps. This will mean that a SME unaware guest will see SME exceptions which will confuse it. Disabling streaming mode also avoids creating spurious indications of usage of the SME hardware which could impact system performance, especially with shared SME implementations. Document the requirement to exit streaming mode clearly. There is no issue with guest operation caused by PSTATE.ZA so we can defer handling for that until first floating point usage, do so if the register state is not that of the current task and hence has already been saved. We could also do this for the case where the register state is that for the current task however this is very unlikely to happen and would require disproportionate effort so continue to save the state in that case. Saving this state on first use would require that we map and unmap storage for the host version of these registers for use by the hypervisor, taking care to deal with protected KVM and the fact that the host can free or reallocate the backing storage. 
Given that the strong recommendation is that applications should only keep PSTATE.ZA enabled when the state it enables is in active use, it is difficult to see a case where a VMM would wish to do this: it would need to not only be using SME but also running the guest in the middle of SME usage. This can be revisited in the future if a use case does arise; in the interim such tasks will work but experience a performance overhead. This brings our handling of SME more into line with our handling of other floating point state and documents more clearly the constraints we have, especially around streaming mode. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221214-kvm-arm64-sme-context-switch-v2-3-57ba0082e9ff@kernel.org
2023-04-21	KVM: arm64: Restructure check for SVE support in FP trap handler	Mark Brown
	We share the same handler for general floating point and SVE traps with a check to make sure we don't handle any SVE traps if the system doesn't have SVE support. Since we will be adding SME support and wishing to handle that along with other FP related traps rewrite the check to be more scalable and a bit clearer too, ensuring we don't misidentify SME traps as SVE ones. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221214-kvm-arm64-sme-context-switch-v2-2-57ba0082e9ff@kernel.org
2023-04-21	KVM: arm64: Document check for TIF_FOREIGN_FPSTATE	Mark Brown
	In kvm_arch_vcpu_load_fp() we unconditionally set the current FP state to FP_STATE_HOST_OWNED, this will be overridden to FP_STATE_NONE if TIF_FOREIGN_FPSTATE is set but the check is deferred until kvm_arch_vcpu_ctxflush_fp() where we are no longer preemptable. Add a comment to this effect to help avoid people being concerned about the lack of a check and discover where the check is done. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221214-kvm-arm64-sme-context-switch-v2-1-57ba0082e9ff@kernel.org
2023-04-21	KVM: arm64: Fix repeated words in comments	Jingyu Wang
	Delete the redundant word 'to'. Signed-off-by: Jingyu Wang <jingyuwang_vip@163.com> Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230309075919.169518-1-jingyuwang_vip@163.com
2023-04-21	spi: bcm63xx: remove PM_SLEEP based conditional compilation	Dhruva Gole
	Get rid of conditional compilation based on CONFIG_PM_SLEEP because it may introduce build issues with certain configs where it may be disabled. This is because, if the above config is not enabled, the suspend-resume functions are never part of the code but the bcm63xx_spi_pm_ops struct still inits them to non-existent suspend-resume functions. Fixes: b42dfed83d95 ("spi: add Broadcom BCM63xx SPI controller driver") Signed-off-by: Dhruva Gole <d-gole@ti.com> Link: https://lore.kernel.org/r/20230420121615.967487-1-d-gole@ti.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-04-21	ASoC: fsl: imx-audmix: remove dummy dai_link->platform	Kuninori Morimoto
	Dummy dai_link->platform is not needed. Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Acked-by: Shengjiu Wang <shengjiu.wang@gmail.com> Tested-by: Shengjiu Wang <shengjiu.wang@gmail.com> Link: https://lore.kernel.org/r/877cu6f619.wl-kuninori.morimoto.gx@renesas.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-04-21	ASoC: fsl: imx-audmix: cleanup platform which is using Generic DMA	Kuninori Morimoto
	If CPU is using soc-generic-dmaengine-pcm, Platform Component will be same as CPU Component. In this case, we can use CPU dlc for Platform dlc. This patch shares CPU dlc with Platform, and add comment. Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Acked-by: Shengjiu Wang <shengjiu.wang@gmail.com> Tested-by: Shengjiu Wang <shengjiu.wang@gmail.com> Link: https://lore.kernel.org/r/878remf61j.wl-kuninori.morimoto.gx@renesas.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-04-21	ASoC: fsl: imx-spdif: cleanup platform which is using Generic DMA	Kuninori Morimoto
	If CPU is using soc-generic-dmaengine-pcm, Platform Component will be same as CPU Component. In this case, we can use CPU dlc for Platform dlc. This patch shares CPU dlc with Platform, and add comment. Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Acked-by: Shengjiu Wang <shengjiu.wang@gmail.com> Tested-by: Shengjiu Wang <shengjiu.wang@gmail.com> Link: https://lore.kernel.org/r/87a5z2f61w.wl-kuninori.morimoto.gx@renesas.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-04-21	ASoC: fsl: imx-es8328: cleanup platform which is using Generic DMA	Kuninori Morimoto
	If CPU is using soc-generic-dmaengine-pcm, Platform Component will be same as CPU Component. In this case, we can use CPU dlc for Platform dlc. This patch shares CPU dlc with Platform, and add comment. Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Acked-by: Shengjiu Wang <shengjiu.wang@gmail.com> Tested-by: Shengjiu Wang <shengjiu.wang@gmail.com> Link: https://lore.kernel.org/r/87bkjif628.wl-kuninori.morimoto.gx@renesas.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-04-21	RISC-V: KVM: Virtualize per-HART AIA CSRs	Anup Patel
	The AIA specification introduces per-HART AIA CSRs which primarily support:
	* 64 local interrupts on both RV64 and RV32
	* priority for each of the 64 local interrupts
	* interrupt filtering for local interrupts
	This patch virtualizes the above-mentioned AIA CSRs and also extends the ONE_REG interface to allow user-space to save/restore the Guest/VM view of these CSRs. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-04-21	firmware/sysfb: Fix VESA format selection	Pierre Asselin
	Some legacy BIOSes report no reserved bits in their 32-bit rgb mode, breaking the calculation of bits_per_pixel in commit f35cd3fa7729 ("firmware/sysfb: Fix EFI/VESA format selection"). However they report lfb_depth correctly for those modes. Keep the computation but set bits_per_pixel to lfb_depth if the latter is larger. v2 fixes the warnings from a max3() macro with arguments of different types; split the bits_per_pixel assignment to avoid uglifying the code with too many typecasts. v3 fixes space and formatting blips pointed out by Javier, and changes the bits_per_pixel assignment back to a single statement using two casts. v4 goes back to v2 and uses max_t(). Signed-off-by: Pierre Asselin <pa@panix.com> Fixes: f35cd3fa7729 ("firmware/sysfb: Fix EFI/VESA format selection") Link: https://lore.kernel.org/r/4Psm6B6Lqkz1QXM@panix3.panix.com Link: https://lore.kernel.org/r/20230412150225.3757223-1-javierm@redhat.com Tested-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20230419044834.10816-1-pa@panix.com
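The selection rule described in this commit can be sketched in plain C as follows. This is a user-space illustration only: the struct `fb_fields` and the function `sketch_bits_per_pixel()` are hypothetical names, not the kernel's, and the kernel version uses its own screen-info structure and the max_t() helper mentioned above.

```c
#include <stdint.h>

/* Hypothetical stand-in for the colour-field layout reported by firmware. */
struct fb_fields {
	uint8_t red_size, red_pos;
	uint8_t green_size, green_pos;
	uint8_t blue_size, blue_pos;
	uint8_t rsvd_size, rsvd_pos;
	uint16_t lfb_depth;
};

static uint32_t max_u32(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/*
 * Compute the highest bit used by any colour field, then fall back to
 * lfb_depth when it is larger (the case where the BIOS reports no
 * reserved bits for a 32-bit mode).
 */
uint32_t sketch_bits_per_pixel(const struct fb_fields *f)
{
	uint32_t bits = 0;

	bits = max_u32(bits, (uint32_t)f->red_pos + f->red_size);
	bits = max_u32(bits, (uint32_t)f->green_pos + f->green_size);
	bits = max_u32(bits, (uint32_t)f->blue_pos + f->blue_size);
	bits = max_u32(bits, (uint32_t)f->rsvd_pos + f->rsvd_size);

	return max_u32(bits, f->lfb_depth);
}
```

With 8-bit red, green and blue fields at bit positions 16, 8 and 0 and no reserved field, the colour fields alone give 24, and an lfb_depth of 32 raises the result to 32.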
2023-04-21	RISC-V: KVM: Use bitmap for irqs_pending and irqs_pending_mask	Anup Patel
	To support 64 VCPU local interrupts on RV32 host, we should use bitmap for irqs_pending and irqs_pending_mask in struct kvm_vcpu_arch. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
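A minimal user-space sketch of why a bitmap is needed here: on a host where unsigned long is 32 bits wide, 64 interrupt bits do not fit in one word, so the state is spread over an array of words. The names `irq_bitmap`, `sketch_irq_set()` and `sketch_irq_test()` are hypothetical; the kernel uses its own bitmap API.

```c
#include <limits.h>
#include <stdbool.h>

#define SKETCH_NR_IRQS		64u
#define SKETCH_WORD_BITS	(sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_WORDS \
	((SKETCH_NR_IRQS + SKETCH_WORD_BITS - 1) / SKETCH_WORD_BITS)

/* One word on a 64-bit host, two words on a 32-bit host. */
struct irq_bitmap {
	unsigned long bits[SKETCH_WORDS];
};

void sketch_irq_set(struct irq_bitmap *m, unsigned int irq)
{
	m->bits[irq / SKETCH_WORD_BITS] |= 1UL << (irq % SKETCH_WORD_BITS);
}

bool sketch_irq_test(const struct irq_bitmap *m, unsigned int irq)
{
	return (m->bits[irq / SKETCH_WORD_BITS] >>
		(irq % SKETCH_WORD_BITS)) & 1UL;
}
```

Indexing by word and bit keeps the same code correct for either word size, which a single unsigned long field cannot do for interrupt numbers 32 and above on a 32-bit host.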
2023-04-21	RISC-V: KVM: Add ONE_REG interface for AIA CSRs	Anup Patel
	We implement ONE_REG interface for AIA CSRs as a separate subtype under the CSR ONE_REG interface. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-04-21	RISC-V: KVM: Implement subtype for CSR ONE_REG interface	Anup Patel
	To make the CSR ONE_REG interface extensible, we implement subtype for the CSR ONE_REG IDs. The existing CSR ONE_REG IDs are treated as subtype = 0 (aka General CSRs). Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-04-21	RISC-V: KVM: Initial skeletal support for AIA	Anup Patel
	To incrementally implement AIA support, we first add minimal skeletal support which only compiles and detects AIA hardware support at the boot-time but does not provide any functionality. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-04-21	RISC-V: KVM: Drop the _MASK suffix from hgatp.VMID mask defines	Anup Patel
	The hgatp.VMID mask defines are used before shifting when extracting VMID value from hgatp CSR value so based on the convention followed in the other parts of asm/csr.h, the hgatp.VMID mask defines should not have a _MASK suffix. While we are here, let's use GENMASK() for hgatp.VMID and hgatp.PPN. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
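The mask-before-shift convention can be illustrated with a small user-space sketch. The macro `SKETCH_GENMASK64()` imitates the kernel's GENMASK() and the `SKETCH_HGATP_*` names are hypothetical; the bit positions assume the RV64 hgatp layout (VMID in bits 57:44, PPN in bits 43:0).

```c
#include <stdint.h>

/* Set bits h..l inclusive, in the spirit of the kernel's GENMASK(). */
#define SKETCH_GENMASK64(h, l) \
	(((~UINT64_C(0)) >> (63 - (h))) & ((~UINT64_C(0)) << (l)))

/* Assumed RV64 layout; the mask applies to the unshifted register value. */
#define SKETCH_HGATP_VMID_SHIFT	44
#define SKETCH_HGATP_VMID	SKETCH_GENMASK64(57, 44)
#define SKETCH_HGATP_PPN	SKETCH_GENMASK64(43, 0)

uint64_t sketch_hgatp_vmid(uint64_t hgatp)
{
	/* Mask first, then shift down. */
	return (hgatp & SKETCH_HGATP_VMID) >> SKETCH_HGATP_VMID_SHIFT;
}

uint64_t sketch_hgatp_ppn(uint64_t hgatp)
{
	return hgatp & SKETCH_HGATP_PPN;
}
```

Because the mask is expressed in register bit positions, it reads directly against the CSR layout, which is the convention the commit says the rest of asm/csr.h follows.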
2023-04-21	RISC-V: Detect AIA CSRs from ISA string	Anup Patel
	We have two extension names for AIA ISA support: Smaia (M-mode AIA CSRs) and Ssaia (S-mode AIA CSRs). We extend the ISA string parsing to detect Smaia and Ssaia extensions. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Anup Patel <anup@brainfault.org> Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-21	RISC-V: Add AIA related CSR defines	Anup Patel
	The RISC-V AIA specification improves handling per-HART local interrupts in a backward compatible manner. This patch adds defines for new RISC-V AIA CSRs. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org> Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-21	RISC-V: KVM: Allow Zbb extension for Guest/VM	Anup Patel
	We extend the KVM ISA extension ONE_REG interface to allow KVM user space to detect and enable Zbb extension for Guest/VM. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-04-21	RISC-V: KVM: Add ONE_REG interface to enable/disable SBI extensions	Anup Patel
	We add ONE_REG interface to enable/disable SBI extensions (just like the ONE_REG interface for ISA extensions). This allows KVM user-space to decide the set of SBI extension enabled for a Guest and by default all SBI extensions are enabled. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-04-21	RISC-V: KVM: Alphabetize selects	Andrew Jones
	While alphabetized lists tend to become unalphabetized almost as quickly as they get fixed up, it is preferred to keep select lists in Kconfigs in order. Let's fix KVM's up. Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-04-21	KVM: RISC-V: Retry fault if vma_lookup() results become invalid	David Matlack
	Read mmu_invalidate_seq before dropping the mmap_lock so that KVM can detect if the results of vma_lookup() (e.g. vma_shift) become stale before it acquires kvm->mmu_lock. This fixes a theoretical bug where a VMA could be changed by userspace after vma_lookup() and before KVM reads the mmu_invalidate_seq, causing KVM to install page table entries based on a (possibly) no-longer-valid vma_shift. Re-order the MMU cache top-up to earlier in user_mem_abort() so that it is not done after KVM has read mmu_invalidate_seq (i.e. so as to avoid inducing spurious fault retries). It's unlikely that any sane userspace currently modifies VMAs in such a way as to trigger this race. And even with directed testing I was unable to reproduce it. But a sufficiently motivated host userspace might be able to exploit this race. Note KVM/ARM had the same bug and was fixed in a separate, near identical patch (see Link). Link: https://lore.kernel.org/kvm/20230313235454.2964067-1-dmatlack@google.com/ Fixes: 9955371cc014 ("RISC-V: KVM: Implement MMU notifiers") Cc: stable@vger.kernel.org Signed-off-by: David Matlack <dmatlack@google.com> Tested-by: Anup Patel <anup@brainfault.org> Signed-off-by: Anup Patel <anup@brainfault.org>
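The ordering this fix relies on can be sketched as a sequence-count check. The sketch is single-threaded and omits the locks and memory barriers the kernel needs; `fake_kvm` and the `sketch_*` functions are hypothetical names used only for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-VM invalidation counter. */
struct fake_kvm {
	unsigned long mmu_invalidate_seq;
};

/*
 * Take the snapshot while the lookup result is still protected
 * (in the commit: before dropping the mmap_lock).
 */
unsigned long sketch_snapshot_seq(const struct fake_kvm *kvm)
{
	return kvm->mmu_invalidate_seq;
}

/* An invalidation, e.g. userspace changing the VMA, bumps the counter. */
void sketch_invalidate(struct fake_kvm *kvm)
{
	kvm->mmu_invalidate_seq++;
}

/*
 * Checked later, before installing page table entries: any change since
 * the snapshot means the cached lookup result may be stale.
 */
bool sketch_must_retry(const struct fake_kvm *kvm, unsigned long snapshot)
{
	return kvm->mmu_invalidate_seq != snapshot;
}
```

If the snapshot were taken only after the lock was dropped, an invalidation landing in between would already be included in the snapshot and the stale lookup would go unnoticed, which is the window the commit closes.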
2023-04-21	perf/x86/intel/uncore: Add events for Intel SPR IMC PMU	Stephane Eranian
	Add missing clockticks and cas_count_* events for Intel SapphireRapids IMC PMU. These events are useful to measure memory bandwidth. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20230419214241.2310385-1-eranian@google.com
2023-04-21	sched/clock: Fix local_clock() before sched_clock_init()	Aaron Thompson
	Have local_clock() return sched_clock() if sched_clock_init() has not yet run. sched_clock_cpu() has this check but it was not included in the new noinstr implementation of local_clock(). The effect can be seen on x86 with CONFIG_PRINTK_TIME enabled, for instance. scd->clock quickly reaches the value of TICK_NSEC and that value is returned until sched_clock_init() runs.
	dmesg without this patch:
	[ 0.000000] kvm-clock: ...
	[ 0.000002] kvm-clock: ...
	[ 0.000672] clocksource: ...
	[ 0.001000] tsc: ...
	[ 0.001000] e820: ...
	[ 0.001000] e820: ...
	...
	[ 0.001000] ..TIMER: ...
	[ 0.001000] clocksource: ...
	[ 0.378956] Calibrating delay loop ...
	[ 0.379955] pid_max: ...
	dmesg with this patch:
	[ 0.000000] kvm-clock: ...
	[ 0.000001] kvm-clock: ...
	[ 0.000675] clocksource: ...
	[ 0.002685] tsc: ...
	[ 0.003331] e820: ...
	[ 0.004190] e820: ...
	...
	[ 0.421939] ..TIMER: ...
	[ 0.422842] clocksource: ...
	[ 0.424582] Calibrating delay loop ...
	[ 0.425580] pid_max: ...
	Fixes: 776f22913b8e ("sched/clock: Make local_clock() noinstr") Signed-off-by: Aaron Thompson <dev@aaront.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230413175012.2201-1-dev@aaront.org
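The shape of the restored check can be sketched as below. All names are hypothetical stand-ins: `sketch_clock_running` models "sched_clock_init() has run", `sketch_raw_ns` models the value sched_clock() would return, and `sketch_filtered_ns` models the per-CPU scd->clock value.

```c
#include <stdbool.h>
#include <stdint.h>

bool sketch_clock_running;	/* false until initialisation has run */
uint64_t sketch_raw_ns;		/* stand-in for sched_clock() */
uint64_t sketch_filtered_ns;	/* stand-in for the per-CPU scd->clock */

uint64_t sketch_local_clock(void)
{
	/* Before initialisation, return the raw clock directly. */
	if (!sketch_clock_running)
		return sketch_raw_ns;

	return sketch_filtered_ns;
}
```

Without the early return, callers before initialisation would see the per-CPU value, which the commit describes as sticking at TICK_NSEC; that is the repeated 0.001000 timestamp in the first dmesg excerpt.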
2023-04-21	sched/rt: Fix bad task migration for rt tasks	Schspa Shi
	Commit 95158a89dd50 ("sched,rt: Use the full cpumask for balancing") allows find_lock_lowest_rq() to pick a task with migration disabled. The purpose of the commit is to push the current running task on the CPU that has the migrate_disable() task away. However, there is a race which allows a migrate_disable() task to be migrated. Consider:
	CPU0                                            CPU1
	push_rt_task
	  check is_migration_disabled(next_task)
	                                                task not running and
	                                                migration_disabled == 0
	  find_lock_lowest_rq(next_task, rq);
	    _double_lock_balance(this_rq, busiest);
	      raw_spin_rq_unlock(this_rq);
	      double_rq_lock(this_rq, busiest);
	        <<wait for busiest rq>>
	                                                <wakeup>
	                                                task become running
	                                                migrate_disable();
	                                                <context out>
	  deactivate_task(rq, next_task, 0);
	  set_task_cpu(next_task, lowest_rq->cpu);
	    WARN_ON_ONCE(is_migration_disabled(p));
	Fixes: 95158a89dd50 ("sched,rt: Use the full cpumask for balancing") Signed-off-by: Schspa Shi <schspa@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Tested-by: Dwaine Gonyier <dgonyier@redhat.com>
2023-04-21	sched: Fix performance regression introduced by mm_cid	Mathieu Desnoyers
	Introduce per-mm/cpu current concurrency id (mm_cid) to fix a PostgreSQL sysbench regression reported by Aaron Lu. Keep track of the currently allocated mm_cid for each mm/cpu rather than freeing them immediately on context switch. This eliminates most atomic operations when context switching back and forth between threads belonging to different memory spaces in multi-threaded scenarios (many processes, each with many threads). The per-mm/per-cpu mm_cid values are serialized by their respective runqueue locks. Thread migration is handled by introducing invocation to sched_mm_cid_migrate_to() (with destination runqueue lock held) in activate_task() for migrating tasks. If the destination cpu's mm_cid is unset, and if the source runqueue is not actively using its mm_cid, then the source cpu's mm_cid is moved to the destination cpu on migration. Introduce a task-work executed periodically, similarly to NUMA work, which delays reclaim of cid values when they are unused for a period of time. Keep track of the allocation time for each per-cpu cid, and let the task work clear them when they are observed to be older than SCHED_MM_CID_PERIOD_NS and unused. This task work also clears all mm_cids which are greater or equal to the Hamming weight of the mm cidmask to keep concurrency ids compact. Because we want to ensure the mm_cid converges towards the smaller values as migrations happen, the prior optimization that was done when context switching between threads belonging to the same mm is removed, because it could delay the lazy release of the destination runqueue mm_cid after it has been replaced by a migration. Removing this prior optimization is not an issue performance-wise because the introduced per-mm/per-cpu mm_cid tracking also covers this more specific case. 
Fixes: af7f588d8f73 ("sched: Introduce per-memory-map concurrency ID") Reported-by: Aaron Lu <aaron.lu@intel.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Aaron Lu <aaron.lu@intel.com> Link: https://lore.kernel.org/lkml/20230327080502.GA570847@ziqianlu-desk2/
2023-04-21	Merge branch 'v6.3-rc7'	Peter Zijlstra
	Sync with the urgent patches; in particular: a53ce18cacb4 ("sched/fair: Sanitize vruntime of entity being migrated") Signed-off-by: Peter Zijlstra <peterz@infradead.org>
2023-04-21	net/packet: support mergeable feature of virtio	Jianfeng Tan
	Packet sockets, like tap, can be used as the backend for kernel vhost. In packet sockets, virtio net header size is currently hardcoded to be the size of struct virtio_net_hdr, which is 10 bytes; however, it is not always the case: some virtio features, such as mrg_rxbuf, need virtio net header to be 12-byte long. Mergeable buffers, as a virtio feature, is worthy of supporting: packets that are larger than one-mbuf size will be dropped in vhost worker's handle_rx if mrg_rxbuf feature is not used, but large packets cannot be avoided and increasing mbuf's size is not economical. With this virtio feature enabled by virtio-user, packet sockets with hardcoded 10-byte virtio net header will parse mac head incorrectly in packet_snd by taking the last two bytes of virtio net header as part of mac header. This incorrect mac header parsing will cause packet to be dropped due to invalid ether head checking in later under-layer device packet receiving. By adding extra field vnet_hdr_sz with utilizing holes in struct packet_sock to record currently used virtio net header size and supporting extra sockopt PACKET_VNET_HDR_SZ to set specified vnet_hdr_sz, packet sockets can know the exact length of virtio net header that virtio user gives. In packet_snd, tpacket_snd and packet_recvmsg, instead of using hardcoded virtio net header size, it can get the exact vnet_hdr_sz from corresponding packet_sock, and parse mac header correctly based on this information to avoid the packets being mistakenly dropped. Signed-off-by: Jianfeng Tan <henry.tjf@antgroup.com> Co-developed-by: Anqi Shen <amy.saq@antgroup.com> Signed-off-by: Anqi Shen <amy.saq@antgroup.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
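The 10-byte versus 12-byte difference, and its effect on where the MAC header starts, can be sketched in user space. The structs `sketch_vnet_hdr` and `sketch_vnet_hdr_mrg` are hypothetical mirrors of the two header layouts, and `sketch_mac_header()` is an illustrative helper, not a kernel function.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the basic 10-byte virtio net header layout. */
struct sketch_vnet_hdr {
	uint8_t flags;
	uint8_t gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
};

/* With mergeable RX buffers a 16-bit buffer count follows: 12 bytes. */
struct sketch_vnet_hdr_mrg {
	struct sketch_vnet_hdr hdr;
	uint16_t num_buffers;
};

/* The MAC header starts right after the virtio net header in use. */
const uint8_t *sketch_mac_header(const uint8_t *buf, size_t vnet_hdr_sz)
{
	return buf + vnet_hdr_sz;
}
```

Parsing with the hardcoded 10-byte size when the peer actually sends a 12-byte header places the MAC header two bytes too early, which matches the misparse described in the commit message.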
2023-04-21	Merge branch 'mlx5-ipsec-fixes'	David S. Miller
	Leon Romanovsky says: ==================== Fixes to mlx5 IPsec implementation This small patchset includes various fixes and one refactoring patch which I collected for the features sent in this cycle, with one exception - first patch. First patch fixes code which was introduced in previous cycle, however I was able to trigger FW error only in custom debug code, so don't see a need to send it to net-rc. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21	net/mlx5e: Refactor duplicated code in mlx5e_ipsec_init_macs	Leon Romanovsky
	ARP discovery code has same logic for RX and TX flows, but with different source and destination fields. Instead of duplicating same code in mlx5e_ipsec_init_macs, let's refactor. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21	net/mlx5e: Properly release work data structure	Leon Romanovsky
	There are some flows in which the work structure is not allocated at all, so it must be checked before the data structure is released.
	general protection fault, probably for non-canonical address 0xdffffc000000000a: 0000 [#1] SMP KASAN
	KASAN: null-ptr-deref in range [0x0000000000000050-0x0000000000000057]
	CPU: 6 PID: 3486 Comm: kworker/6:0 Not tainted 6.3.0-rc5_for_upstream_debug_2023_04_06_11_01 #1
	Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
	Workqueue: events xfrm_state_gc_task
	RIP: 0010:mlx5e_xfrm_free_state+0x177/0x260 [mlx5_core]
	Code: c1 ea 03 80 3c 02 00 0f 85 f5 00 00 00 4c 8b a5 08 01 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 50 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 b7 00 00 00 49 8b 7c 24 50 e8 85 7c 09 e0 4c 89
	RSP: 0018:ffff888137a8fc50 EFLAGS: 00010206
	RAX: dffffc0000000000 RBX: ffff888180398000 RCX: 0000000000000000
	RDX: 000000000000000a RSI: ffffffffa1878227 RDI: 0000000000000050
	RBP: ffff88812a0c8000 R08: ffff888137a8fb60 R09: 0000000000000000
	R10: fffffbfff09aba0c R11: 0000000000000001 R12: 0000000000000000
	R13: ffff88812a0c8108 R14: ffffffff84c63480 R15: ffff8881acb63118
	FS: 0000000000000000(0000) GS:ffff88881eb00000(0000) knlGS:0000000000000000
	CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	CR2: 00007f667e8bc000 CR3: 0000000004693006 CR4: 0000000000370ea0
	DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
	DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
	Call Trace:
	___xfrm_state_destroy+0x3c8/0x5e0
	xfrm_state_gc_task+0xf6/0x140
	? ___xfrm_state_destroy+0x5e0/0x5e0
	process_one_work+0x7c2/0x1340
	? lockdep_hardirqs_on_prepare+0x3f0/0x3f0
	? pwq_dec_nr_in_flight+0x230/0x230
	? spin_bug+0x1d0/0x1d0
	worker_thread+0x59d/0xec0
	? __kthread_parkme+0xd9/0x1d0
	? process_one_work+0x1340/0x1340
	kthread+0x28f/0x330
	? kthread_complete_and_exit+0x20/0x20
	ret_from_fork+0x1f/0x30
	Modules linked in: sch_ingress openvswitch nsh mlx5_vdpa vringh vhost_iotlb vdpa mlx5_ib mlx5_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm ib_uverbs ib_core vfio_pci vfio_pci_core vfio_iommu_type1 vfio cuse overlay zram zsmalloc fuse [last unloaded: mlx5_core]
	---[ end trace 0000000000000000 ]---
	Fixes: 4562116f8a56 ("net/mlx5e: Generalize IPsec work structs") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21	net/mlx5e: Compare all fields in IPv6 address	Leon Romanovsky
	Fix size argument in memcmp to compare whole IPv6 address. Fixes: b3beba1fb404 ("net/mlx5e: Allow policies with reqid 0, to support IKE policy holes") Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Emeel Hakim <ehakim@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
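The class of bug being fixed, a memcmp() size that covers only part of the address, can be sketched as follows. The exact faulty expression in the driver is not quoted in this log, so the "partial" variant below is an assumed illustration; `sketch_ipv6` and both functions are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit address stored as four 32-bit words. */
struct sketch_ipv6 {
	uint32_t a6[4];
};

/* Illustrative mistake: the size covers one word, not the whole address. */
bool sketch_eq_partial(const struct sketch_ipv6 *a,
		       const struct sketch_ipv6 *b)
{
	return memcmp(a->a6, b->a6, sizeof(a->a6[0])) == 0;
}

/* Intended behaviour: the size covers all 16 bytes. */
bool sketch_eq_whole(const struct sketch_ipv6 *a,
		     const struct sketch_ipv6 *b)
{
	return memcmp(a->a6, b->a6, sizeof(a->a6)) == 0;
}
```

Two addresses that differ only in the last word compare equal under the partial check and unequal under the whole-address check.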
2023-04-21	net/mlx5e: Don't overwrite extack message returned from IPsec SA validator	Leon Romanovsky
	The addition of the new err_xfrm label caused error messages to be overwritten. Fix it by using the proper NL_SET_ERR_MSG_WEAK_MOD macro together with a change to the default message. Fixes: aa8bd0c9518c ("net/mlx5e: Support IPsec acquire default SA") Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21	net/mlx5e: Fix FW error while setting IPsec policy block action	Leon Romanovsky
	When trying to set IPsec policy block action the following error is generated: mlx5_cmd_out_err:803:(pid 3426): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8708c3), err(-22) This error means that drop action is not allowed when modify action is set, so update the code to skip modify header for XFRM_POLICY_BLOCK action. Fixes: 6721239672fe ("net/mlx5e: Skip IPsec encryption for TX path without matching policy") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21	net: stmmac:fix system hang when setting up tag_8021q VLAN for DSA ports	Yan Wang
	The system hangs because of dsa_tag_8021q_port_setup() -> stmmac_vlan_rx_add_vid(). I found in stmmac_drv_probe() that calling pm_runtime_put() disabled the clock. First, when the kernel is compiled with CONFIG_PM=y, the stmmac's resume/suspend is active. Secondly, with stmmac as DSA master, the dsa_tag_8021q_port_setup() function will call back stmmac_vlan_rx_add_vid() when the DSA driver starts. However, the system hangs because stmmac_vlan_rx_add_vid() accesses its registers after stmmac's clock is closed. I would suggest adding pm_runtime_resume_and_get() to stmmac_vlan_rx_add_vid(). This guarantees that the clock output is resumed while in use. Fixes: b3dcb3127786 ("net: stmmac: correct clocks enabled in stmmac_vlan_rx_kill_vid()") Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Yan Wang <rk.code@outlook.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21	sh: sq: Use the bitmap API when applicable	Christophe JAILLET
	Using the bitmap API is less verbose than hand writing it. It also improves the semantics. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/a51e9f32c19a007f4922943282cb12c89064440d.1681671848.git.christophe.jaillet@wanadoo.fr Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2023-04-21	ALSA: pcm: rewrite snd_pcm_playback_silence()	Oswald Buddenhagen
	The auto-silencer supports two modes: "thresholded" to fill up "just enough", and "top-up" to fill up "as much as possible". The two modes used rather distinct code paths, which this patch unifies. The only remaining distinction is how much we actually want to fill. This fixes a bug in thresholded mode, where we failed to use new_hw_ptr, resulting in under-fill. Top-up mode is now more well-behaved and much easier to understand in corner cases. This also updates comments in the proximity of silencing-related data structures. Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de> Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20230420113324.877164-1-oswald.buddenhagen@gmx.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-04-21	KVM: arm64: Constify start/end/phys fields of the pgtable walker data	Marc Zyngier
	As we are revamping the way the pgtable walker evaluates some of the data, make it clear that we rely on some of the fields to be constant across the lifetime of a walk. For this, flag the start, end and phys fields of the walk data as 'const', which will generate an error if we were to accidentally update these fields again. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-04-21	KVM: arm64: Infer PA offset from VA in hyp map walker	Oliver Upton
	Similar to the recently fixed stage-2 walker, the hyp map walker increments the PA and VA of a walk separately. Unlike stage-2, there is no bug here as the map walker has exclusive access to the stage-1 page tables. Nonetheless, in the interest of continuity throughout the page table code, tweak the hyp map walker to avoid incrementing the PA and instead use the VA as the authoritative source of how far along a table walk has gotten. Calculate the PA to use for a leaf PTE by adding the offset of the VA from the start of the walk to the starting PA. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230421071606.1603916-3-oliver.upton@linux.dev
2023-04-21	KVM: arm64: Infer the PA offset from IPA in stage-2 map walker	Oliver Upton
	Until now, the page table walker counted increments to the PA and IPA of a walk in two separate places. While the PA is incremented as soon as a leaf PTE is installed in stage2_map_walker_try_leaf(), the IPA is actually bumped in the generic table walker context. Critically, __kvm_pgtable_visit() rereads the PTE after the LEAF callback returns to work out if a table or leaf was installed, and only bumps the IPA for a leaf PTE. This arrangement worked fine when we handled faults behind the write lock, as the walker had exclusive access to the stage-2 page tables. However, commit 1577cb5823ce ("KVM: arm64: Handle stage-2 faults in parallel") started handling all stage-2 faults behind the read lock, opening up a race where a walker could increment the PA but not the IPA of a walk. Nothing good ensues, as the walker starts mapping with the incorrect IPA -> PA relationship. For example, assume that two vCPUs took a data abort on the same IPA. One observes that dirty logging is disabled, and the other observes that it is enabled:
	vCPU attempting PMD mapping               vCPU attempting PTE mapping
	======================================    =====================================
	/* install PMD */
	stage2_make_pte(ctx, leaf);
	data->phys += granule;
	                                          /* replace PMD with a table */
	                                          stage2_try_break_pte(ctx, data->mmu);
	                                          stage2_make_pte(ctx, table);
	/* table is observed */
	ctx.old = READ_ONCE(ptep);
	table = kvm_pte_table(ctx.old, level);
	/*
	 * map walk continues w/o incrementing
	 * IPA.
	 */
	__kvm_pgtable_walk(..., level + 1);
	Bring an end to the whole mess by using the IPA as the single source of truth for how far along a walk has gotten. Work out the correct PA to map by calculating the IPA offset from the beginning of the walk and add that to the starting physical address.
Cc: stable@vger.kernel.org Fixes: 1577cb5823ce ("KVM: arm64: Handle stage-2 faults in parallel") Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230421071606.1603916-2-oliver.upton@linux.dev
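	The PA calculation described above can be sketched in plain C. This is only an illustration of the idea, not the kernel's code: struct map_walk, its fields, and walk_phys() are hypothetical names, and the real walker keeps more state than this.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical walk state: only the starting addresses are stored, so there
 * is no separately incremented PA that could drift away from the IPA.
 */
struct map_walk {
	uint64_t start_ipa;	/* IPA at which the walk began */
	uint64_t start_phys;	/* PA that start_ipa maps to */
};

/*
 * Derive the PA for the IPA currently being visited by adding the IPA's
 * offset from the start of the walk to the starting PA.
 */
static uint64_t walk_phys(const struct map_walk *w, uint64_t ipa)
{
	assert(ipa >= w->start_ipa);
	return w->start_phys + (ipa - w->start_ipa);
}
```

	Because the PA is recomputed from the IPA on each visit, a walker that finds a table where it expected a leaf cannot end up with a PA that has advanced while the IPA has not.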
2023-04-21	pinctrl-bcm2835.c: fix race condition when setting gpio dir	Hans Verkuil
	In the past, setting the pin direction called pinctrl_gpio_direction(), which uses a mutex to serialize this. That was changed to set the direction directly in the pin controller driver, but that lost the serialization mechanism. Since the directions of multiple pins are stored in the same register, you can have a race condition, something that was in fact observed with the cec-gpio driver. Add a new spinlock to serialize writing to the FSEL registers. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Fixes: 1a4541b68e25 ("pinctrl-bcm2835: don't call pinctrl_gpio_direction()") Link: https://lore.kernel.org/r/4302b66b-ca20-0f19-d2aa-ee8661118863@xs4all.nl Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
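	The race comes from the read-modify-write needed to change one pin's field in a register shared with other pins. The following is a userspace sketch of that pattern, not the driver's code: a C11 atomic_flag stands in for the kernel spinlock, an array stands in for the memory-mapped registers, and fsel_set(), fsel_get(), and the lock helpers are hypothetical names. It assumes the BCM2835 layout of 3 function-select bits per pin and 10 pins per FSEL register.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define FSEL_BITS_PER_PIN	3u
#define FSEL_PINS_PER_REG	10u
#define FSEL_MASK		0x7u
#define FSEL_NUM_REGS		6u

/* Stand-ins for the kernel spinlock and the memory-mapped FSEL registers. */
static atomic_flag fsel_lock = ATOMIC_FLAG_INIT;
static uint32_t fsel_regs[FSEL_NUM_REGS];

static void fsel_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&fsel_lock, memory_order_acquire))
		;	/* spin until the flag is released */
}

static void fsel_lock_release(void)
{
	atomic_flag_clear_explicit(&fsel_lock, memory_order_release);
}

/* Update one pin's field; the lock covers the whole read-modify-write. */
static void fsel_set(unsigned int pin, uint32_t fsel)
{
	unsigned int reg = pin / FSEL_PINS_PER_REG;
	unsigned int shift = (pin % FSEL_PINS_PER_REG) * FSEL_BITS_PER_PIN;
	uint32_t val;

	assert(reg < FSEL_NUM_REGS);

	fsel_lock_acquire();
	val = fsel_regs[reg];
	val &= ~(FSEL_MASK << shift);
	val |= (fsel & FSEL_MASK) << shift;
	fsel_regs[reg] = val;
	fsel_lock_release();
}

/* Read back one pin's field under the same lock. */
static uint32_t fsel_get(unsigned int pin)
{
	unsigned int reg = pin / FSEL_PINS_PER_REG;
	unsigned int shift = (pin % FSEL_PINS_PER_REG) * FSEL_BITS_PER_PIN;
	uint32_t val;

	assert(reg < FSEL_NUM_REGS);

	fsel_lock_acquire();
	val = (fsel_regs[reg] >> shift) & FSEL_MASK;
	fsel_lock_release();
	return val;
}
```

	Without the lock, two callers changing different pins in the same register could both read the old value, and whichever wrote last would silently undo the other's change.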
2023-04-21	dt-bindings: pinctrl: qcom,sm8150: Drop duplicate function value "atest_usb2"	Rob Herring
	The enum value "atest_usb2" appears twice. Remove the duplicate. The meta-schema normally catches these, but schemas under "$defs" were not getting checked. A fix for that is pending. Signed-off-by: Rob Herring <robh@kernel.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20230418150613.1528233-1-robh@kernel.org Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2023-04-21	Merge branch kvm-arm64/spec-ptw into kvmarm-master/next	Marc Zyngier
	* kvm-arm64/spec-ptw: : . : On taking an exception from EL1&0 to EL2(&0), the page table walker is : allowed to carry on with speculative walks started from EL1&0 while : running at EL2 (see R_LFHQG). Given that the PTW may be actively using : the EL1&0 system registers, the only safe way to deal with it is to : issue a DSB before changing any of it. : : We already did the right thing for SPE and TRBE, but ignored the PTW : for unknown reasons (probably because the architecture wasn't crystal : clear at the time). : : This requires a bit of surgery in the nvhe code, though most of these : patches are comments so that my future self can understand the purpose : of these barriers. The VHE code is largely unaffected, thanks to the : DSB in the context switch. : . KVM: arm64: vhe: Drop extra isb() on guest exit KVM: arm64: vhe: Synchronise with page table walker on MMU update KVM: arm64: pkvm: Document the side effects of kvm_flush_dcache_to_poc() KVM: arm64: nvhe: Synchronise with page table walker on TLBI KVM: arm64: nvhe: Synchronise with page table walker on vcpu run Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-04-21	Merge branch kvm-arm64/smccc-filtering into kvmarm-master/next	Marc Zyngier
	* kvm-arm64/smccc-filtering: : . : SMCCC call filtering and forwarding to userspace, courtesy of : Oliver Upton. From the cover letter: : : "The Arm SMCCC is rather prescriptive in regards to the allocation of : SMCCC function ID ranges. Many of the hypercall ranges have an : associated specification from Arm (FF-A, PSCI, SDEI, etc.) with some : room for vendor-specific implementations. : : The ever-expanding SMCCC surface leaves a lot of work within KVM for : providing new features. Furthermore, KVM implements its own : vendor-specific ABI, with little room for other implementations (like : Hyper-V, for example). Rather than cramming it all into the kernel we : should provide a way for userspace to handle hypercalls." : . KVM: selftests: Fix spelling mistake "KVM_HYPERCAL_EXIT_SMC" -> "KVM_HYPERCALL_EXIT_SMC" KVM: arm64: Test that SMC64 arch calls are reserved KVM: arm64: Prevent userspace from handling SMC64 arch range KVM: arm64: Expose SMC/HVC width to userspace KVM: selftests: Add test for SMCCC filter KVM: selftests: Add a helper for SMCCC calls with SMC instruction KVM: arm64: Let errors from SMCCC emulation to reach userspace KVM: arm64: Return NOT_SUPPORTED to guest for unknown PSCI version KVM: arm64: Introduce support for userspace SMCCC filtering KVM: arm64: Add support for KVM_EXIT_HYPERCALL KVM: arm64: Use a maple tree to represent the SMCCC filter KVM: arm64: Refactor hvc filtering to support different actions KVM: arm64: Start handling SMCs from EL1 KVM: arm64: Rename SMC/HVC call handler to reflect reality KVM: arm64: Add vm fd device attribute accessors KVM: arm64: Add a helper to check if a VM has ran once KVM: x86: Redefine 'longmode' as a flag for KVM_EXIT_HYPERCALL Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-04-21	Merge branch kvm-arm64/selftest/misc-6.4 into kvmarm-master/next	Marc Zyngier
	* kvm-arm64/selftest/misc-6.4: : . : Misc selftest updates for 6.4 : : - Add comments for recently added ID registers : . KVM: selftests: Comment newly defined aarch64 ID registers Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-04-21	dt-bindings: pinctrl: qcom: Add few missing functions	Devi Priya
	Add the missing functions cri_trng2 and gpio, and remove the duplicate entry qdss_tracedata_b. Fixes: 5b63ccb69ee8 ("dt-bindings: pinctrl: qcom: Add support for IPQ9574") Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Devi Priya <quic_devipriy@quicinc.com> Link: https://lore.kernel.org/r/20230417061337.6552-1-quic_devipriy@quicinc.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2023-04-21	Merge branch kvm-arm64/selftest/lpa into kvmarm-master/next	Marc Zyngier
	* kvm-arm64/selftest/lpa: : . : Selftest fixes addressing PTE and TTBR0_EL1 encodings for : 52bit PAs : . KVM: selftests: arm64: Fix ttbr0_el1 encoding for PA bits > 48 KVM: selftests: arm64: Fix pte encode/decode for PA bits > 48 KVM: selftests: Fixup config fragment for access_tracking_perf_test Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-04-21	Merge branch kvm-arm64/timer-vm-offsets into kvmarm-master/next	Marc Zyngier
	* kvm-arm64/timer-vm-offsets: (21 commits) : . : This series aims at satisfying multiple goals: : : - allow a VMM to atomically restore a timer offset for a whole VM : instead of updating the offset each time a vcpu gets its counter : written : : - allow a VMM to save/restore the physical timer context, something : that we cannot do at the moment due to the lack of offsetting : : - provide a framework that is suitable for NV support, where we get : both global and per timer, per vcpu offsetting, and manage : interrupts in a less braindead way. : : Conflict resolution involves using the new per-vcpu config lock instead : of the home-grown timer lock. : . KVM: arm64: Handle 32bit CNTPCTSS traps KVM: arm64: selftests: Augment existing timer test to handle variable offset KVM: arm64: selftests: Deal with spurious timer interrupts KVM: arm64: selftests: Add physical timer registers to the sysreg list KVM: arm64: nv: timers: Support hyp timer emulation KVM: arm64: nv: timers: Add a per-timer, per-vcpu offset KVM: arm64: Document KVM_ARM_SET_CNT_OFFSETS and co KVM: arm64: timers: Abstract the number of valid timers per vcpu KVM: arm64: timers: Fast-track CNTPCT_EL0 trap handling KVM: arm64: Elide kern_hyp_va() in VHE-specific parts of the hypervisor KVM: arm64: timers: Move the timer IRQs into arch_timer_vm_data KVM: arm64: timers: Abstract per-timer IRQ access KVM: arm64: timers: Rationalise per-vcpu timer init KVM: arm64: timers: Allow save/restoring of the physical timer KVM: arm64: timers: Allow userspace to set the global counter offset KVM: arm64: Expose {un,}lock_all_vcpus() to the rest of KVM KVM: arm64: timers: Allow physical offset without CNTPOFF_EL2 KVM: arm64: timers: Use CNTPOFF_EL2 to offset the physical timer arm64: Add HAS_ECV_CNTPOFF capability arm64: Add CNTPOFF_EL2 register definition ... Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-04-21	Merge branch kvm-arm64/lock-inversion into kvmarm-master/next	Marc Zyngier
	* kvm-arm64/lock-inversion: : . : vm/vcpu lock inversion fixes, courtesy of Oliver Upton, plus a few : extra fixes from both Oliver and Reiji Watanabe. : : From the initial cover letter: : : As it so happens, lock ordering in KVM/arm64 is completely backwards. : There's a significant amount of VM-wide state that needs to be accessed : from the context of a vCPU. Until now, this was accomplished by : acquiring the kvm->lock, but that cannot be nested within vcpu->mutex. : : This series fixes the issue with some fine-grained locking for MP state : and a new, dedicated mutex that can nest with both kvm->lock and : vcpu->mutex. : . KVM: arm64: Have kvm_psci_vcpu_on() use WRITE_ONCE() to update mp_state KVM: arm64: Acquire mp_state_lock in kvm_arch_vcpu_ioctl_vcpu_init() KVM: arm64: vgic: Don't acquire its_lock before config_lock KVM: arm64: Use config_lock to protect vgic state KVM: arm64: Use config_lock to protect data ordered against KVM_RUN KVM: arm64: Avoid lock inversion when setting the VM register width KVM: arm64: Avoid vcpu->mutex v. kvm->lock inversion in CPU_ON Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-04-21	pinctrl: qcom: spmi-gpio: Add PMI632 support	Luca Weiss
	Add support for the 8 GPIOs found on PMI632. Signed-off-by: Luca Weiss <luca@z3ntu.xyz> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20230414-pmi632-v2-2-98bafa909c36@z3ntu.xyz Signed-off-by: Linus Walleij <linus.walleij@linaro.org>