summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-10-12workqueue: Override implicit ordered attribute in ↵Waiman Long
workqueue_apply_unbound_cpumask() Commit 5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered") enabled implicit ordered attribute to be added to WQ_UNBOUND workqueues with max_active of 1. This prevented the changing of attributes to these workqueues leading to fix commit 0a94efb5acbb ("workqueue: implicit ordered attribute should be overridable"). However, workqueue_apply_unbound_cpumask() was not updated at that time. So sysfs changes to wq_unbound_cpumask has no effect on WQ_UNBOUND workqueues with implicit ordered attribute. Since not all WQ_UNBOUND workqueues are visible on sysfs, we are not able to make all the necessary cpumask changes even if we iterates all the workqueue cpumasks in sysfs and changing them one by one. Fix this problem by applying the corresponding change made to apply_workqueue_attrs_locked() in the fix commit to workqueue_apply_unbound_cpumask(). Fixes: 5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2023-10-12Merge tag 'soc-fixes-6.6-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "AngeloGioacchino Del Regno is stepping in as co-maintainer for the MediaTek SoC platform and starts by sending some dts fixes for the mt8195 platform that had been pending for a while. On the ixp4xx platform, Krzysztof Halasa steps down as co-maintainer, reflecting that Linus Walleij has been handling this on his own for the past few years. Generic RISC-V kernels are now marked as incompatible with the RZ/Five platform that requires custom hacks both for managing its DMA bounce buffers and for addressing low virtual memory. Finally, there is one bugfix for the AMDTEE firmware driver to prevent a use-after-free bug" * tag 'soc-fixes-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: IXP4xx MAINTAINERS entries arm64: dts: mediatek: mt8195: Set DSU PMU status to fail arm64: dts: mediatek: fix t-phy unit name arm64: dts: mediatek: mt8195-demo: update and reorder reserved memory regions arm64: dts: mediatek: mt8195-demo: fix the memory size to 8GB MAINTAINERS: Add Angelo as MediaTek SoC co-maintainer soc: renesas: Make ARCH_R9A07G043 (riscv version) depend on NONPORTABLE tee: amdtee: fix use-after-free vulnerability in amdtee_close_session
2023-10-12drivers: perf: Fix panic in riscv SBI mmap supportAlexandre Ghiti
The following panic can happen when mmap is called before the pmu add callback which sets the hardware counter index: this happens for example with the following command `perf record --no-bpf-event -n kill`. [ 99.461486] CPU: 1 PID: 1259 Comm: perf Tainted: G E 6.6.0-rc4ubuntu-defconfig #2 [ 99.461669] Hardware name: riscv-virtio,qemu (DT) [ 99.461748] epc : pmu_sbi_set_scounteren+0x42/0x44 [ 99.462337] ra : smp_call_function_many_cond+0x126/0x5b0 [ 99.462369] epc : ffffffff809f9d24 ra : ffffffff800f93e0 sp : ff60000082153aa0 [ 99.462407] gp : ffffffff82395c98 tp : ff6000009a218040 t0 : ff6000009ab3a4f0 [ 99.462425] t1 : 0000000000000004 t2 : 0000000000000100 s0 : ff60000082153ab0 [ 99.462459] s1 : 0000000000000000 a0 : ff60000098869528 a1 : 0000000000000000 [ 99.462473] a2 : 000000000000001f a3 : 0000000000f00000 a4 : fffffffffffffff8 [ 99.462488] a5 : 00000000000000cc a6 : 0000000000000000 a7 : 0000000000735049 [ 99.462502] s2 : 0000000000000001 s3 : ffffffff809f9ce2 s4 : ff60000098869528 [ 99.462516] s5 : 0000000000000002 s6 : 0000000000000004 s7 : 0000000000000001 [ 99.462530] s8 : ff600003fec98bc0 s9 : ffffffff826c5890 s10: ff600003fecfcde0 [ 99.462544] s11: ff600003fec98bc0 t3 : ffffffff819e2558 t4 : ff1c000004623840 [ 99.462557] t5 : 0000000000000901 t6 : ff6000008feeb890 [ 99.462570] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003 [ 99.462658] [<ffffffff809f9d24>] pmu_sbi_set_scounteren+0x42/0x44 [ 99.462979] Code: 1060 4785 97bb 00d7 8fd9 9073 1067 6422 0141 8082 (9002) 0013 [ 99.463335] Kernel BUG [#2] To circumvent this, try to enable userspace access to the hardware counter when it is selected in addition to when the event is mapped. And vice-versa when the event is stopped/unmapped. Fixes: cc4c07c89aad ("drivers: perf: Implement perf event mmap support in the SBI backend") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20231006082010.11963-1-alexghiti@rivosinc.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-12riscv: Fix ftrace syscall handling which are now prefixed with __riscv_Alexandre Ghiti
ftrace creates entries for each syscall in the tracefs but has failed since commit 08d0ce30e0e4 ("riscv: Implement syscall wrappers") which prefixes all riscv syscalls with __riscv_. So fix this by implementing arch_syscall_match_sym_name() which allows us to ignore this prefix. And also ignore compat syscalls like x86/arm64 by implementing arch_trace_is_compat_syscall(). Fixes: 08d0ce30e0e4 ("riscv: Implement syscall wrappers") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Tested-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20231003182407.32198-1-alexghiti@rivosinc.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-12Merge tag 'pmdomain-v6.6-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm Pull pmdomain fix from Ulf Hansson: - imx: scu-pd: Correct the DMA2 channel * tag 'pmdomain-v6.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm: pmdomain: imx: scu-pd: correct DMA2 channel
2023-10-12RISC-V: Fix wrong use of CONFIG_HAVE_SOFTIRQ_ON_OWN_STACKJiexun Wang
If configuration options SOFTIRQ_ON_OWN_STACK and PREEMPT_RT are enabled simultaneously under RISC-V architecture, it will result in a compilation failure: arch/riscv/kernel/irq.c:64:6: error: redefinition of 'do_softirq_own_stack' 64 | void do_softirq_own_stack(void) | ^~~~~~~~~~~~~~~~~~~~ In file included from ./arch/riscv/include/generated/asm/softirq_stack.h:1, from arch/riscv/kernel/irq.c:15: ./include/asm-generic/softirq_stack.h:8:20: note: previous definition of 'do_softirq_own_stack' was here 8 | static inline void do_softirq_own_stack(void) | ^~~~~~~~~~~~~~~~~~~~ After changing CONFIG_HAVE_SOFTIRQ_ON_OWN_STACK to CONFIG_SOFTIRQ_ON_OWN_STACK, compilation can be successful. Fixes: dd69d07a5a6c ("riscv: stack: Support HAVE_SOFTIRQ_ON_OWN_STACK") Reviewed-by: Guo Ren <guoren@kernel.org> Signed-off-by: Jiexun Wang <wangjiexun@tinylab.org> Reviewed-by: Samuel Holland <samuel@sholland.org> Link: https://lore.kernel.org/r/20230913052940.374686-1-wangjiexun@tinylab.org Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-12riscv: kdump: fix crashkernel reserving problem on RISC-VChen Jiahao
When testing on risc-v QEMU environment with "crashkernel=" parameter enabled, a problem occurred with the following message: [ 0.000000] crashkernel low memory reserved: 0xf8000000 - 0x100000000 (128 MB) [ 0.000000] crashkernel reserved: 0x0000000177e00000 - 0x0000000277e00000 (4096 MB) [ 0.000000] ------------[ cut here ]------------ [ 0.000000] WARNING: CPU: 0 PID: 0 at kernel/resource.c:779 __insert_resource+0x8e/0xd0 [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0-rc2-next-20230920 #1 [ 0.000000] Hardware name: riscv-virtio,qemu (DT) [ 0.000000] epc : __insert_resource+0x8e/0xd0 [ 0.000000] ra : insert_resource+0x28/0x4e [ 0.000000] epc : ffffffff80017344 ra : ffffffff8001742e sp : ffffffff81203db0 [ 0.000000] gp : ffffffff812ece98 tp : ffffffff8120dac0 t0 : ff600001f7ff2b00 [ 0.000000] t1 : 0000000000000000 t2 : 3428203030303030 s0 : ffffffff81203dc0 [ 0.000000] s1 : ffffffff81211e18 a0 : ffffffff81211e18 a1 : ffffffff81289380 [ 0.000000] a2 : 0000000277dfffff a3 : 0000000177e00000 a4 : 0000000177e00000 [ 0.000000] a5 : ffffffff81289380 a6 : 0000000277dfffff a7 : 0000000000000078 [ 0.000000] s2 : ffffffff81289380 s3 : ffffffff80a0bac8 s4 : ff600001f7ff2880 [ 0.000000] s5 : 0000000000000280 s6 : 8000000a00006800 s7 : 000000000000007f [ 0.000000] s8 : 0000000080017038 s9 : 0000000080038ea0 s10: 0000000000000000 [ 0.000000] s11: 0000000000000000 t3 : ffffffff80a0bc00 t4 : ffffffff80a0bc00 [ 0.000000] t5 : ffffffff80a0bbd0 t6 : ffffffff80a0bc00 [ 0.000000] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003 [ 0.000000] [<ffffffff80017344>] __insert_resource+0x8e/0xd0 [ 0.000000] ---[ end trace 0000000000000000 ]--- [ 0.000000] Failed to add a Crash kernel resource at 177e00000 The crashkernel memory has been allocated successfully, whereas it failed to insert into iomem_resource. This is due to the unique reserving logic in risc-v arch specific code, i.e. crashk_res/crashk_low_res will be added into iomem_resource later in init_resources(), which is not aligned with current unified reserving logic in reserve_crashkernel_{generic,low}() and therefore leads to the failure of crashkernel reservation. Removing the arch specific code within #ifdef CONFIG_KEXEC_CORE in init_resources() to fix above problem. Fixes: 31549153088e ("riscv: kdump: use generic interface to simplify crashkernel reservation") Signed-off-by: Chen Jiahao <chenjiahao16@huawei.com> Link: https://lore.kernel.org/r/20230925024333.730964-1-chenjiahao16@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-12riscv: Remove duplicate objcopy flagSong Shuai
There are two duplicate `-O binary` flags when objcopying from vmlinux to Image/xipImage. RISC-V set `-O binary` flag in both OBJCOPYFLAGS in the top-level riscv Makefile and OBJCOPYFLAGS_* in the boot/Makefile, and the objcopy cmd in Kbuild would join them together. The `-O binary` flag is only needed for objcopying Image, so remove the OBJCOPYFLAGS in the top-level riscv Makefile. Fixes: c0fbcd991860 ("RISC-V: Build flat and compressed kernel images") Signed-off-by: Song Shuai <songshuaishuai@tinylab.org> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20230914091334.1458542-1-songshuaishuai@tinylab.org Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-12x86/alternatives: Disable KASAN in apply_alternatives()Kirill A. Shutemov
Fei has reported that KASAN triggers during apply_alternatives() on a 5-level paging machine: BUG: KASAN: out-of-bounds in rcu_is_watching() Read of size 4 at addr ff110003ee6419a0 by task swapper/0/0 ... __asan_load4() rcu_is_watching() trace_hardirqs_on() text_poke_early() apply_alternatives() ... On machines with 5-level paging, cpu_feature_enabled(X86_FEATURE_LA57) gets patched. It includes KASAN code, where KASAN_SHADOW_START depends on __VIRTUAL_MASK_SHIFT, which is defined with cpu_feature_enabled(). KASAN gets confused when apply_alternatives() patches the KASAN_SHADOW_START users. A test patch that makes KASAN_SHADOW_START static, by replacing __VIRTUAL_MASK_SHIFT with 56, works around the issue. Fix it for real by disabling KASAN while the kernel is patching alternatives. [ mingo: updated the changelog ] Fixes: 6657fca06e3f ("x86/mm: Allow to boot without LA57 if CONFIG_X86_5LEVEL=y") Reported-by: Fei Yang <fei.yang@intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20231012100424.1456-1-kirill.shutemov@linux.intel.com
2023-10-12net/mlx5e: Fix VF representors reporting zero counters to "ip -s" commandAmir Tzin
Although vf_vport entry of struct mlx5e_stats is never updated, its values are mistakenly copied to the caller structure in the VF representor .ndo_get_stat_64 callback mlx5e_rep_get_stats(). Remove redundant entry and use the updated one, rep_stats, instead. Fixes: 64b68e369649 ("net/mlx5: Refactor and expand rep vport stat group") Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Amir Tzin <amirtz@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-12net/mlx5e: Don't offload internal port if filter device is out deviceJianbo Liu
In the cited commit, if the routing device is ovs internal port, the out device is set to uplink, and packets go out after encapsulation. If filter device is uplink, it can trigger the following syndrome: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 3966): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0xcdb051), err(-22) Fix this issue by not offloading internal port if filter device is out device. In this case, packets are not forwarded to the root table to be processed, the termination table is used instead to forward them from uplink to uplink. Fixes: 100ad4e2d758 ("net/mlx5e: Offload internal port as encap route device") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Ariel Levkovich <lariel@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-12net/mlx5e: Take RTNL lock before triggering netdev notifiersLama Kayal
Hold RTNL lock when calling xdp_set_features() with a registered netdev, as the call triggers the netdev notifiers. This could happen when switching from nic profile to uplink representor for example. Similar logic which fixed a similar scenario was previously introduced in the following commit: commit 72cc65497065 net/mlx5e: Take RTNL lock when needed before calling xdp_set_features(). This fixes the following assertion and warning call trace: RTNL: assertion failed at net/core/dev.c (1961) WARNING: CPU: 13 PID: 2529 at net/core/dev.c:1961 call_netdevice_notifiers_info+0x7c/0x80 Modules linked in: rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay mlx5_core zram zsmalloc fuse CPU: 13 PID: 2529 Comm: devlink Not tainted 6.5.0_for_upstream_min_debug_2023_09_07_20_04 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:call_netdevice_notifiers_info+0x7c/0x80 Code: 8f ff 80 3d 77 0d 16 01 00 75 c5 ba a9 07 00 00 48 c7 c6 c4 bb 0d 82 48 c7 c7 18 c8 06 82 c6 05 5b 0d 16 01 01 e8 44 f6 8c ff <0f> 0b eb a2 0f 1f 44 00 00 55 48 89 e5 41 54 48 83 e4 f0 48 83 ec RSP: 0018:ffff88819930f7f0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffff8309f740 RCX: 0000000000000027 RDX: ffff88885fb5b5c8 RSI: 0000000000000001 RDI: ffff88885fb5b5c0 RBP: 0000000000000028 R08: ffff88887ffabaa8 R09: 0000000000000003 R10: ffff88887fecbac0 R11: ffff88887ff7bac0 R12: ffff88819930f810 R13: ffff88810b7fea40 R14: ffff8881154e8fd8 R15: ffff888107e881a0 FS: 00007f3ad248f800(0000) GS:ffff88885fb40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000563b85f164e0 CR3: 0000000113b5c006 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __warn+0x79/0x120 ? call_netdevice_notifiers_info+0x7c/0x80 ? report_bug+0x17c/0x190 ? handle_bug+0x3c/0x60 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? call_netdevice_notifiers_info+0x7c/0x80 call_netdevice_notifiers+0x2e/0x50 mlx5e_set_xdp_feature+0x21/0x50 [mlx5_core] mlx5e_build_rep_params+0x97/0x130 [mlx5_core] mlx5e_init_ul_rep+0x9f/0x100 [mlx5_core] mlx5e_netdev_init_profile+0x76/0x110 [mlx5_core] mlx5e_netdev_attach_profile+0x1f/0x90 [mlx5_core] mlx5e_netdev_change_profile+0x92/0x160 [mlx5_core] mlx5e_vport_rep_load+0x329/0x4a0 [mlx5_core] mlx5_esw_offloads_rep_load+0x9e/0xf0 [mlx5_core] esw_offloads_enable+0x4bc/0xe90 [mlx5_core] mlx5_eswitch_enable_locked+0x3c8/0x570 [mlx5_core] ? kmalloc_trace+0x25/0x80 mlx5_devlink_eswitch_mode_set+0x224/0x680 [mlx5_core] ? devlink_get_from_attrs_lock+0x9e/0x110 devlink_nl_cmd_eswitch_set_doit+0x60/0xe0 genl_family_rcv_msg_doit+0xd0/0x120 genl_rcv_msg+0x180/0x2b0 ? devlink_get_from_attrs_lock+0x110/0x110 ? devlink_nl_cmd_eswitch_get_doit+0x290/0x290 ? devlink_pernet_pre_exit+0xf0/0xf0 ? genl_family_rcv_msg_dumpit+0xf0/0xf0 netlink_rcv_skb+0x54/0x100 genl_rcv+0x24/0x40 netlink_unicast+0x1fc/0x2c0 netlink_sendmsg+0x232/0x4a0 sock_sendmsg+0x38/0x60 ? _copy_from_user+0x2a/0x60 __sys_sendto+0x110/0x160 ? handle_mm_fault+0x161/0x260 ? do_user_addr_fault+0x276/0x620 __x64_sys_sendto+0x20/0x30 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7f3ad231340a Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 RSP: 002b:00007ffd70aad4b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000c36b00 RCX:00007f3ad231340a RDX: 0000000000000038 RSI: 0000000000c36b00 RDI: 0000000000000003 RBP: 0000000000c36910 R08: 00007f3ad2625200 R09: 000000000000000c R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001 </TASK> ---[ end trace 0000000000000000 ]--- ------------[ cut here ]------------ Fixes: 4d5ab0ad964d ("net/mlx5e: take into account device reconfiguration for xdp_features flag") Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-12net/mlx5e: XDP, Fix XDP_REDIRECT mpwqe page fragment leaks on shutdownDragos Tatulea
When mlx5e_xdp_xmit is called without the XDP_XMIT_FLUSH set it is possible that it leaves a mpwqe session open. That is ok during runtime: the session will be closed on the next call to mlx5e_xdp_xmit. But having a mpwqe session still open at XDP sq close time is problematic: the pc counter is not updated before flushing the contents of the xdpi_fifo. This results in leaking page fragments. The fix is to always close the mpwqe session at the end of mlx5e_xdp_xmit, regardless of the XDP_XMIT_FLUSH flag being set or not. Fixes: 5e0d2eef771e ("net/mlx5e: XDP, Support Enhanced Multi-Packet TX WQE") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-12net/mlx5e: RX, Fix page_pool allocation failure recovery for legacy rqDragos Tatulea
When a page allocation fails during refill in mlx5e_refill_rx_wqes, the page will be released again on the next refill call. This triggers the page_pool negative page fragment count warning below: [ 338.326070] WARNING: CPU: 4 PID: 0 at include/net/page_pool/helpers.h:130 mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core] ... [ 338.328993] RIP: 0010:mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core] [ 338.329094] Call Trace: [ 338.329097] <IRQ> [ 338.329100] ? __warn+0x7d/0x120 [ 338.329105] ? mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core] [ 338.329173] ? report_bug+0x155/0x180 [ 338.329179] ? handle_bug+0x3c/0x60 [ 338.329183] ? exc_invalid_op+0x13/0x60 [ 338.329187] ? asm_exc_invalid_op+0x16/0x20 [ 338.329192] ? mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core] [ 338.329259] mlx5e_post_rx_wqes+0x210/0x5a0 [mlx5_core] [ 338.329327] ? mlx5e_poll_rx_cq+0x88/0x6f0 [mlx5_core] [ 338.329394] mlx5e_napi_poll+0x127/0x6b0 [mlx5_core] [ 338.329461] __napi_poll+0x25/0x1a0 [ 338.329465] net_rx_action+0x28a/0x300 [ 338.329468] __do_softirq+0xcd/0x279 [ 338.329473] irq_exit_rcu+0x6a/0x90 [ 338.329477] common_interrupt+0x82/0xa0 [ 338.329482] </IRQ> This patch fixes the legacy rq case by releasing all allocated fragments and then setting the skip flag on all released fragments. It is important to note that the number of released fragments will be higher than the number of allocated fragments when an allocation error occurs. Fixes: 3f93f82988bc ("net/mlx5e: RX, Defer page release in legacy rq for better recycling") Tested-by: Chris Mason <clm@fb.com> Reported-by: Chris Mason <clm@fb.com> Closes: https://lore.kernel.org/netdev/117FF31A-7BE0-4050-B2BB-E41F224FF72F@meta.com Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-12net/mlx5e: RX, Fix page_pool allocation failure recovery for striding rqDragos Tatulea
When a page allocation fails during refill in mlx5e_post_rx_mpwqes, the page will be released again on the next refill call. This triggers the page_pool negative page fragment count warning below: [ 2436.447717] WARNING: CPU: 1 PID: 2419 at include/net/page_pool/helpers.h:130 mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core] ... [ 2436.447895] RIP: 0010:mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core] [ 2436.447991] Call Trace: [ 2436.447975] mlx5e_post_rx_mpwqes+0x1d5/0xcf0 [mlx5_core] [ 2436.447994] <IRQ> [ 2436.447996] ? __warn+0x7d/0x120 [ 2436.448009] ? mlx5e_handle_rx_cqe_mpwrq+0x109/0x1d0 [mlx5_core] [ 2436.448002] ? mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core] [ 2436.448044] ? mlx5e_poll_rx_cq+0x87/0x6e0 [mlx5_core] [ 2436.448061] ? report_bug+0x155/0x180 [ 2436.448065] ? handle_bug+0x36/0x70 [ 2436.448067] ? exc_invalid_op+0x13/0x60 [ 2436.448070] ? asm_exc_invalid_op+0x16/0x20 [ 2436.448079] mlx5e_napi_poll+0x122/0x6b0 [mlx5_core] [ 2436.448077] ? mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core] [ 2436.448113] ? generic_exec_single+0x35/0x100 [ 2436.448117] __napi_poll+0x25/0x1a0 [ 2436.448120] net_rx_action+0x28a/0x300 [ 2436.448122] __do_softirq+0xcd/0x279 [ 2436.448126] irq_exit_rcu+0x6a/0x90 [ 2436.448128] sysvec_apic_timer_interrupt+0x6e/0x90 [ 2436.448130] </IRQ> This patch fixes the striding rq case by setting the skip flag on all the wqe pages that were expected to have new pages allocated. Fixes: 4c2a13236807 ("net/mlx5e: RX, Defer page release in striding rq for better recycling") Tested-by: Chris Mason <clm@fb.com> Reported-by: Chris Mason <clm@fb.com> Closes: https://lore.kernel.org/netdev/117FF31A-7BE0-4050-B2BB-E41F224FF72F@meta.com Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-12net/mlx5: Handle fw tracer change ownership event based on MTRCMaher Sanalla
Currently, whenever fw issues a change ownership event, the PF that owns the fw tracer drops its ownership directly and the other PFs try to pick up the ownership via what MTRC register suggests. In some cases, driver releases the ownership of the tracer and reacquires it later on. Whenever the driver releases ownership of the tracer, fw issues a change ownership event. This event can be delayed and come after driver has reacquired ownership of the tracer. Thus the late event will trigger the tracer owner PF to release the ownership again and lead to a scenario where no PF is owning the tracer. To prevent the scenario described above, when handling a change ownership event, do not drop ownership of the tracer directly, instead read the fw MTRC register to retrieve the up-to-date owner of the tracer and set it accordingly in driver level. Fixes: f53aaa31cce7 ("net/mlx5: FW tracer, implement tracer logic") Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-12net/mlx5: Bridge, fix peer entry ageing in LAG modeVlad Buslov
With current implementation in single FDB LAG mode all packets are processed by eswitch 0 rules. As such, 'peer' FDB entries receive the packets for rules of other eswitches and are responsible for updating the main entry by sending SWITCHDEV_FDB_ADD_TO_BRIDGE notification from their background update wq task. However, this introduces a race condition when non-zero eswitch instance decides to delete a FDB entry, sends SWITCHDEV_FDB_DEL_TO_BRIDGE notification, but another eswitch's update task refreshes the same entry concurrently while its async delete work is still pending on the workque. In such case another SWITCHDEV_FDB_ADD_TO_BRIDGE event may be generated and entry will remain stuck in FDB marked as 'offloaded' since no more SWITCHDEV_FDB_DEL_TO_BRIDGE notifications are sent for deleting the peer entries. Fix the issue by synchronously marking deleted entries with MLX5_ESW_BRIDGE_FLAG_DELETED flag and skipping them in background update job. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-12net/mlx5: E-switch, register event handler before arming the eventShay Drory
Currently, mlx5 is registering event handler for vport context change event some time after arming the event. this can lead to missing an event, which will result in wrong rules in the FDB. Hence, register the event handler before arming the event. This solution is valid since FW is sending vport context change event only on vports which SW armed, and SW arming the vport when enabling it, which is done after the FDB has been created. Fixes: 6933a9379559 ("net/mlx5: E-Switch, Use async events chain") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-12net/mlx5: Perform DMA operations in the right locationsShay Drory
The cited patch change mlx5 driver so that during probe DMA operations were performed before pci_enable_device(), and during teardown DMA operations were performed after pci_disable_device(). DMA operations require PCI to be enabled. Hence, The above leads to the following oops in PPC systems[1]. On s390x systems, as reported by Niklas Schnelle, this is a problem because mlx5_pci_init() is where the DMA and coherent mask is set but mlx5_cmd_init() already does a dma_alloc_coherent(). Thus a DMA allocation is done during probe before the correct mask is set. This causes probe to fail initialization of the cmdif SW structs on s390x after that is converted to the common dma-iommu code. This is because on s390x DMA addresses below 4 GiB are reserved on current machines and unlike the old s390x specific DMA API implementation common code enforces DMA masks. Fix it by performing the DMA operations during probe after pci_enable_device() and after the dma mask is set, and during teardown before pci_disable_device(). [1] Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user iptable_nat xt_addrtype xt_conntrack nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 netconsole rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser ib_umad rdma_cm ib_ipoib iw_cm libiscsi scsi_transport_iscsi ib_cm ib_uverbs ib_core mlx5_core(-) ptp pps_core fuse vmx_crypto crc32c_vpmsum [last unloaded: mlx5_ib] CPU: 1 PID: 8937 Comm: modprobe Not tainted 6.5.0-rc3_for_upstream_min_debug_2023_07_31_16_02 #1 Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,HEAD hv:linux,kvm pSeries NIP: c000000000423388 LR: c0000000001e733c CTR: c0000000001e4720 REGS: c0000000055636d0 TRAP: 0380 Not tainted (6.5.0-rc3_for_upstream_min_debug_2023_07_31_16_02) MSR: 8000000000009033 CR: 24008884 XER: 20040000 CFAR: c0000000001e7338 IRQMASK: 0 NIP [c000000000423388] __free_pages+0x28/0x160 LR [c0000000001e733c] dma_direct_free+0xac/0x190 Call Trace: [c000000005563970] [5deadbeef0000100] 0x5deadbeef0000100 (unreliable) [c0000000055639b0] [c0000000003d46cc] kfree+0x7c/0x150 [c000000005563a40] [c0000000001e47c8] dma_free_attrs+0xa8/0x1a0 [c000000005563aa0] [c008000000d0064c] mlx5_cmd_cleanup+0xa4/0x100 [mlx5_core] [c000000005563ad0] [c008000000cf629c] mlx5_mdev_uninit+0xf4/0x140 [mlx5_core] [c000000005563b00] [c008000000cf6448] remove_one+0x160/0x1d0 [mlx5_core] [c000000005563b40] [c000000000958540] pci_device_remove+0x60/0x110 [c000000005563b80] [c000000000a35e80] device_remove+0x70/0xd0 [c000000005563bb0] [c000000000a37a38] device_release_driver_internal+0x2a8/0x330 [c000000005563c00] [c000000000a37b8c] driver_detach+0x8c/0x160 [c000000005563c40] [c000000000a35350] bus_remove_driver+0x90/0x110 [c000000005563c80] [c000000000a38948] driver_unregister+0x48/0x90 [c000000005563cf0] [c000000000957e38] pci_unregister_driver+0x38/0x150 [c000000005563d40] [c008000000eb6140] mlx5_cleanup+0x38/0x90 [mlx5_core] Fixes: 06cd555f73ca ("net/mlx5: split mlx5_cmd_init() to probe and reload routines") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-10-12drm/nouveau: exec: fix ioctl kernel-doc warningRandy Dunlap
kernel-doc emits a warning: include/uapi/drm/nouveau_drm.h:49: warning: Cannot understand * @NOUVEAU_GETPARAM_EXEC_PUSH_MAX on line 49 - I thought it was a doc line We don't have a way to document a macro value via kernel-doc, so change the "/**" kernel-doc marker to a C comment and format the comment more like a kernel-doc comment for consistency. Fixes: d59e75eef52d ("drm/nouveau: exec: report max pushs through getparam") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Dave Airlie <airlied@redhat.com> Cc: Danilo Krummrich <dakr@redhat.com> Cc: Karol Herbst <kherbst@redhat.com> Cc: Lyude Paul <lyude@redhat.com> Cc: dri-devel@lists.freedesktop.org Cc: nouveau@lists.freedesktop.org Cc: Bragatheswaran Manickavel <bragathemanick0908@gmail.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Danilo Krummrich <dakr@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231008140231.17921-1-rdunlap@infradead.org
2023-10-12Merge tag 'pinctrl-v6.6-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: "Some pin control fixes for v6.6 which have been stacking up in my tree. Dmitry's fix to some locking in the core is the most substantial, that was a really neat fix. The rest is the usual assorted spray of minor driver fixes. - Drop some minor code causing warnings in the Lantiq driver - Fix out of bounds write in the Nuvoton driver - Fix lost IRQs with CONFIG_PM in the Starfive driver - Fix a locking issue in find_pinctrl() - Revert a regressive Tegra debug patch - Fix the Renesas RZN1 pin muxing" * tag 'pinctrl-v6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: renesas: rzn1: Enable missing PINMUX Revert "pinctrl: tegra: Add support to display pin function" pinctrl: avoid unsafe code pattern in find_pinctrl() pinctrl: starfive: jh7110: Add system pm ops to save and restore context pinctrl: starfive: jh7110: Fix failure to set irq after CONFIG_PM is enabled pinctrl: nuvoton: wpcm450: fix out of bounds write pinctrl: lantiq: Remove unsued declaration ltq_pinctrl_unregister()
2023-10-12workqueue: Use the kmem_cache_free() instead of kfree() to release pwqZqiang
Currently, the kfree() be used for pwq objects allocated with kmem_cache_alloc() in alloc_and_link_pwqs(), this isn't wrong. but usually, use "trace_kmem_cache_alloc/trace_kmem_cache_free" to track memory allocation and free. this commit therefore use kmem_cache_free() instead of kfree() in alloc_and_link_pwqs() and also consistent with release of the pwq in rcu_free_pwq(). Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2023-10-12workqueue: doc: Fix function and sysfs path errorsWangJinchao
alloc_ordered_queue -> alloc_ordered_workqueue /sys/devices/virtual/WQ_NAME/ -> /sys/devices/virtual/workqueue/WQ_NAME/ Signed-off-by: WangJinchao <wangjinchao@xfusion.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2023-10-12cgroup: Fix incorrect css_set_rwsem reference in commentWaiman Long
Since commit f0d9a5f17575 ("cgroup: make css_set_rwsem a spinlock and rename it to css_set_lock"), css_set_rwsem has been replaced by css_set_lock. That commit, however, missed the css_set_rwsem reference in include/linux/cgroup-defs.h. Fix that by changing it to css_set_lock as well. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2023-10-12drm/panel: Move AUX B116XW03 out of panel-edp back to panel-simpleDouglas Anderson
In commit 5f04e7ce392d ("drm/panel-edp: Split eDP panels out of panel-simple") I moved a pile of panels out of panel-simple driver into the newly created panel-edp driver. One of those panels, however, shouldn't have been moved. As is clear from commit e35e305eff0f ("drm/panel: simple: Add AUO B116XW03 panel support"), AUX B116XW03 is an LVDS panel. It's used in exynos5250-snow and exynos5420-peach-pit where it's clear that the panel is hooked up with LVDS. Furthermore, searching for datasheets I found one that makes it clear that this panel is LVDS. As far as I can tell, I got confused because in commit 88d3457ceb82 ("drm/panel: auo,b116xw03: fix flash backlight when power on") Jitao Shi added "DRM_MODE_CONNECTOR_eDP". That seems wrong. Looking at the downstream ChromeOS trees, it seems like some Mediatek boards are using a panel that they call "auo,b116xw03" that's an eDP panel. The best I can guess is that they actually have a different panel that has similar timing. If so then the proper panel should be used or they should switch to the generic "edp-panel" compatible. When moving this back to panel-edp, I wasn't sure what to use for .bus_flags and .bus_format and whether to add the extra "enable" delay from commit 88d3457ceb82 ("drm/panel: auo,b116xw03: fix flash backlight when power on"). I've added formats/flags/delays based on my (inexpert) analysis of the datasheet. These are untested. NOTE: if/when this is backported to stable, we might run into some trouble. Specifically, before 474c162878ba ("arm64: dts: mt8183: jacuzzi: Move panel under aux-bus") this panel was used by "mt8183-kukui-jacuzzi", which assumed it was an eDP panel. I don't know what to suggest for that other than someone making up a bogus panel for jacuzzi that's just for the stable channel. Fixes: 88d3457ceb82 ("drm/panel: auo,b116xw03: fix flash backlight when power on") Fixes: 5f04e7ce392d ("drm/panel-edp: Split eDP panels out of panel-simple") Tested-by: Anton Bambura <jenneron@postmarketos.org> Acked-by: Hsin-Yi Wang <hsinyi@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230925150010.1.Iff672233861bcc4cf25a7ad0a81308adc3bda8a4@changeid
2023-10-12drm/bridge: ti-sn65dsi86: Associate DSI device lifetime with auxiliary deviceStephen Boyd
The kernel produces a warning splat and the DSI device fails to register in this driver if the i2c driver probes, populates child auxiliary devices, and then somewhere in ti_sn_bridge_probe() a function call returns -EPROBE_DEFER. When the auxiliary driver probe defers, the dsi device created by devm_mipi_dsi_device_register_full() is left registered because the devm managed device used to manage the lifetime of the DSI device is the parent i2c device, not the auxiliary device that is being probed. Associate the DSI device created and managed by this driver to the lifetime of the auxiliary device, not the i2c device, so that the DSI device is removed when the auxiliary driver unbinds. Similarly change the device pointer used for dev_err_probe() so the deferred probe errors are associated with the auxiliary device instead of the parent i2c device so we can narrow down future problems faster. Cc: Douglas Anderson <dianders@chromium.org> Cc: Maxime Ripard <maxime@cerno.tech> Fixes: c3b75d4734cb ("drm/bridge: sn65dsi86: Register and attach our DSI device at probe") Signed-off-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20231002235407.769399-1-swboyd@chromium.org
2023-10-12KVM: arm64: timers: Correctly handle TGE flip with CNTPOFF_EL2Marc Zyngier
Contrary to common belief, HCR_EL2.TGE has a direct and immediate effect on the way the EL0 physical counter is offset. Flipping TGE from 1 to 0 while at EL2 immediately changes the way the counter compared to the CVAL limit. This means that we cannot directly save/restore the guest's view of CVAL, but that we instead must treat it as if CNTPOFF didn't exist. Only in the world switch, once we figure out that we do have CNTPOFF, can we must the offset back and forth depending on the polarity of TGE. Fixes: 2b4825a86940 ("KVM: arm64: timers: Use CNTPOFF_EL2 to offset the physical timer") Reported-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-10-12ovl: fix regression in parsing of mount options with escaped commaAmir Goldstein
Ever since commit 91c77947133f ("ovl: allow filenames with comma"), the following example was legit overlayfs mount options: mount -t overlay overlay -o 'lowerdir=/tmp/a\,b/lower' /mnt The conversion to new mount api moved to using the common helper generic_parse_monolithic() and discarded the specialized ovl_next_opt() option separator. Bring back ovl_next_opt() and use vfs_parse_monolithic_sep() to fix the regression. Reported-by: Ryan Hendrickson <ryan.hendrickson@alum.mit.edu> Closes: https://lore.kernel.org/r/8da307fb-9318-cf78-8a27-ba5c5a0aef6d@alum.mit.edu/ Fixes: 1784fbc2ed9c ("ovl: port to new mount api") Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-10-12fs: factor out vfs_parse_monolithic_sep() helperAmir Goldstein
Factor out vfs_parse_monolithic_sep() from generic_parse_monolithic(), so filesystems could use it with a custom option separator callback. Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-10-12KVM: arm64: POR{E0}_EL1 do not need trap handlersJoey Gouly
These will not be trapped by KVM, so don't need a handler. Signed-off-by: Joey Gouly <joey.gouly@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231012123459.2820835-3-joey.gouly@arm.com
2023-10-12KVM: arm64: Add nPIR{E0}_EL1 to HFG trapsJoey Gouly
nPIR_EL1 and nPIREO_EL1 are part of the 'reverse polarity' set of bits, set them so that we disable the traps for a guest. Unfortunately, these bits are not yet described in the ARM ARM, but only live in the XML description. Also add them to the NV FGT forwarding infrastructure. Signed-off-by: Joey Gouly <joey.gouly@arm.com> Fixes: e930694e6145 ("KVM: arm64: Restructure FGT register switching") Cc: Oliver Upton <oliver.upton@linux.dev> [maz: add entries to the NV FGT array, commit message update] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231012123459.2820835-2-joey.gouly@arm.com
2023-10-12KVM: MIPS: fix -Wunused-but-set-variable warningPaolo Bonzini
The variable is completely unused, remove it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-12KVM: arm64: pmu: Drop redundant check for non-NULL kvm_pmu_eventsAnshuman Khandual
There is an allocated and valid struct kvm_pmu_events for each cpu on the system via DEFINE_PER_CPU(). Hence there cannot be a NULL pointer accessed via this_cpu_ptr() in the helper kvm_get_pmu_events(). Hence non-NULL check for pmu in such places are redundant and can be dropped. Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: kvmarm@lists.linux.dev Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231012064617.897346-1-anshuman.khandual@arm.com
2023-10-12KVM: SVM: Fix build error when using -Werror=unused-but-set-variableTom Lendacky
Commit 916e3e5f26ab ("KVM: SVM: Do not use user return MSR support for virtualized TSC_AUX") introduced a local variable used for the rdmsr() function for the high 32-bits of the MSR value. This variable is not used after being set and triggers a warning or error, when treating warnings as errors, when the unused-but-set-variable flag is set. Mark this variable as __maybe_unused to fix this. Fixes: 916e3e5f26ab ("KVM: SVM: Do not use user return MSR support for virtualized TSC_AUX") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <0da9874b6e9fcbaaa5edeb345d7e2a7c859fc818.1696271334.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-12x86: KVM: SVM: refresh AVIC inhibition in svm_leave_nested()Maxim Levitsky
svm_leave_nested() similar to a nested VM exit, get the vCPU out of nested mode and thus should end the local inhibition of AVIC on this vCPU. Failure to do so, can lead to hangs on guest reboot. Raise the KVM_REQ_APICV_UPDATE request to refresh the AVIC state of the current vCPU in this case. Fixes: f44509f849fe ("KVM: x86: SVM: allow AVIC to co-exist with a nested guest running") Cc: stable@vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230928173354.217464-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-12x86: KVM: SVM: add support for Invalid IPI Vector interceptionMaxim Levitsky
In later revisions of AMD's APM, there is a new 'incomplete IPI' exit code: "Invalid IPI Vector - The vector for the specified IPI was set to an illegal value (VEC < 16)" Note that tests on Zen2 machine show that this VM exit doesn't happen and instead AVIC just does nothing. Add support for this exit code by doing nothing, instead of filling the kernel log with errors. Also replace an unthrottled 'pr_err()' if another unknown incomplete IPI exit happens with vcpu_unimpl() (e.g in case AMD adds yet another 'Invalid IPI' exit reason) Cc: <stable@vger.kernel.org> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230928173354.217464-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-12x86: KVM: SVM: always update the x2avic msr interceptionMaxim Levitsky
The following problem exists since x2avic was enabled in the KVM: svm_set_x2apic_msr_interception is called to enable the interception of the x2apic msrs. In particular it is called at the moment the guest resets its apic. Assuming that the guest's apic was in x2apic mode, the reset will bring it back to the xapic mode. The svm_set_x2apic_msr_interception however has an erroneous check for '!apic_x2apic_mode()' which prevents it from doing anything in this case. As a result of this, all x2apic msrs are left unintercepted, and that exposes the bare metal x2apic (if enabled) to the guest. Oops. Remove the erroneous '!apic_x2apic_mode()' check to fix that. This fixes CVE-2023-5090 Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode") Cc: stable@vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230928173354.217464-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-12KVM: selftests: Force load all supported XSAVE state in state testSean Christopherson
Extend x86's state to forcefully load *all* host-supported xfeatures by modifying xstate_bv in the saved state. Stuffing xstate_bv ensures that the selftest is verifying KVM's full ABI regardless of whether or not the guest code is successful in getting various xfeatures out of their INIT state, e.g. see the disaster that is/was MPX. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230928001956.924301-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-12KVM: selftests: Load XSAVE state into untouched vCPU during state testSean Christopherson
Expand x86's state test to load XSAVE state into a "dummy" vCPU prior to KVM_SET_CPUID2, and again with an empty guest CPUID model. Except for off-by-default features, i.e. AMX, KVM's ABI for KVM_SET_XSAVE is that userspace is allowed to load xfeatures so long as they are supported by the host. This is a regression test for a combination of KVM bugs where the state saved by KVM_GET_XSAVE{2} could not be loaded via KVM_SET_XSAVE if the saved xstate_bv would load guest-unsupported xfeatures. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230928001956.924301-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-12KVM: selftests: Touch relevant XSAVE state in guest for state testSean Christopherson
Modify support XSAVE state in the "state test's" guest code so that saving and loading state via KVM_{G,S}ET_XSAVE actually does something useful, i.e. so that xstate_bv in XSAVE state isn't empty. Punt on BNDCSR for now, it's easier to just stuff that xfeature from the host side. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230928001956.924301-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-12KVM: x86: Constrain guest-supported xfeatures only at KVM_GET_XSAVE{2}Sean Christopherson
Mask off xfeatures that aren't exposed to the guest only when saving guest state via KVM_GET_XSAVE{2} instead of modifying user_xfeatures directly. Preserving the maximal set of xfeatures in user_xfeatures restores KVM's ABI for KVM_SET_XSAVE, which prior to commit ad856280ddea ("x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0") allowed userspace to load xfeatures that are supported by the host, irrespective of what xfeatures are exposed to the guest. There is no known use case where userspace *intentionally* loads xfeatures that aren't exposed to the guest, but the bug fixed by commit ad856280ddea was specifically that KVM_GET_SAVE{2} would save xfeatures that weren't exposed to the guest, e.g. would lead to userspace unintentionally loading guest-unsupported xfeatures when live migrating a VM. Restricting KVM_SET_XSAVE to guest-supported xfeatures is especially problematic for QEMU-based setups, as QEMU has a bug where instead of terminating the VM if KVM_SET_XSAVE fails, QEMU instead simply stops loading guest state, i.e. resumes the guest after live migration with incomplete guest state, and ultimately results in guest data corruption. Note, letting userspace restore all host-supported xfeatures does not fix setups where a VM is migrated from a host *without* commit ad856280ddea, to a target with a subset of host-supported xfeatures. However there is no way to safely address that scenario, e.g. KVM could silently drop the unsupported features, but that would be a clear violation of KVM's ABI and so would require userspace to opt-in, at which point userspace could simply be updated to sanitize the to-be-loaded XSAVE state. Reported-by: Tyler Stachecki <stachecki.tyler@gmail.com> Closes: https://lore.kernel.org/all/20230914010003.358162-1-tstachecki@bloomberg.net Fixes: ad856280ddea ("x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0") Cc: stable@vger.kernel.org Cc: Leonardo Bras <leobras@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Message-Id: <20230928001956.924301-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-12x86/fpu: Allow caller to constrain xfeatures when copying to uabi bufferSean Christopherson
Plumb an xfeatures mask into __copy_xstate_to_uabi_buf() so that KVM can constrain which xfeatures are saved into the userspace buffer without having to modify the user_xfeatures field in KVM's guest_fpu state. KVM's ABI for KVM_GET_XSAVE{2} is that features that are not exposed to guest must not show up in the effective xstate_bv field of the buffer. Saving only the guest-supported xfeatures allows userspace to load the saved state on a different host with a fewer xfeatures, so long as the target host supports the xfeatures that are exposed to the guest. KVM currently sets user_xfeatures directly to restrict KVM_GET_XSAVE{2} to the set of guest-supported xfeatures, but doing so broke KVM's historical ABI for KVM_SET_XSAVE, which allows userspace to load any xfeatures that are supported by the *host*. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230928001956.924301-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-12Merge tag 'kvm-s390-master-6.6-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD One small fix for gisa to avoid stalls.
2023-10-12smb: client: prevent new fids from being removed by laundromatPaulo Alcantara
Check if @cfid->time is set in laundromat so we guarantee that only fully cached fids will be selected for removal. While we're at it, add missing locks to protect access of @cfid fields in order to avoid races with open_cached_dir() and cfids_laundromat_worker(), respectively. Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-10-12smb: client: make laundromat a delayed workerPaulo Alcantara
By having laundromat kthread processing cached directories on every second turned out to be overkill, especially when having multiple SMB mounts. Relax it by using a delayed worker instead that gets scheduled on every @dir_cache_timeout (default=30) seconds per tcon. This also fixes the 1s delay when tearing down tcon. Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-10-12riscv: signal: fix sigaltstack frame size checkingAndy Chiu
The alternative stack checking in get_sigframe introduced by the Vector support is not needed and has a problem. It is not needed as we have already validate it at the beginning of the function if we are already on an altstack. If not, the size of an altstack is always validated at its allocation stage with sigaltstack_size_valid(). Besides, we must only regard the size of an altstack if the handler of a signal is registered with SA_ONSTACK. So, blindly checking overflow of an altstack if sas_ss_size not equals to zero will check against wrong signal handlers if only a subset of signals are registered with SA_ONSTACK. Fixes: 8ee0b41898fa ("riscv: signal: Add sigcontext save/restore for vector") Reported-by: Prashanth Swaminathan <prashanthsw@google.com> Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Link: https://lore.kernel.org/r/20230822164904.21660-1-andy.chiu@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-12IXP4xx MAINTAINERS entriesKrzysztof Hałasa
Update MAINTAINERS entries for Intel IXP4xx SoCs. Linus has been handling all IXP4xx stuff since 2019 or so. Signed-off-by: Krzysztof Hałasa <khalasa@piap.pl> Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Deepak Saxena <dsaxena@plexity.net> Link: https://lore.kernel.org/r/m3ttqxu4ru.fsf@t19.piap.pl Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-10-12Merge branch 'rswitch-fix-issues-on-specific-conditions'Paolo Abeni
Yoshihiro Shimoda says: ==================== rswitch: Fix issues on specific conditions This patch series fix some issues of rswitch driver on specific condtions. ==================== Link: https://lore.kernel.org/r/20231010124858.183891-1-yoshihiro.shimoda.uh@renesas.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-12rswitch: Fix imbalance phy_power_off() callingYoshihiro Shimoda
The phy_power_off() should not be called if phy_power_on() failed. So, add a condition .power_count before calls phy_power_off(). Fixes: 5cb630925b49 ("net: renesas: rswitch: Add phy_power_{on,off}() calling") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-12rswitch: Fix renesas_eth_sw_remove() implementationYoshihiro Shimoda
Fix functions calling order and a condition in renesas_eth_sw_remove(). Otherwise, kernel NULL pointer dereference happens from phy_stop() if a net device opens. Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>