linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2022-04-29	net: stmmac: dwmac-sun8i: add missing of_node_put() in ↵	Yang Yingliang
	sun8i_dwmac_register_mdio_mux() The node pointer returned by of_get_child_by_name() with refcount incremented, so add of_node_put() after using it. Fixes: 634db83b8265 ("net: stmmac: dwmac-sun8i: Handle integrated/external MDIOs") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20220428095716.540452-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-29	net: dsa: mt7530: add missing of_node_put() in mt7530_setup()	Yang Yingliang
	Add of_node_put() if of_get_phy_mode() fails in mt7530_setup() Fixes: 0c65b2b90d13 ("net: of_get_phy_mode: Change API to solve int/unit warnings") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20220428095317.538829-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-29	net: dsa: ksz9477: port mirror sniffing limited to one port	Arun Ramadoss
	This patch limits the sniffing to only one port during the mirror add. And during the mirror_del it checks for all the ports using the sniff, if and only if no other ports are referring, sniffing is disabled. The code is updated based on the review comments of LAN937x port mirror patch. Link: https://patchwork.kernel.org/project/netdevbpf/patch/20210422094257.1641396-8-prasanna.vengateshan@microchip.com/ Fixes: b987e98e50ab ("dsa: add DSA switch driver for Microchip KSZ9477") Signed-off-by: Prasanna Vengateshan <prasanna.vengateshan@microchip.com> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Link: https://lore.kernel.org/r/20220428070709.7094-1-arun.ramadoss@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-29	hinic: fix bug of wq out of bound access	Qiao Ma
	If wq has only one page, we need to check wqe rolling over page by compare end_idx and curr_idx, and then copy wqe to shadow wqe to avoid out of bound access. This work has been done in hinic_get_wqe, but missed for hinic_read_wqe. This patch fixes it, and removes unnecessary MASKED_WQE_IDX(). Fixes: 7dd29ee12865 ("hinic: add sriov feature support") Signed-off-by: Qiao Ma <mqaio@linux.alibaba.com> Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com> Link: https://lore.kernel.org/r/282817b0e1ae2e28fdf3ed8271a04e77f57bf42e.1651148587.git.mqaio@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-29	net: mdio: Fix ENOMEM return value in BCM6368 mux bus controller	Niels Dossche
	Error values inside the probe function must be < 0. The ENOMEM return value has the wrong sign: it is positive instead of negative. Add a minus sign. Fixes: e239756717b5 ("net: mdio: Add BCM6368 MDIO mux bus controller") Signed-off-by: Niels Dossche <dossche.niels@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220428211931.8130-1-dossche.niels@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-29	net: ethernet: mediatek: add missing of_node_put() in mtk_sgmii_init()	Yang Yingliang
	The node pointer returned by of_parse_phandle() with refcount incremented, so add of_node_put() after using it in mtk_sgmii_init(). Fixes: 9ffee4a8276c ("net: ethernet: mediatek: Extend SGMII related functions") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20220428062543.64883-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-29	Merge branch 'selftests-net-add-missing-tests-to-makefile'	Jakub Kicinski
	Hangbin Liu says: ==================== selftests: net: add missing tests to Makefile When generating the selftests to another folder, the fixed tests are missing as they are not in Makefile. The missing tests are generated by command: $ for f in $(ls *.sh); do grep -q $f Makefile \|\| echo $f; done ==================== Link: https://lore.kernel.org/r/20220428044511.227416-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-29	selftests/net/forwarding: add missing tests to Makefile	Hangbin Liu
	When generating the selftests to another folder, the fixed tests are missing as they are not in Makefile, e.g. make -C tools/testing/selftests/ install \ TARGETS="net/forwarding" INSTALL_PATH=/tmp/kselftests Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-29	selftests/net: add missing tests to Makefile	Hangbin Liu
	When generating the selftests to another folder, the fixed tests are missing as they are not in Makefile, e.g. make -C tools/testing/selftests/ install \ TARGETS="net" INSTALL_PATH=/tmp/kselftests Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-29	Revert "SUNRPC: attempt AF_LOCAL connect on setup"	Trond Myklebust
	This reverts commit 7073ea8799a8cf73db60270986f14e4aae20fa80. We must not try to connect the socket while the transport is under construction, because the mechanisms to safely tear it down are not in place. As the code stands, we end up leaking the sockets on a connection error. Reported-by: wanghai (M) <wanghai38@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-04-29	Merge tag 'soc-fixes-5.18-3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: - A fix for a regression caused by the previous set of bugfixes changing tegra and at91 pinctrl properties. More work is needed to figure out what this should actually be, but a revert makes it work for the moment. - Defconfig regression fixes for tegra after renamed symbols - Build-time warning and static checker fixes for imx, op-tee, sunxi, meson, at91, and omap - More at91 DT fixes for audio, regulator and spi nodes - A regression fix for Renesas Hyperflash memory probe - A stability fix for amlogic boards, modifying the allowed cpufreq states - Multiple fixes for system suspend on omap2+ - DT fixes for various i.MX bugs - A probe error fix for imx6ull-colibri MMC - A MAINTAINERS file entry for samsung bug reports * tag 'soc-fixes-5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (42 commits) Revert "arm: dts: at91: Fix boolean properties with values" bus: sunxi-rsb: Fix the return value of sunxi_rsb_device_create() Revert "arm64: dts: tegra: Fix boolean properties with values" arm64: dts: imx8mn-ddr4-evk: Describe the 32.768 kHz PMIC clock ARM: dts: imx6ull-colibri: fix vqmmc regulator MAINTAINERS: add Bug entry for Samsung and memory controller drivers memory: renesas-rpc-if: Fix HF/OSPI data transfer in Manual Mode ARM: dts: logicpd-som-lv: Fix wrong pinmuxing on OMAP35 ARM: dts: am3517-evm: Fix misc pinmuxing ARM: dts: am33xx-l4: Add missing touchscreen clock properties ARM: dts: Fix mmc order for omap3-gta04 ARM: dts: at91: fix pinctrl phandles ARM: dts: at91: sama5d4_xplained: fix pinctrl phandle name ARM: dts: at91: Describe regulators on at91sam9g20ek ARM: dts: at91: Map MCLK for wm8731 on at91sam9g20ek ARM: dts: at91: Fix boolean properties with values ARM: dts: at91: use generic node name for dataflash ARM: dts: at91: align SPI NOR node name with dtschema ARM: dts: at91: sama7g5ek: Align the impedance of the QSPI0's HSIO and PCB lines ARM: dts: at91: sama7g5ek: enable pull-up on flexcom3 console lines ...
2022-04-29	Merge tag 'clk-fixes-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "A semi-large pile of clk driver fixes this time around. Nothing is touching the core so these fixes are fairly well contained to specific devices that use these clk drivers. - Some Allwinner SoC fixes to gracefully handle errors and mark an RTC clk as critical so that the RTC keeps ticking. - Fix AXI bus clks and RTC clk design for Microchip PolarFire SoC driver introduced this cycle. This has some devicetree bits acked by riscv maintainers. We're fixing it now so that the prior bindings aren't released in a major kernel version. - Remove a reset on Microchip PolarFire SoCs that broke when enabling CONFIG_PM. - Set a min/max for the Qualcomm graphics clk. This got broken by the clk rate range patches introduced this cycle" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: sunxi: sun9i-mmc: check return value after calling platform_get_resource() clk: sunxi-ng: sun6i-rtc: Mark rtc-32k as critical riscv: dts: microchip: reparent mpfs clocks clk: microchip: mpfs: add RTCREF clock control clk: microchip: mpfs: re-parent the configurable clocks dt-bindings: rtc: add refclk to mpfs-rtc dt-bindings: clk: mpfs: add defines for two new clocks dt-bindings: clk: mpfs document msspll dri registers riscv: dts: microchip: fix usage of fic clocks on mpfs clk: microchip: mpfs: mark CLK_ATHENA as critical clk: microchip: mpfs: fix parents for FIC clocks clk: qcom: clk-rcg2: fix gfx3d frequency calculation clk: microchip: mpfs: don't reset disabled peripherals clk: sunxi-ng: fix not NULL terminated coccicheck error
2022-04-29	Merge tag 'block-5.18-2022-04-29' of git://git.kernel.dk/linux-block	Linus Torvalds
	Pull block fixes from Jens Axboe: - Revert of a patch that caused timestamp issues (Tejun) - iocost warning fix (Tejun) - bfq warning fix (Jan) * tag 'block-5.18-2022-04-29' of git://git.kernel.dk/linux-block: bfq: Fix warning in bfqq_request_over_limit() Revert "block: inherit request start time from bio for BLK_CGROUP" iocost: don't reset the inuse weight of under-weighted debtors
2022-04-29	Merge tag 'io_uring-5.18-2022-04-29' of git://git.kernel.dk/linux-block	Linus Torvalds
	Pull io_uring fixes from Jens Axboe: "Pretty boring: - three patches just adding reserved field checks (me, Eugene) - Fixing a potential regression with IOPOLL caused by a block change (Joseph)" Boring is good. * tag 'io_uring-5.18-2022-04-29' of git://git.kernel.dk/linux-block: io_uring: check that data field is 0 in ringfd unregister io_uring: fix uninitialized field in rw io_kiocb io_uring: check reserved fields for recv/recvmsg io_uring: check reserved fields for send/sendmsg
2022-04-29	Merge tag 'random-5.18-rc5-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull random number generator fixes from Jason Donenfeld: - Eric noticed that the memmove() in crng_fast_key_erasure() was bogus, so this has been changed to a memcpy() and the confusing situation clarified with a detailed comment. - [Half]SipHash documentation updates from Bagas and Eric, after Eric pointed out that the use of HalfSipHash in random.c made a bit of the text potentially misleading. * tag 'random-5.18-rc5-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: Documentation: siphash: disambiguate HalfSipHash algorithm from hsiphash functions Documentation: siphash: enclose HalfSipHash usage example in the literal block Documentation: siphash: convert danger note to warning for HalfSipHash random: document crng_fast_key_erasure() destination possibility
2022-04-29	Merge tag 'ceph-for-5.18-rc5' of https://github.com/ceph/ceph-client	Linus Torvalds
	Pull ceph client fixes from Ilya Dryomov: "A fix for a NULL dereference that turns out to be easily triggerable by fsync (marked for stable) and a false positive WARN and snap_rwsem locking fixups" * tag 'ceph-for-5.18-rc5' of https://github.com/ceph/ceph-client: ceph: fix possible NULL pointer dereference for req->r_session ceph: remove incorrect session state check ceph: get snap_rwsem read lock in handle_cap_export for ceph_add_cap libceph: disambiguate cluster/pool full log message
2022-04-29	Revert "arm: dts: at91: Fix boolean properties with values"	Arnd Bergmann
	This reverts commit 0dc23d1a8e17, which caused another regression as the pinctrl code actually expects an integer value of 0 or 1 rather than a simple boolean property. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-04-29	Merge tag 'linux-can-fixes-for-5.18-20220429' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2022-04-29 The first patch is by Oliver Hartkopp and removes the ability to re-binding bounds sockets from the ISOTP. It turned out to be not needed and brings unnecessary complexity. The last 4 patches all target the grcan driver. Duoming Zhou's patch fixes a potential dead lock in the grcan_close() function. Daniel Hellstrom's patch fixes the dma_alloc_coherent() to use the correct device. Andreas Larsson's 1st patch fixes a broken system id check, the 2nd patch fixes the NAPI poll budget usage. * tag 'linux-can-fixes-for-5.18-20220429' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: grcan: only use the NAPI poll budget for RX can: grcan: grcan_probe(): fix broken system id check for errata workaround needs can: grcan: use ofdev->dev when allocating DMA memory can: grcan: grcan_close(): fix deadlock can: isotp: remove re-binding of bound socket ==================== Link: https://lore.kernel.org/r/20220429125612.1792561-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-29	KVM: x86: work around QEMU issue with synthetic CPUID leaves	Paolo Bonzini
	Synthesizing AMD leaves up to 0x80000021 caused problems with QEMU, which assumes the host CPUID[0x80000000].EAX is higher or equal to what KVM_GET_SUPPORTED_CPUID reports. This causes QEMU to issue bogus host CPUIDs when preparing the input to KVM_SET_CPUID2. It can even get into an infinite loop, which is only terminated by an abort(): cpuid_data is full, no space for cpuid(eax:0x8000001d,ecx:0x3e) To work around this, only synthesize those leaves if 0x8000001d exists on the host. The synthetic 0x80000021 leaf is mostly useful on Zen2, which satisfies the condition. Fixes: f144c49e8c39 ("KVM: x86: synthesize CPUID leaf 0x80000021h if useful") Reported-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	Merge tag 'perf-tools-fixes-for-v5.18-2022-04-29' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools fixes from Arnaldo Carvalho de Melo: - Fix Intel PT (Processor Trace) timeless decoding with perf.data directory. - ARM SPE (Statistical Profiling Extensions) address fixes, for synthesized events and for SPE events with physical addresses. Add a simple 'perf test' entry to make sure this doesn't regress. - Remove arch specific processing of kallsyms data to fixup symbol end address, fixing excessive memory consumption in the annotation code. * tag 'perf-tools-fixes-for-v5.18-2022-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: perf symbol: Remove arch__symbols__fixup_end() perf symbol: Update symbols__fixup_end() perf symbol: Pass is_kallsyms to symbols__fixup_end() perf test: Add perf_event_attr test for Arm SPE perf arm-spe: Fix SPE events with phys addresses perf arm-spe: Fix addresses of synthesized SPE events perf intel-pt: Fix timeless decoding with perf.data directory
2022-04-29	selftests/seccomp: Don't call read() on TTY from background pgrp	Jann Horn
	Since commit 92d25637a3a4 ("kselftest: signal all child processes"), tests are executed in background process groups. This means that trying to read from stdin now throws SIGTTIN when stdin is a TTY, which breaks some seccomp selftests that try to use read(0, NULL, 0) as a dummy syscall. The simplest way to fix that is probably to just use -1 instead of 0 as the dummy read()'s FD. Fixes: 92d25637a3a4 ("kselftest: signal all child processes") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220319010011.1374622-1-jannh@google.com
2022-04-29	Merge tag 'riscv-for-linus-5.18-rc5' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - A fix to properly ensure a single CPU is running during patch_text(). - A defconfig update to include RPMSG_CTRL when RPMSG_CHAR was set, necessary after a recent refactoring. * tag 'riscv-for-linus-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: RISC-V: configs: Configs that had RPMSG_CHAR now get RPMSG_CTRL riscv: patch_text: Fixup last cpu should be master
2022-04-29	Merge tag 'arm64-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Will Deacon: "Rename and reallocate the PT_ARM_MEMTAG_MTE ELF segment type. This is a fix to the MTE ELF ABI for a bug that was added during the most recent merge window as part of the coredump support. The issue is that the value assigned to the new PT_ARM_MEMTAG_MTE segment type has already been allocated to PT_AARCH64_UNWIND by the ELF ABI, so we've bumped the value and changed the name of the identifier to be better aligned with the existing one" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: elf: Fix the arm64 MTE ELF segment name and value
2022-04-29	KVM: X86/MMU: Fix shadowing 5-level NPT for 4-level NPT L1 guest	Lai Jiangshan
	When shadowing 5-level NPT for 4-level NPT L1 guest, the root_sp is allocated with role.level = 5 and the guest pagetable's root gfn. And root_sp->spt[0] is also allocated with the same gfn and the same role except role.level = 4. Luckily that they are different shadow pages, but only root_sp->spt[0] is the real translation of the guest pagetable. Here comes a problem: If the guest switches from gCR4_LA57=0 to gCR4_LA57=1 (or vice verse) and uses the same gfn as the root page for nested NPT before and after switching gCR4_LA57. The host (hCR4_LA57=1) might use the same root_sp for the guest even the guest switches gCR4_LA57. The guest will see unexpected page mapped and L2 may exploit the bug and hurt L1. It is lucky that the problem can't hurt L0. And three special cases need to be handled: The root_sp should be like role.direct=1 sometimes: its contents are not backed by gptes, root_sp->gfns is meaningless. (For a normal high level sp in shadow paging, sp->gfns is often unused and kept zero, but it could be relevant and meaningful if sp->gfns is used because they are backed by concrete gptes.) For such root_sp in the case, root_sp is just a portal to contribute root_sp->spt[0], and root_sp->gfns should not be used and root_sp->spt[0] should not be dropped if gpte[0] of the guest root pagetable is changed. Such root_sp should not be accounted too. So add role.passthrough to distinguish the shadow pages in the hash when gCR4_LA57 is toggled and fix above special cases by using it in kvm_mmu_page_{get\|set}_gfn() and sp_has_gptes(). Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Message-Id: <20220420131204.2850-3-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	KVM: X86/MMU: Add sp_has_gptes()	Lai Jiangshan
	Add sp_has_gptes() which equals to !sp->role.direct currently. Shadow page having gptes needs to be write-protected, accounted and responded to kvm_mmu_pte_write(). Use it in these places to replace !sp->role.direct and rename for_each_gfn_indirect_valid_sp. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Message-Id: <20220420131204.2850-2-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	KVM: SVM: Introduce trace point for the slow-path of avic_kic_target_vcpus	Suravee Suthikulpanit
	This can help identify potential performance issues when handles AVIC incomplete IPI due vCPU not running. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220420154954.19305-3-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	KVM: SVM: Use target APIC ID to complete AVIC IRQs when possible	Suravee Suthikulpanit
	Currently, an AVIC-enabled VM suffers from performance bottleneck when scaling to large number of vCPUs for I/O intensive workloads. In such case, a vCPU often executes halt instruction to get into idle state waiting for interrupts, in which KVM would de-schedule the vCPU from physical CPU. When AVIC HW tries to deliver interrupt to the halting vCPU, it would result in AVIC incomplete IPI #vmexit to notify KVM to reschedule the target vCPU into running state. Investigation has shown the main hotspot is in the kvm_apic_match_dest() in the following call stack where it tries to find target vCPUs corresponding to the information in the ICRH/ICRL registers. - handle_exit - svm_invoke_exit_handler - avic_incomplete_ipi_interception - kvm_apic_match_dest However, AVIC provides hints in the #vmexit info, which can be used to retrieve the destination guest physical APIC ID. In addition, since QEMU defines guest physical APIC ID to be the same as vCPU ID, it can be used to quickly identify the target vCPU to deliver IPI, and avoid the overhead from searching through all vCPUs to match the target vCPU. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220420154954.19305-2-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	KVM: x86/mmu: replace direct_map with root_role.direct	Paolo Bonzini
	direct_map is always equal to the direct field of the root page's role: - for shadow paging, direct_map is true if CR0.PG=0 and root_role.direct is copied from cpu_role.base.direct - for TDP, it is always true and root_role.direct is also always true - for shadow TDP, it is always false and root_role.direct is also always false Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	KVM: x86/mmu: replace root_level with cpu_role.base.level	Paolo Bonzini
	Remove another duplicate field of struct kvm_mmu. This time it's the root level for page table walking; the separate field is always initialized as cpu_role.base.level, so its users can look up the CPU mode directly instead. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	KVM: x86/mmu: replace shadow_root_level with root_role.level	Paolo Bonzini
	root_role.level is always the same value as shadow_level: - it's kvm_mmu_get_tdp_level(vcpu) when going through init_kvm_tdp_mmu - it's the level argument when going through kvm_init_shadow_ept_mmu - it's assigned directly from new_role.base.level when going through shadow_mmu_init_context Remove the duplication and get the level directly from the role. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	KVM: x86/mmu: pull CPU mode computation to kvm_init_mmu	Paolo Bonzini
	Do not lead init_kvm_*mmu into the temptation of poking into struct kvm_mmu_role_regs, by passing to it directly the CPU mode. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	KVM: x86/mmu: simplify and/or inline computation of shadow MMU roles	Paolo Bonzini
	Shadow MMUs compute their role from cpu_role.base, simply by adjusting the root level. It's one line of code, so do not place it in a separate function. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	KVM: x86/mmu: remove redundant bits from extended role	Paolo Bonzini
	Before the separation of the CPU and the MMU role, CR0.PG was not available in the base MMU role, because two-dimensional paging always used direct=1 in the MMU role. However, now that the raw role is snapshotted in mmu->cpu_role, the value of CR0.PG always matches both !cpu_role.base.direct and cpu_role.base.level > 0. There is no need to store it again in union kvm_mmu_extended_role; instead, write an is_cr0_pg accessor by hand that takes care of the conversion. Use cpu_role.base.level since the future of the direct field is unclear. Likewise, CR4.PAE is now always present in the CPU role as !cpu_role.base.has_4_byte_gpte. The inversion makes certain tests on the MMU role easier, and is easily hidden by the is_cr4_pae accessor when operating on the CPU role. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	KVM: x86/mmu: rename kvm_mmu_role union	Paolo Bonzini
	It is quite confusing that the "full" union is called kvm_mmu_role but is used for the "cpu_role" field of struct kvm_mmu. Rename it to kvm_cpu_role. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	KVM: x86/mmu: remove extended bits from mmu_role, rename field	Paolo Bonzini
	mmu_role represents the role of the root of the page tables. It does not need any extended bits, as those govern only KVM's page table walking; the is_* functions used for page table walking always use the CPU role. ext.valid is not present anymore in the MMU role, but an all-zero MMU role is impossible because the level field is never zero in the MMU role. So just zap the whole mmu_role in order to force invalidation after CPUID is updated. While making this change, which requires touching almost every occurrence of "mmu_role", rename it to "root_role". Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	KVM: x86/mmu: store shadow EFER.NX in the MMU role	Paolo Bonzini
	Now that the MMU role is separate from the CPU role, it can be a truthful description of the format of the shadow pages. This includes whether the shadow pages use the NX bit; so force the efer_nx field of the MMU role when TDP is disabled, and remove the hardcoding it in the callers of reset_shadow_zero_bits_mask. In fact, the initialization of reserved SPTE bits can now be made common to shadow paging and shadow NPT; move it to shadow_mmu_init_context. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	KVM: x86/mmu: cleanup computation of MMU roles for shadow paging	Paolo Bonzini
	Pass the already-computed CPU role, instead of redoing it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	KVM: x86/mmu: cleanup computation of MMU roles for two-dimensional paging	Paolo Bonzini
	Inline kvm_calc_mmu_role_common into its sole caller, and simplify it by removing the computation of unnecessary bits. Extended bits are unnecessary because page walking uses the CPU role, and EFER.NX/CR0.WP can be set to one unconditionally---matching the format of shadow pages rather than the format of guest pages. The MMU role for two dimensional paging does still depend on the CPU role, even if only barely so, due to SMM and guest mode; for consistency, pass it down to kvm_calc_tdp_mmu_root_page_role instead of querying the vcpu with is_smm or is_guest_mode. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	KVM: x86/mmu: remove kvm_calc_shadow_root_page_role_common	Paolo Bonzini
	kvm_calc_shadow_root_page_role_common is the same as kvm_calc_cpu_role except for the level, which is overwritten afterwards in kvm_calc_shadow_mmu_root_page_role and kvm_calc_shadow_npt_root_page_role. role.base.direct is already set correctly for the CPU role, and CR0.PG=1 is required for VMRUN so it will also be correct for nested NPT. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	KVM: x86/mmu: remove ept_ad field	Paolo Bonzini
	The ept_ad field is used during page walk to determine if the guest PTEs have accessed and dirty bits. In the MMU role, the ad_disabled bit represents whether the shadow PTEs have the bits, so it would be incorrect to replace PT_HAVE_ACCESSED_DIRTY with just !mmu->mmu_role.base.ad_disabled. However, the similar field in the CPU mode, ad_disabled, is initialized correctly: to the opposite value of ept_ad for shadow EPT, and zero for non-EPT guest paging modes (which always have A/D bits). It is therefore possible to compute PT_HAVE_ACCESSED_DIRTY from the CPU mode, like other page-format fields; it just has to be inverted to account for the different polarity. In fact, now that the CPU mode is distinct from the MMU roles, it would even be possible to remove PT_HAVE_ACCESSED_DIRTY macro altogether, and use !mmu->cpu_role.base.ad_disabled instead. I am not doing this because the macro has a small effect in terms of dead code elimination: text data bss dec hex 103544 16665 112 120321 1d601 # as of this patch 103746 16665 112 120523 1d6cb # without PT_HAVE_ACCESSED_DIRTY Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	KVM: x86/mmu: do not recompute root level from kvm_mmu_role_regs	Paolo Bonzini
	The root_level can be found in the cpu_role (in fact the field is superfluous and could be removed, but one thing at a time). Since there is only one usage left of role_regs_to_root_level, inline it into kvm_calc_cpu_role. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	KVM: x86/mmu: split cpu_role from mmu_role	Paolo Bonzini
	Snapshot the state of the processor registers that govern page walk into a new field of struct kvm_mmu. This is a more natural representation than having it mostly in mmu_role but not exclusively; the delta right now is represented in other fields, such as root_level. The nested MMU now has only the CPU role; and in fact the new function kvm_calc_cpu_role is analogous to the previous kvm_calc_nested_mmu_role, except that it has role.base.direct equal to !CR0.PG. For a walk-only MMU, "direct" has no meaning, but we set it to !CR0.PG so that role.ext.cr0_pg can go away in a future patch. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	KVM: x86/mmu: remove "bool base_only" arguments	Paolo Bonzini
	The argument is always false now that kvm_mmu_calc_root_page_role has been removed. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	KVM: x86: Clean up and document nested #PF workaround	Sean Christopherson
	Replace the per-vendor hack-a-fix for KVM's #PF => #PF => #DF workaround with an explicit, common workaround in kvm_inject_emulated_page_fault(). Aside from being a hack, the current approach is brittle and incomplete, e.g. nSVM's KVM_SET_NESTED_STATE fails to set ->inject_page_fault(), and nVMX fails to apply the workaround when VMX is intercepting #PF due to allow_smaller_maxphyaddr=1. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	KVM: x86/mmu: rephrase unclear comment	Paolo Bonzini
	If accessed bits are not supported there simple isn't any distinction between accessed and non-accessed gPTEs, so the comment does not make much sense. Rephrase it in terms of what happens if accessed bits are supported. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	KVM: x86/mmu: pull computation of kvm_mmu_role_regs to kvm_init_mmu	Paolo Bonzini
	The init_kvm_*mmu functions, with the exception of shadow NPT, do not need to know the full values of CR0/CR4/EFER; they only need to know the bits that make up the "role". This cleanup however will take quite a few incremental steps. As a start, pull the common computation of the struct kvm_mmu_role_regs into their caller: all of them extract the struct from the vcpu as the very first step. Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	KVM: x86/mmu: constify uses of struct kvm_mmu_role_regs	Paolo Bonzini
	struct kvm_mmu_role_regs is computed just once and then accessed. Use const to make this clearer, even though the const fields of struct kvm_mmu_role_regs already prevent (or make it harder...) to modify the contents of the struct. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	KVM: x86/mmu: nested EPT cannot be used in SMM	Paolo Bonzini
	The role.base.smm flag is always zero when setting up shadow EPT, do not bother copying it over from vcpu->arch.root_mmu. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	KVM: x86/mmu: Use enable_mmio_caching to track if MMIO caching is enabled	Sean Christopherson
	Clear enable_mmio_caching if hardware can't support MMIO caching and use the dedicated flag to detect if MMIO caching is enabled instead of assuming shadow_mmio_value==0 means MMIO caching is disabled. TDX will use a zero value even when caching is enabled, and is_mmio_spte() isn't so hot that it needs to avoid an extra memory access, i.e. there's no reason to be super clever. And the clever approach may not even be more performant, e.g. gcc-11 lands the extra check on a non-zero value inline, but puts the enable_mmio_caching out-of-line, i.e. avoids the few extra uops for non-MMIO SPTEs. Cc: Isaku Yamahata <isaku.yamahata@intel.com> Cc: Kai Huang <kai.huang@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220420002747.3287931-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-29	KVM: x86/mmu: Check for host MMIO exclusion from mem encrypt iff necessary	Sean Christopherson
	When determining whether or not a SPTE needs to have SME/SEV's memory encryption flag set, do the moderately expensive host MMIO pfn check if and only if the memory encryption mask is non-zero. Note, KVM could further optimize the host MMIO checks by making a single call to kvm_is_mmio_pfn(), but the tdp_enabled path (for EPT's memtype handling) will likely be split out to a separate flow[]. At that point, a better approach would be to shove the call to kvm_is_mmio_pfn() into VMX code so that AMD+NPT without SME doesn't get hit with an unnecessary lookup. [] https://lkml.kernel.org/r/20220321224358.1305530-3-bgardon@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220415004909.2216670-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>