linux.git - Linus' kernel tree

Age	Commit message	Author
2022-05-20	ARM: 9198/1: spectre-bhb: simplify BPIALL vector macro	Ard Biesheuvel
	The BPIALL mitigation for Spectre-BHB adds a single instruction to the handler sequence that doesn't clobber any registers. Given that these sequences are 10 instructions long, they don't fit neatly into a cacheline anyway, so we can simply move that single instruction to the start of the unmitigated one, and rearrange the symbol names accordingly. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-05-20	ARM: 9195/1: entry: avoid explicit literal loads	Ard Biesheuvel
	ARMv7 has MOVW/MOVT instruction pairs to load symbol addresses into registers without having to rely on literal loads that go via the D-cache. For older cores, we now support a similar arrangement, based on PC-relative group relocations. This means we can elide most literal loads entirely from the entry path, by switching to the ldr_va macro to emit the appropriate sequence depending on the target architecture revision. While at it, switch to the bl_r macro for invoking the right PABT/DABT helpers instead of setting the LR register explicitly, which does not play well with cores that speculate across function returns. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-05-20	ARM: 9194/1: assembler: simplify ldr_this_cpu for !SMP builds	Ard Biesheuvel
	When CONFIG_SMP is not defined, the CPU offset is always zero, and so we can simplify the sequence to load a per-CPU variable. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-05-20	ARM: 9192/1: amba: fix memory leak in amba_device_try_add()	Wang Kefeng
	If amba_device_try_add() returns an error code (other than EPROBE_DEFER), memory is leaked when the amba device fails to read its periphid:
	unreferenced object 0xc1c60800 (size 1024):
	  comm "swapper/0", pid 1, jiffies 4294937333 (age 75.200s)
	  hex dump (first 32 bytes):
	    40 40 db c1 04 08 c6 c1 04 08 c6 c1 00 00 00 00 @@..............
	    00 d9 c1 c1 84 6f 38 c1 00 00 00 00 01 00 00 00 .....o8.........
	  backtrace:
	    [<(ptrval)>] kmem_cache_alloc_trace+0x168/0x2b4
	    [<(ptrval)>] amba_device_alloc+0x38/0x7c
	    [<(ptrval)>] of_platform_bus_create+0x2f4/0x4e8
	    [<(ptrval)>] of_platform_bus_create+0x380/0x4e8
	    [<(ptrval)>] of_platform_bus_create+0x380/0x4e8
	    [<(ptrval)>] of_platform_bus_create+0x380/0x4e8
	    [<(ptrval)>] of_platform_populate+0x70/0xc4
	    [<(ptrval)>] of_platform_default_populate_init+0xb4/0xcc
	    [<(ptrval)>] do_one_initcall+0x58/0x218
	    [<(ptrval)>] kernel_init_freeable+0x250/0x29c
	    [<(ptrval)>] kernel_init+0x24/0x148
	    [<(ptrval)>] ret_from_fork+0x14/0x1c
	    [<00000000>] 0x0
	unreferenced object 0xc1db4040 (size 64):
	  comm "swapper/0", pid 1, jiffies 4294937333 (age 75.200s)
	  hex dump (first 32 bytes):
	    31 63 30 66 30 30 30 30 2e 77 64 74 00 00 00 00 1c0f0000.wdt....
	    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
	  backtrace:
	    [<(ptrval)>] __kmalloc_track_caller+0x19c/0x2f8
	    [<(ptrval)>] kvasprintf+0x60/0xcc
	    [<(ptrval)>] kvasprintf_const+0x54/0x78
	    [<(ptrval)>] kobject_set_name_vargs+0x34/0xa8
	    [<(ptrval)>] dev_set_name+0x40/0x5c
	    [<(ptrval)>] of_device_make_bus_id+0x128/0x1f8
	    [<(ptrval)>] of_platform_bus_create+0x4dc/0x4e8
	    [<(ptrval)>] of_platform_bus_create+0x380/0x4e8
	    [<(ptrval)>] of_platform_bus_create+0x380/0x4e8
	    [<(ptrval)>] of_platform_bus_create+0x380/0x4e8
	    [<(ptrval)>] of_platform_populate+0x70/0xc4
	    [<(ptrval)>] of_platform_default_populate_init+0xb4/0xcc
	    [<(ptrval)>] do_one_initcall+0x58/0x218
	    [<(ptrval)>] kernel_init_freeable+0x250/0x29c
	    [<(ptrval)>] kernel_init+0x24/0x148
	    [<(ptrval)>] ret_from_fork+0x14/0x1c
	Fix both leaks by adding amba_device_put() to release the device name and the amba device. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-05-20	ARM: 9193/1: amba: Add amba_read_periphid() helper	Wang Kefeng
	Add new amba_read_periphid() helper to simplify error handling. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-05-20	ASoC: SOF: mediatek: add debug dump	Mark Brown
	Merge series from Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>: Add the ability to generate debug dumps on MediaTek SOF implementations.
2022-05-20	ASoC: remove two unnecessary gpiolib dependencies	Mark Brown
	Merge series from Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>: Remove two dependencies - issues reported by Intel kernel test bot.
2022-05-20	selftests: kvm/x86: Verify the pmu event filter matches the correct event	Aaron Lewis
	Add a test to demonstrate that when the guest programs an event select it is matched correctly in the pmu event filter and not inadvertently filtered. This could happen on AMD if the high nybble[1] in the event select gets truncated away, leaving only the bottom byte[2] for matching. This is a contrived example used for the convenience of demonstrating this issue; however, it also applies to event selects 0x28A (OC Mode Switch) and 0x08A (L1 BTB Correction), where 0x08A could end up being denied when the event select was only set up to deny 0x28A. [1] bits 35:32 in the event select register and bits 11:8 in the event select. [2] bits 7:0 in the event select register and bits 7:0 in the event select. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Message-Id: <20220517051238.2566934-3-aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
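The bit layout given in footnotes [1] and [2] can be illustrated with a small sketch. The helper names below are hypothetical and are not taken from the selftest; the sketch only shows why 0x28A and 0x08A become indistinguishable once the high nybble is dropped.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: place a 12-bit event select into the raw register
 * layout described above (bits 7:0 stay in place, bits 11:8 move to 35:32).
 */
uint64_t raw_event_select(uint16_t event_select)
{
	uint64_t low = event_select & 0xFFu;
	uint64_t high = (uint64_t)(event_select & 0xF00u) << 24;

	return low | high;
}

/* Hypothetical model of the failure: only the bottom byte survives. */
uint8_t truncated_event_select(uint64_t raw)
{
	return (uint8_t)(raw & 0xFFu);
}
```

With this layout, the raw values for 0x28A and 0x08A differ only in bits 35:32, so any matching step that looks at the bottom byte alone treats them as the same event.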
2022-05-20	selftests: kvm/x86: Add the helper function create_pmu_event_filter	Aaron Lewis
	Add a helper function that creates a pmu event filter given an event list. Currently, a pmu event filter can only be created with the same hard coded event list. Add a way to create one given a different event list. Also, rename make_pmu_event_filter to alloc_pmu_event_filter to clarify its purpose given the introduction of create_pmu_event_filter. No functional changes intended. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Message-Id: <20220517051238.2566934-2-aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-20	kvm: x86/pmu: Fix the compare function used by the pmu event filter	Aaron Lewis
	When returning from the compare function the u64 is truncated to an int. This results in a loss of the high nybble[1] in the event select and its sign if that nybble is in use. Switch from using a result that can end up being truncated to a result that can only be: 1, 0, -1. [1] bits 35:32 in the event select register and bits 11:8 in the event select. Fixes: 7ff775aca48ad ("KVM: x86/pmu: Use binary search to check filtered events") Signed-off-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220517051238.2566934-1-aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
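A minimal sketch of the truncation problem, assuming a comparator that returns the 64-bit difference narrowed to int. Both function names are hypothetical and this is not the kernel's code. The narrowing conversion is implementation-defined in C; on common compilers it keeps only the low 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the old behaviour: the 64-bit difference is
 * narrowed to int, so bits 32 and above are dropped on typical ABIs.
 */
int cmp_truncating(const void *pa, const void *pb)
{
	uint64_t a = *(const uint64_t *)pa;
	uint64_t b = *(const uint64_t *)pb;

	return (int)(a - b);
}

/* Three-way compare whose result can only be 1, 0 or -1. */
int cmp_three_way(const void *pa, const void *pb)
{
	uint64_t a = *(const uint64_t *)pa;
	uint64_t b = *(const uint64_t *)pb;

	return (a > b) - (a < b);
}
```

Two keys that differ only in bits 35:32 compare as equal under the truncating version, which is enough to make a binary search report a false match.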
2022-05-20	x86/tdx: Fix RETs in TDX asm	Peter Zijlstra
	Because build-testing is over-rated, fix a few trivial objtool complaints: vmlinux.o: warning: objtool: __tdx_module_call+0x3e: missing int3 after ret vmlinux.o: warning: objtool: __tdx_hypercall+0x6e: missing int3 after ret Fixes: eb94f1b6a70a ("x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220520083839.GR2578@worktop.programming.kicks-ass.net
2022-05-20	objtool: Fix objtool regression on x32 systems	Mikulas Patocka
	Commit c087c6e7b551 ("objtool: Fix type of reloc::addend") failed to appreciate cross building from ILP32 hosts, where 'int' == 'long' and the issue persists. As such, use s64/int64_t/Elf64_Sxword for this field and suffer the pain that is ISO C99 printf formats for it. Fixes: c087c6e7b551 ("objtool: Fix type of reloc::addend") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> [peterz: reword changelog, s/long long/s64/] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/alpine.LRH.2.02.2205161041260.11556@file01.intranet.prod.int.rdu2.redhat.com
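The portability point can be sketched as follows. The struct and function names are hypothetical stand-ins rather than objtool's definitions; the sketch only shows a fixed-width addend printed with the ISO C99 format macro instead of a format that depends on the host's 'long'.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a relocation record; only the addend matters. */
struct toy_reloc {
	int64_t addend;	/* 64 bits wide on both ILP32 and LP64 hosts */
};

/* Format "symbol+addend" using the ISO C99 macro for int64_t. */
int format_reloc(char *buf, size_t len, const char *sym,
		 const struct toy_reloc *r)
{
	return snprintf(buf, len, "%s%+" PRId64, sym, r->addend);
}
```

Because PRId64 expands to the right conversion specifier for the host, the same source prints addends above 32 bits correctly whether 'long' is 32 or 64 bits wide.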
2022-05-20	objtool: Fix symbol creation	Peter Zijlstra
	Nathan reported objtool failing with the following messages: warning: objtool: no non-local symbols !? warning: objtool: gelf_update_symshndx: invalid section index The problem is due to commit 4abff6d48dbc ("objtool: Fix code relocs vs weak symbols") failing to consider the case where an object would have no non-local symbols. The problem that commit tries to address is adding a STB_LOCAL symbol to the symbol table in light of the ELF spec's requirement that: In each symbol table, all symbols with STB_LOCAL binding precede the weak and global symbols. As ``Sections'' above describes, a symbol table section's sh_info section header member holds the symbol table index for the first non-local symbol. The approach taken is to find this first non-local symbol, move that to the end and then re-use the freed spot to insert a new local symbol and increment sh_info. Except it never considered the case of object files without global symbols and got a whole bunch of details wrong -- so many in fact that it is a wonder it ever worked :/ Specifically: - It failed to re-hash the symbol on the new index, so a subsequent find_symbol_by_index() would not find it at the new location and a query for the old location would now return a non-deterministic choice between the old and new symbol. - It failed to appreciate that the GElf wrappers are not a valid disk format (it works because GElf is basically Elf64 and we only support x86_64 atm.) - It failed to fully appreciate how horrible the libelf API really is and got the gelf_update_symshndx() call pretty much completely wrong; with the direct consequence that if inserting a second STB_LOCAL symbol would require moving the same STB_GLOBAL symbol again it would completely come unstuck. Write a new elf_update_symbol() function that wraps all the magic required to update or create a new symbol at a given index. 
Specifically, gelf_update_sym*() require an @ndx argument that is relative to the @data argument; this means you have to manually iterate the section data descriptor list and update @ndx. Fixes: 4abff6d48dbc ("objtool: Fix code relocs vs weak symbols") Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/YoPCTEYjoPqE4ZxB@hirez.programming.kicks-ass.net
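The insertion scheme described above (move the first non-local symbol to the end, reuse its slot for the new local, bump sh_info) can be modelled on a plain array. This is a toy sketch with hypothetical types and names; it does not use libelf and leaves out the re-hashing and on-disk updates the commit message discusses.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_SYMS 16

enum toy_bind { TOY_LOCAL, TOY_GLOBAL };

struct toy_sym {
	const char *name;
	enum toy_bind bind;
};

/* Toy symbol table: locals occupy [0, first_non_local), mirroring sh_info. */
struct toy_symtab {
	struct toy_sym syms[TOY_MAX_SYMS];
	size_t nr;
	size_t first_non_local;
};

/*
 * Insert a local symbol while keeping all locals ahead of non-locals.
 * Returns the index of the new symbol, or -1 if the table is full.
 */
int toy_add_local(struct toy_symtab *t, const char *name)
{
	size_t idx = t->first_non_local;

	if (t->nr >= TOY_MAX_SYMS)
		return -1;

	/* Only move a symbol if a non-local one actually exists. */
	if (idx < t->nr)
		t->syms[t->nr] = t->syms[idx];

	t->syms[idx].name = name;
	t->syms[idx].bind = TOY_LOCAL;
	t->nr++;
	t->first_non_local++;

	return (int)idx;
}
```

The `idx < t->nr` check covers the case of a table with no non-local symbols, where the new local is simply appended. A real implementation would also need to update any index-based lookup structure for the moved symbol.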
2022-05-20	x86: Remove empty files	Borislav Petkov
	Remove empty files which were supposed to get removed with the respective commits removing the functionality in them: $ find arch/x86/ -empty arch/x86/lib/mmx_32.c arch/x86/include/asm/fpu/internal.h arch/x86/include/asm/mmx.h Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220520101723.12006-1-bp@alien8.de
2022-05-20	Merge branch 'net-ipa-next'	David S. Miller
	Alex Elder says: ==================== net: ipa: a mix of patches This series includes a mix of things that are generally minor. The first four are sort of unrelated fixes, and summarizing them here wouldn't be that helpful. The last three together make it so only the "configuration data" we need after initialization is saved for later use. Most such data is used only during driver initialization. But endpoint configuration is needed later, so the last patch saves a copy of that. Eventually we'll want to support reconfiguring endpoints at runtime as well, and this will facilitate that. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-20	net: ipa: save a copy of endpoint default config	Alex Elder
	All elements of the default endpoint configuration are used in the code when programming an endpoint for use. But none of the other configuration data is ever needed once things are initialized. So rather than saving a pointer to all of the configuration data, save a copy of only the endpoint configuration portion. This will eventually allow endpoint configuration to be modifiable at runtime. But even before that it means we won't keep a pointer to configuration data after it is no longer needed. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-20	net: ipa: rename a few endpoint config data types	Alex Elder
	Rename the just-moved data structure types to drop the "_data" suffix, to make it more obvious they are no longer meant to be used just as read-only initialization data. Rename the fields and variables of these types to use "config" instead of "data" in the name. This is another small step meant to facilitate review. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-20	net: ipa: move endpoint configuration data definitions	Alex Elder
	Move the definitions of the structures defining endpoint-specific configuration data out of "ipa_data.h" and into "ipa_endpoint.h". This is a trivial movement of code without any other change, to prepare for the next few patches. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-20	net: ipa: open-code ether_setup()	Alex Elder
	About half of the fields set by the call in ipa_modem_netdev_setup() are overwritten after the call. Instead, just skip the call, and open-code the (other) assignments it makes to the net_device structure fields. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-20	net: ipa: ignore endianness if there is no header	Alex Elder
	If we program an RX endpoint to have no header (header length is 0), header-related endpoint configuration values are meaningless and are ignored. The only case we support that defines a header is QMAP endpoints. In ipa_endpoint_init_hdr_ext() we set the endianness mask value unconditionally, but it should not be done if there is no header (meaning it is not configured for QMAP). Set the endianness conditionally, and rearrange the logic in that function slightly to avoid testing the qmap flag twice. Delete an incorrect comment in ipa_endpoint_init_aggr(). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-20	net: ipa: rename a GSI error code	Alex Elder
	The CHANNEL_NOT_RUNNING error condition has been generalized, so rename it to be INCORRECT_CHANNEL_STATE. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-20	net: ipa: drop an unneeded transaction reference	Alex Elder
	In gsi_channel_update(), a reference count is taken on the last completed transaction "to keep it from completing" before we give the event back to the hardware. Completion processing for that transaction (and any other "new" ones) will not occur until after this function returns, so there's no risk of it completing early. So there's no need to take and drop the additional transaction reference. Use local variables in the call to gsi_evt_ring_doorbell(). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-20	x86/entry: Fixup objtool/ibt validation	Peter Zijlstra
	Commit 47f33de4aafb ("x86/sev: Mark the code returning to user space as syscall gap") added a bunch of text references without annotating them, resulting in a spree of objtool complaints: vmlinux.o: warning: objtool: vc_switch_off_ist+0x77: relocation to !ENDBR: entry_SYSCALL_64+0x15c vmlinux.o: warning: objtool: vc_switch_off_ist+0x8f: relocation to !ENDBR: entry_SYSCALL_compat+0xa5 vmlinux.o: warning: objtool: vc_switch_off_ist+0x97: relocation to !ENDBR: .entry.text+0x21ea vmlinux.o: warning: objtool: vc_switch_off_ist+0xef: relocation to !ENDBR: .entry.text+0x162 vmlinux.o: warning: objtool: __sev_es_ist_enter+0x60: relocation to !ENDBR: entry_SYSCALL_64+0x15c vmlinux.o: warning: objtool: __sev_es_ist_enter+0x6c: relocation to !ENDBR: .entry.text+0x162 vmlinux.o: warning: objtool: __sev_es_ist_enter+0x8a: relocation to !ENDBR: entry_SYSCALL_compat+0xa5 vmlinux.o: warning: objtool: __sev_es_ist_enter+0xc1: relocation to !ENDBR: .entry.text+0x21ea Since these text references are used to compare against IP, and are not an indirect call target, they don't need ENDBR so annotate them away. Fixes: 47f33de4aafb ("x86/sev: Mark the code returning to user space as syscall gap") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220520082604.GQ2578@worktop.programming.kicks-ass.net
2022-05-20	x86/microcode: Add explicit CPU vendor dependency	Borislav Petkov
	Add an explicit dependency to the respective CPU vendor so that the respective microcode support for it gets built only when that support is enabled. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/8ead0da9-9545-b10d-e3db-7df1a1f219e4@infradead.org
2022-05-19	cgroup: remove the superfluous judgment	Shida Zhang
	Remove the superfluous judgment since the function is never called for a root cgroup, as suggested by Tejun. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> Reviewed-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2022-05-20	Merge tag 'msm-next-5.19-fixes' of https://gitlab.freedesktop.org/abhinavk/msm into drm-next	Dave Airlie
	5.19 fixes for msm-next - Limiting WB modes to max sspp linewidth - Fixing the supported rotations to add 180 back for IGT - Fix to handle pm_runtime_get_sync() errors to avoid unclocked access in the bind() path for dpu driver - Fix the irq_free() without request issue which was a big-time hitter in the CI-runs. Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com> Signed-off-by: Dave Airlie <airlied@redhat.com> From: Abhinav Kumar <quic_abhinavk@quicinc.com> Link: https://patchwork.freedesktop.org/patch/msgid/b011d51d-d634-123e-bf5f-27219ee33151@quicinc.com
2022-05-20	Merge tag 'drm-misc-next-fixes-2022-05-19' of git://anongit.freedesktop.org/drm/drm-misc into drm-next	Dave Airlie
	A device tree binding change for Rockchip VOP2. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20220519080556.42p52cya4u6y3kps@houat
2022-05-19	Merge tag 'v5.18-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6	Linus Torvalds
	Pull crypto fix from Herbert Xu: "Fix a regression in a recent fix to qcom-rng" * tag 'v5.18-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: qcom-rng - fix infinite loop on requests not multiple of WORD_SZ
2022-05-19	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next	Jakub Kicinski
	Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next, misc updates and fallout fixes from Florian's recent code rewriting (from last pull request): 1) Use new flowi4_l3mdev field in ip_route_me_harder(), from Martin Willi. 2) Avoid unnecessary GC with a timestamp in conncount, from William Tu and Yifeng Sun. 3) Remove TCP conntrack debugging, from Florian Westphal. 4) Fix compilation warning in ctnetlink, from Florian. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: ctnetlink: fix up for "netfilter: conntrack: remove unconfirmed list" netfilter: conntrack: remove pr_debug callsites from tcp tracker netfilter: nf_conncount: reduce unnecessary GC netfilter: Use l3mdev flow key when re-routing mangled packets ==================== Link: https://lore.kernel.org/r/20220519220206.722153-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-19	eth: mtk_ppe: fix up after merge	Jakub Kicinski
	I missed this in the barrage of GCC 12 warnings. Commit cf2df74e202d ("net: fix dev_fill_forward_path with pppoe + bridge") changed the pointer into an array. Fixes: d7e6f5836038 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net") Link: https://lore.kernel.org/r/20220520012555.2262461-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-20	MAINTAINERS: Update KVM RISC-V entry to cover selftests support	Anup Patel
	We update KVM RISC-V maintainers entry to include appropriate KVM selftests directories so that RISC-V related KVM selftests patches are CC'ed to KVM RISC-V mailing list. Signed-off-by: Anup Patel <anup@brainfault.org>
2022-05-20	RISC-V: KVM: Introduce ISA extension register	Atish Patra
	Currently, there is no provision for the vmm (qemu-kvm or kvmtool) to query multi-letter ISA extensions. The config register is only used for base single-letter ISA extensions. A new ISA extension register is added that allows the vmm to query any ISA extension, one at a time. It is enabled for both single-letter and multi-letter ISA extensions. The ISA extension register is useful if the vmm needs to retrieve or set a single extension, while the config register should be used if all the base ISA extensions need to be retrieved or set. For any multi-letter ISA extension, the new register interface must be used. Signed-off-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-05-20	RISC-V: KVM: Cleanup stale TLB entries when host CPU changes	Anup Patel
	On RISC-V platforms with hardware VMID support, we share the same VMID for all VCPUs of a particular Guest/VM. This means we might have stale G-stage TLB entries on the current Host CPU due to some other VCPU of the same Guest which ran previously on the current Host CPU. To clean up stale TLB entries, we simply flush all G-stage TLB entries by VMID whenever the underlying Host CPU changes for a VCPU. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
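The idea of flushing by VMID when a VCPU lands on a different host CPU can be sketched with a toy model. All names here are hypothetical and the flush is replaced by a counter; this is an illustration of the bookkeeping, not the KVM RISC-V implementation.

```c
#include <assert.h>

/* Hypothetical per-VCPU state: the host CPU this VCPU last ran on. */
struct toy_vcpu {
	int last_host_cpu;	/* -1 until the VCPU has run once */
	unsigned long vmid;
};

/* Counts flushes so the sketch can be checked without real hardware. */
unsigned int toy_flush_count;

/* Stand-in for flushing all G-stage TLB entries of one VMID. */
void toy_flush_gstage_by_vmid(unsigned long vmid)
{
	(void)vmid;
	toy_flush_count++;
}

/* Call before entering the guest on host CPU `cpu`. */
void toy_vcpu_enter(struct toy_vcpu *vcpu, int cpu)
{
	if (vcpu->last_host_cpu == cpu)
		return;

	/* Another VCPU sharing this VMID may have left entries here. */
	toy_flush_gstage_by_vmid(vcpu->vmid);
	vcpu->last_host_cpu = cpu;
}
```

Re-entering on the same host CPU costs nothing in this model; only a migration (or the first run) triggers a flush.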
2022-05-20	RISC-V: KVM: Add remote HFENCE functions based on VCPU requests	Anup Patel
	The generic KVM has support for VCPU requests which can be used to do arch-specific work in the run-loop. We introduce remote HFENCE functions which will internally use VCPU requests instead of host SBI calls. Advantages of doing remote HFENCEs as VCPU requests are: 1) Multiple VCPUs of a Guest may be running on different Host CPUs so it is not always possible to determine the Host CPU mask for doing Host SBI call. For example, when VCPU X wants to do HFENCE on VCPU Y, it is possible that VCPU Y is blocked or in user-space (i.e. vcpu->cpu < 0). 2) To support nested virtualization, we will be having a separate shadow G-stage for each VCPU and a common host G-stage for the entire Guest/VM. The VCPU requests based remote HFENCEs help us easily synchronize the common host G-stage and shadow G-stage of each VCPU without any additional IPI calls. This is also a preparatory patch for upcoming nested virtualization support where we will be having a shadow G-stage page table for each Guest VCPU. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-05-20	RISC-V: KVM: Reduce KVM_MAX_VCPUS value	Anup Patel
	Currently, the KVM_MAX_VCPUS value is 16384 for RV64 and 128 for RV32. The KVM_MAX_VCPUS value is too high for RV64 and too low for RV32 compared to other architectures (e.g. x86 sets it to 1024 and ARM64 sets it to 512). The too high value of KVM_MAX_VCPUS on RV64 also leads to the VCPU mask on stack consuming 2KB. We set KVM_MAX_VCPUS to 1024 for both RV64 and RV32 to be aligned with other architectures. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
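The 2KB figure follows from one bit per VCPU: 16384 bits / 8 = 2048 bytes, while 1024 VCPUs need only 128 bytes. A small sketch of that arithmetic, with a hypothetical helper name:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bytes needed for a bitmap with one bit per VCPU, rounded up to whole longs. */
size_t vcpu_mask_bytes(size_t max_vcpus)
{
	size_t bits_per_long = sizeof(unsigned long) * CHAR_BIT;
	size_t longs = (max_vcpus + bits_per_long - 1) / bits_per_long;

	return longs * sizeof(unsigned long);
}
```

Calling `vcpu_mask_bytes(16384)` gives 2048 and `vcpu_mask_bytes(1024)` gives 128, regardless of whether 'long' is 32 or 64 bits wide.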
2022-05-20	RISC-V: KVM: Introduce range based local HFENCE functions	Anup Patel
	Various __kvm_riscv_hfence_xyz() functions implemented in kvm/tlb.S are equivalent to the corresponding HFENCE.GVMA instructions, and we don't have range based local HFENCE functions. This patch provides a complete set of local HFENCE functions which support range based TLB invalidation and HFENCE.VVMA based functions. This is also a preparatory patch for upcoming Svinval support in KVM RISC-V. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-05-20	RISC-V: KVM: Treat SBI HFENCE calls as NOPs	Anup Patel
	We should treat SBI HFENCE calls as NOPs until nested virtualization is supported by KVM RISC-V. This will help us test booting a hypervisor under KVM RISC-V. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-05-20	RISC-V: KVM: Add Sv57x4 mode support for G-stage	Anup Patel
	Latest QEMU supports G-stage Sv57x4 mode so this patch extends KVM RISC-V G-stage handling to detect and use Sv57x4 mode when available. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-05-20	RISC-V: KVM: Use G-stage name for hypervisor page table	Anup Patel
	The two-stage address translation defined by the RISC-V privileged specification defines: VS-stage (guest virtual address to guest physical address) programmed by the Guest OS and G-stage (guest physical address to host physical address) programmed by the hypervisor. To align with the above terminology, we replace "stage2" with "gstage" and "Stage2" with "G-stage" name everywhere in KVM RISC-V sources. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-05-20	KVM: selftests: riscv: Remove unneeded semicolon	Jiapeng Chong
	Fix the following coccicheck warnings: ./tools/testing/selftests/kvm/lib/riscv/processor.c:353:3-4: Unneeded semicolon. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-05-20	KVM: selftests: riscv: Improve unexpected guest trap handling	Anup Patel
	Currently, we simply hang using "while (1) ;" upon any unexpected guest traps because the default guest trap handler is guest_hang(). The above approach is not useful to anyone because KVM selftests users will only see a hung application upon any unexpected guest trap. This patch improves unexpected guest trap handling for KVM RISC-V selftests by doing the following: 1) Return to host user-space 2) Dump VCPU registers 3) Die using TEST_ASSERT(0, ...) Signed-off-by: Anup Patel <apatel@ventanamicro.com> Tested-by: Mayuresh Chitale <mchitale@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2022-05-19	Merge branch 'mptcp-miscellaneous-fixes-and-a-new-test-case'	Jakub Kicinski
	Mat Martineau says: ==================== mptcp: Miscellaneous fixes and a new test case Patches 1 and 3 remove helpers that were iterating over the subflow connection list without proper locking. Iteration was not needed in either case. Patch 2 fixes handling of MP_FAIL timeout, checking for orphaned subflows instead of using the MPTCP socket data lock and connection state. Patch 4 adds a test for MP_FAIL timeout using tc pedit to induce checksum failures. ==================== Link: https://lore.kernel.org/r/20220518220446.209750-1-mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-19	selftests: mptcp: add MP_FAIL reset testcase	Geliang Tang
	Add the multiple subflows test case for MP_FAIL, to test the MP_FAIL reset case. Use the test_linkfail value to make 1024KB test files. Invoke reset_with_fail() to use 'iptables' and 'tc action pedit' rules to produce the bit flips to trigger the checksum failures on ns2eth2. Add delays on ns2eth1 to make sure more data is transmitted on ns2eth2. The check_invert flag is enabled in reset_with_fail(), so this test prints out the inverted bytes, instead of the file mismatch errors. Invoke pedit_action_pkts() to get the number of packets edited by the tc pedit actions, and print this number to the output. Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-19	mptcp: Do not traverse the subflow connection list without lock	Mat Martineau
	The MPTCP socket's conn_list (list of subflows) requires the socket lock to access. The MP_FAIL timeout code added such an access, where it would check the list of subflows both in timer context and (later) in workqueue context where the socket lock is held. Rather than check the list twice, remove the check in the timeout handler and only depend on the check in the workqueue. Also remove the MPTCP_FAIL_NO_RESPONSE flag, since mptcp_mp_fail_no_response() has insignificant overhead and can be checked on each worker run. Fixes: 49fa1919d6bc ("mptcp: reset subflow when MP_FAIL doesn't respond") Reported-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-19	mptcp: Check for orphaned subflow before handling MP_FAIL timer	Mat Martineau
	MP_FAIL timeout (waiting for a peer to respond to an MP_FAIL with another MP_FAIL) is implemented using the MPTCP socket's sk_timer. That timer is also used at MPTCP socket close, so it's important to not have the two timer users interfere with each other. At MPTCP socket close, all subflows are orphaned before sk_timer is manipulated. By checking the SOCK_DEAD flag on the subflows, each subflow can determine if the timer is safe to alter without acquiring any MPTCP-level lock. This replaces code that was using the mptcp_data_lock and MPTCP-level socket state checks that did not correctly protect the timer. Fixes: 49fa1919d6bc ("mptcp: reset subflow when MP_FAIL doesn't respond") Reviewed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-19	mptcp: stop using the mptcp_has_another_subflow() helper	Paolo Abeni
	The mentioned helper requires the msk socket lock, and the current callers neither own it nor can acquire it, so the access is racy. All the current callers are really checking for infinite mapping fallback, and the latter condition is explicitly tracked by the relevant msk variable: we can safely remove the caller usage - and the caller itself. The issue has been present since the MP_FAIL implementation, but the fix only applies since the infinite fallback support, hence the somewhat unexpected Fixes tag. Fixes: 0530020a7c8f ("mptcp: track and update contiguous data status") Acked-and-tested-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-19	tcp: improve PRR loss recovery	Yuchung Cheng
	This patch improves TCP PRR loss recovery behavior for a corner case. Previously, during PRR conservation-bound mode, it strictly sent an amount equal to the amount newly acked or s/acked. The patch changes this so that PRR may send an additional amount that was banked previously (e.g. application-limited) in the conservation-bound mode, similar to the slow-start mode. This unifies and simplifies the algorithm further and may improve the recovery latency. This change still follows the general packet conservation design principle and always keeps inflight/cwnd below the slow start threshold set by the congestion control module. PRR is described in RFC 6937. We'll include this change in the latest revision rfc6937-bis as well. Reported-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220519003410.2531936-1-ycheng@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-19	bonding: fix missed rcu protection	Hangbin Liu
	When removing the rcu_read_lock in bond_ethtool_get_ts_info() as discussed [1], I didn't notice it could be called via setsockopt, which doesn't hold rcu lock, as syzbot pointed out: stack backtrace: CPU: 0 PID: 3599 Comm: syz-executor317 Not tainted 5.18.0-rc5-syzkaller-01392-g01f4685797a5 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 bond_option_active_slave_get_rcu include/net/bonding.h:353 [inline] bond_ethtool_get_ts_info+0x32c/0x3a0 drivers/net/bonding/bond_main.c:5595 __ethtool_get_ts_info+0x173/0x240 net/ethtool/common.c:554 ethtool_get_phc_vclocks+0x99/0x110 net/ethtool/common.c:568 sock_timestamping_bind_phc net/core/sock.c:869 [inline] sock_set_timestamping+0x3a3/0x7e0 net/core/sock.c:916 sock_setsockopt+0x543/0x2ec0 net/core/sock.c:1221 __sys_setsockopt+0x55e/0x6a0 net/socket.c:2223 __do_sys_setsockopt net/socket.c:2238 [inline] __se_sys_setsockopt net/socket.c:2235 [inline] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2235 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f8902c8eb39 Fix it by adding rcu_read_lock and taking a ref on the real_dev. Since dev_hold() and dev_put() can take NULL these days, we can skip checking if real_dev exists. [1] https://lore.kernel.org/netdev/27565.1642742439@famine/ Reported-by: syzbot+92beb3d46aab498710fa@syzkaller.appspotmail.com Fixes: aa6034678e87 ("bonding: use rcu_dereference_rtnl when get bonding active slave") Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220519020148.1058344-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-19	docs: change the title of networking docs	Jakub Kicinski
	The current title of our section of the documentation is Linux Networking Documentation. Since we're describing a section of Linux Documentation, repeating those two words seems redundant. Link: https://lore.kernel.org/r/20220518234346.2088436-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-19	net: ipa: don't proceed to out-of-bound write	Jakub Kicinski
	GCC 12 seems upset that we check ipa_irq against array bound but then proceed, anyway: drivers/net/ipa/ipa_interrupt.c: In function ‘ipa_interrupt_add’: drivers/net/ipa/ipa_interrupt.c:196:27: warning: array subscript 30 is above array bounds of ‘void (*[30])(struct ipa *, enum ipa_irq_id)’ [-Warray-bounds] 196 | interrupt->handler[ipa_irq] = handler; | ~~~~~~~~~~~~~~~~~~^~~~~~~~~ drivers/net/ipa/ipa_interrupt.c:42:27: note: while referencing ‘handler’ 42 | ipa_irq_handler_t handler[IPA_IRQ_COUNT]; | ^~~~~~~ Reviewed-by: Alex Elder <elder@linaro.org> Link: https://lore.kernel.org/r/20220519004417.2109886-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
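The pattern behind this warning, checking an index against the array bound and then writing anyway, can be shown with a small user-space sketch. The names `irq_table`, `irq_table_add()` and `IRQ_COUNT` are hypothetical stand-ins and not the driver's code; the sketch only assumes that the remedy is to return early once the check fails, which is what the subject line suggests.

```c
#include <assert.h>
#include <stddef.h>

#define IRQ_COUNT 30 /* illustrative; matches the [30] in the warning */

typedef void (*irq_handler_t)(int irq_id);

/* Hypothetical table of per-interrupt handlers. */
struct irq_table {
	irq_handler_t handler[IRQ_COUNT];
};

/*
 * Register fn for the given irq. Returns 0 on success and -1 when irq is
 * out of range, in which case the table is left untouched.
 */
int irq_table_add(struct irq_table *t, unsigned int irq, irq_handler_t fn)
{
	assert(t != NULL);

	if (irq >= IRQ_COUNT)
		return -1; /* stop here instead of writing past the array */

	t->handler[irq] = fn;
	return 0;
}
```

Because the out-of-range path returns before the store, the compiler can see that the subscript is always below `IRQ_COUNT` at the point of the write.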