linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2019-10-31	cxgb4/l2t: Simplify 't4_l2e_free()' and '_t4_l2e_free()'	Christophe JAILLET
	Use '__skb_queue_purge()' instead of re-implementing it. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-31	s390/idle: fix cpu idle time calculation	Heiko Carstens
	The idle time reported in /proc/stat sometimes incorrectly contains huge values on s390. This is caused by a bug in arch_cpu_idle_time(). The kernel tries to figure out when a different cpu entered idle by accessing its per-cpu data structure. There is an ordering problem: if the remote cpu has an idle_enter value which is not zero, and an idle_exit value which is zero, it is assumed it is idle since "now". The "now" timestamp however is taken before the idle_enter value is read. Which in turn means that "now" can be smaller than idle_enter of the remote cpu. Unconditionally subtracting idle_enter from "now" can thus lead to a negative value (aka large unsigned value). Fix this by moving the get_tod_clock() invocation out of the loop. While at it also make the code a bit more readable. A similar bug also exists for show_idle_time(). Fix this is as well. Cc: <stable@vger.kernel.org> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-10-31	s390/unwind: fix mixing regs and sp	Ilya Leoshkevich
	unwind_for_each_frame stops after the first frame if regs->gprs[15] <= sp. The reason is that in case regs are specified, the first frame should be regs->psw.addr and the second frame should be sp->gprs[8]. However, currently the second frame is regs->gprs[15], which confuses outside_of_stack(). Fix by introducing a flag to distinguish this special case from unwinding the interrupt handler, for which the current behavior is appropriate. Fixes: 78c98f907413 ("s390/unwind: introduce stack unwind API") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Cc: stable@vger.kernel.org # v5.2+ Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-10-31	s390/cmm: fix information leak in cmm_timeout_handler()	Yihui ZENG
	The problem is that we were putting the NUL terminator too far: buf[sizeof(buf) - 1] = '\0'; If the user input isn't NUL terminated and they haven't initialized the whole buffer then it leads to an info leak. The NUL terminator should be: buf[len - 1] = '\0'; Signed-off-by: Yihui Zeng <yzeng56@asu.edu> Cc: stable@vger.kernel.org Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> [heiko.carstens@de.ibm.com: keep semantics of how lenp and ppos are handled] Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-10-31	Merge branch 'bpf-cleanup-btf-raw-tp'	Daniel Borkmann
	Alexei Starovoitov says: ==================== v1->v2: addressed Andrii's feedback When BTF-enabled raw_tp were introduced the plan was to follow up with BTF-enabled kprobe and kretprobe reusing PROG_RAW_TRACEPOINT and PROG_KPROBE types. But k[ret]probe expect pt_regs while BTF-enabled program ctx will be the same as raw_tp. kretprobe is indistinguishable from kprobe while BTF-enabled kretprobe will have access to retval while kprobe will not. Hence PROG_KPROBE type is not reusable and reusing PROG_RAW_TRACEPOINT no longer fits well. Hence introduce 'umbrella' prog type BPF_PROG_TYPE_TRACING that will cover different BTF-enabled tracing attach points. The changes make libbpf side cleaner as well. check_attach_btf_id() is cleaner too. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-10-31	libbpf: Add support for prog_tracing	Alexei Starovoitov
	Cleanup libbpf from expected_attach_type == attach_btf_id hack and introduce BPF_PROG_TYPE_TRACING. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191030223212.953010-3-ast@kernel.org
2019-10-31	bpf: Replace prog_raw_tp+btf_id with prog_tracing	Alexei Starovoitov
	The bpf program type raw_tp together with 'expected_attach_type' was the most appropriate api to indicate BTF-enabled raw_tp programs. But during development it became apparent that 'expected_attach_type' cannot be used and new 'attach_btf_id' field had to be introduced. Which means that the information is duplicated in two fields where one of them is ignored. Clean it up by introducing new program type where both 'expected_attach_type' and 'attach_btf_id' fields have specific meaning. In the future 'expected_attach_type' will be extended with other attach points that have similar semantics to raw_tp. This patch is replacing BTF-enabled BPF_PROG_TYPE_RAW_TRACEPOINT with prog_type = BPF_RPOG_TYPE_TRACING expected_attach_type = BPF_TRACE_RAW_TP attach_btf_id = btf_id of raw tracepoint inside the kernel Future patches will add expected_attach_type = BPF_TRACE_FENTRY or BPF_TRACE_FEXIT where programs have the same input context and the same helpers, but different attach points. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191030223212.953010-2-ast@kernel.org
2019-10-31	arm64: cpufeature: Enable Qualcomm Falkor errata 1009 for Kryo	Bjorn Andersson
	The Kryo cores share errata 1009 with Falkor, so add their model definitions and enable it for them as well. Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> [will: Update entry in silicon-errata.rst] Signed-off-by: Will Deacon <will@kernel.org>
2019-10-31	KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active	Paolo Bonzini
	VMX already does so if the host has SMEP, in order to support the combination of CR0.WP=1 and CR4.SMEP=1. However, it is perfectly safe to always do so, and in fact VMX already ends up running with EFER.NXE=1 on old processors that lack the "load EFER" controls, because it may help avoiding a slow MSR write. Removing all the conditionals simplifies the code. SVM does not have similar code, but it should since recent AMD processors do support SMEP. So this patch also makes the code for the two vendors more similar while fixing NPT=0, CR0.WP=1 and CR4.SMEP=1 on AMD processors. Cc: stable@vger.kernel.org Cc: Joerg Roedel <jroedel@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-31	kvm: call kvm_arch_destroy_vm if vm creation fails	Jim Mattson
	In kvm_create_vm(), if we've successfully called kvm_arch_init_vm(), but then fail later in the function, we need to call kvm_arch_destroy_vm() so that it can do any necessary cleanup (like freeing memory). Fixes: 44a95dae1d229a ("KVM: x86: Detect and Initialize AVIC support") Signed-off-by: John Sperbeck <jsperbeck@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Junaid Shahid <junaids@google.com> [Remove dependency on "kvm: Don't clear reference count on kvm_create_vm() error path" which was not committed. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-31	efi/efi_test: Lock down /dev/efi_test and require CAP_SYS_ADMIN	Javier Martinez Canillas
	The driver exposes EFI runtime services to user-space through an IOCTL interface, calling the EFI services function pointers directly without using the efivar API. Disallow access to the /dev/efi_test character device when the kernel is locked down to prevent arbitrary user-space to call EFI runtime services. Also require CAP_SYS_ADMIN to open the chardev to prevent unprivileged users to call the EFI runtime services, instead of just relying on the chardev file mode bits for this. The main user of this driver is the fwts [0] tool that already checks if the effective user ID is 0 and fails otherwise. So this change shouldn't cause any regression to this tool. [0]: https://wiki.ubuntu.com/FirmwareTestSuite/Reference/uefivarinfo Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Laszlo Ersek <lersek@redhat.com> Acked-by: Matthew Garrett <mjg59@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191029173755.27149-7-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-31	x86, efi: Never relocate kernel below lowest acceptable address	Kairui Song
	Currently, kernel fails to boot on some HyperV VMs when using EFI. And it's a potential issue on all x86 platforms. It's caused by broken kernel relocation on EFI systems, when below three conditions are met: 1. Kernel image is not loaded to the default address (LOAD_PHYSICAL_ADDR) by the loader. 2. There isn't enough room to contain the kernel, starting from the default load address (eg. something else occupied part the region). 3. In the memmap provided by EFI firmware, there is a memory region starts below LOAD_PHYSICAL_ADDR, and suitable for containing the kernel. EFI stub will perform a kernel relocation when condition 1 is met. But due to condition 2, EFI stub can't relocate kernel to the preferred address, so it fallback to ask EFI firmware to alloc lowest usable memory region, got the low region mentioned in condition 3, and relocated kernel there. It's incorrect to relocate the kernel below LOAD_PHYSICAL_ADDR. This is the lowest acceptable kernel relocation address. The first thing goes wrong is in arch/x86/boot/compressed/head_64.S. Kernel decompression will force use LOAD_PHYSICAL_ADDR as the output address if kernel is located below it. Then the relocation before decompression, which move kernel to the end of the decompression buffer, will overwrite other memory region, as there is no enough memory there. To fix it, just don't let EFI stub relocate the kernel to any address lower than lowest acceptable address. [ ardb: introduce efi_low_alloc_above() to reduce the scope of the change ] Signed-off-by: Kairui Song <kasong@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191029173755.27149-6-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-31	efi: libstub/arm: Account for firmware reserved memory at the base of RAM	Ard Biesheuvel
	The EFI stubloader for ARM starts out by allocating a 32 MB window at the base of RAM, in order to ensure that the decompressor (which blindly copies the uncompressed kernel into that window) does not overwrite other allocations that are made while running in the context of the EFI firmware. In some cases, (e.g., U-Boot running on the Raspberry Pi 2), this is causing boot failures because this initial allocation conflicts with a page of reserved memory at the base of RAM that contains the SMP spin tables and other pieces of firmware data and which was put there by the bootloader under the assumption that the TEXT_OFFSET window right below the kernel is only used partially during early boot, and will be left alone once the memory reservations are processed and taken into account. So let's permit reserved memory regions to exist in the region starting at the base of RAM, and ending at TEXT_OFFSET - 5 * PAGE_SIZE, which is the window below the kernel that is not touched by the early boot code. Tested-by: Guillaume Gardet <Guillaume.Gardet@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Chester Lin <clin@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191029173755.27149-5-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-31	efi/random: Treat EFI_RNG_PROTOCOL output as bootloader randomness	Dominik Brodowski
	Commit 428826f5358c ("fdt: add support for rng-seed") introduced add_bootloader_randomness(), permitting randomness provided by the bootloader or firmware to be credited as entropy. However, the fact that the UEFI support code was already wired into the RNG subsystem via a call to add_device_randomness() was overlooked, and so it was not converted at the same time. Note that this UEFI (v2.4 or newer) feature is currently only implemented for EFI stub booting on ARM, and further note that CONFIG_RANDOM_TRUST_BOOTLOADER must be enabled, and this should be done only if there indeed is sufficient trust in the bootloader _and_ its source of randomness. [ ardb: update commit log ] Tested-by: Bhupesh Sharma <bhsharma@redhat.com> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191029173755.27149-4-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-31	efi/tpm: Return -EINVAL when determining tpm final events log size fails	Jerry Snitselaar
	Currently nothing checks the return value of efi_tpm_eventlog_init(), but in case that changes in the future make sure an error is returned when it fails to determine the tpm final events log size. Suggested-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Fixes: e658c82be556 ("efi/tpm: Only set 'efi_tpm_final_log_size' after ...") Link: https://lkml.kernel.org/r/20191029173755.27149-3-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-31	efi: Make CONFIG_EFI_RCI2_TABLE selectable on x86 only	Narendra K
	For the EFI_RCI2_TABLE Kconfig option, 'make oldconfig' asks the user for input on platforms where the option may not be applicable. This patch modifies the Kconfig option to ask the user for input only when CONFIG_X86 or CONFIG_COMPILE_TEST is set to y. Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Narendra K <Narendra.K@dell.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191029173755.27149-2-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-31	mt7601u: use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs fops	zhong jiang
	It is more clear to use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs file operation rather than DEFINE_SIMPLE_ATTRIBUTE. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Acked-by: Jakub Kicinski <kubakici@wp.pl> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-10-31	rtlwifi: rtl8821ae: Drop condition with no effect	Saurav Girepunje
	As the "else if" and "else" branch body are identical the condition has no effect. So drop the "else if" condition. Signed-off-by: Saurav Girepunje <saurav.girepunje@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-10-31	b43: dma: Fix use true/false for bool type variable	Saurav Girepunje
	use true/false for bool type variables assignment. Signed-off-by: Saurav Girepunje <saurav.girepunje@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-10-31	b43: main: Fix use true/false for bool type	Saurav Girepunje
	use true/false on bool type variable assignment. Signed-off-by: Saurav Girepunje <saurav.girepunje@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-10-31	rtlwifi: rtl8192c: Drop condition with no effect	Saurav Girepunje
	As the "else if" and "else" branch body are identical the condition has no effect. So drop the "else if" condition. Signed-off-by: Saurav Girepunje <saurav.girepunje@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-10-31	rtw88: remove redundant null pointer check on arrays	Colin Ian King
	The checks to see if swing_table->n or swing_table->p are null are redundant since n and p are arrays and can never be null if swing_table is non-null. I believe these are redundant checks and can be safely removed, especially the checks implies that these are not arrays which can lead to confusion. Addresses-Coverity: ("Array compared against 0") Fixes: c97ee3e0bea2 ("rtw88: add power tracking support") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-10-31	rtw88: avoid FW info flood	Yan-Hsuan Chuang
	The FW info was printed everytime driver is powered on, such as leaving IDLE state. It will flood the kernel log. Move the FW info printing to callback when FW is loaded, so that will only be printed once when device is probed. Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-10-31	rtw88: fix potential read outside array boundary	Tzu-En Huang
	The level of cckpd is from 0 to 4, and it is the index of array pd_lvl[] and cs_lvl[]. However, the length of both arrays are 4, which is smaller than the possible maximum input index. Enumerate cck level to make sure the max level will not be wrong if new level is added in future. Fixes: 479c4ee931a6 ("rtw88: add dynamic cck pd mechanism") Signed-off-by: Tzu-En Huang <tehuang@realtek.com> Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-10-31	rtw88: rearrange if..else statements for rx rate indexes	Yan-Hsuan Chuang
	Driver just memset() rx_status to 0 before assigning rate indexes. And driver could never hit the 'else' because the driver checks if 'pkt_stat->rate >= DESC_RATEMCS0', so the 'else' statement can be removed. Also rearrange the if..else statements because DESC_RATEMCS0 is actually larger than DESC_RATE1M ~ DESC_RATE54M, move the check of 'pkt_stat->rate >= DESC_RATEMCS0' to the last to keep an increasing order. Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-10-31	rtw88: use rtw_phy_pg_cfg_pair struct, not arrays	Yan-Hsuan Chuang
	Use proper struct for BB PG tables. TODO: we need to find a way to store the tables that have condition values. Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-10-31	rtw88: 8822b: add RFE type 3 support	Yan-Hsuan Chuang
	Some of the modules use RFE type 3, add corresponding tables for them. Tested-by: G.schlmm <g.schlmm@googlemail.com> Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-10-31	rtw88: fix sparse warnings for power tracking	Yan-Hsuan Chuang
	sparse warnings: drivers/net/wireless/realtek/rtw88/rtw8822b.c:1440:6: sparse: sparse: symbol 'rtw8822b_pwr_track' was not declared. Should it be static? drivers/net/wireless/realtek/rtw88/rtw8822c.c:1008:6: sparse: sparse: symbol 'rtw8822c_pwrtrack_init' was not declared. Should it be static? Fixes: c97ee3e0bea2 ("rtw88: add power tracking support") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-10-31	rtw88: fix sparse warnings for DPK	Yan-Hsuan Chuang
	sparse warnings: drivers/net/wireless/realtek/rtw88/rtw8822c.c:2871:6: sparse: sparse: symbol 'rtw8822c_dpk_cal_coef1' was not declared. Should it be static? drivers/net/wireless/realtek/rtw88/rtw8822c.c:3112:6: sparse: sparse: symbol 'rtw8822c_dpk_track' was not declared. Should it be static? Fixes: 5227c2ee453d ("rtw88: 8822c: add SW DPK support") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-10-31	Merge tag 'dmaengine-fix-5.4-rc6' of ↵	Linus Torvalds
	git://git.infradead.org/users/vkoul/slave-dma Pull dmaengine fixes from Vinod Koul: "A few fixes to the dmaengine drivers: - fix in sprd driver for link list and potential memory leak - tegra transfer failure fix - imx size check fix for script_number - xilinx fix for 64bit AXIDMA and control reg update - qcom bam dma resource leak fix - cppi slave transfer fix when idle" * tag 'dmaengine-fix-5.4-rc6' of git://git.infradead.org/users/vkoul/slave-dma: dmaengine: cppi41: Fix cppi41_dma_prep_slave_sg() when idle dmaengine: qcom: bam_dma: Fix resource leak dmaengine: sprd: Fix the possible memory leak issue dmaengine: xilinx_dma: Fix control reg update in vdma_channel_set_config dmaengine: xilinx_dma: Fix 64-bit simple AXIDMA transfer dmaengine: imx-sdma: fix size check for sdma script_number dmaengine: tegra210-adma: fix transfer failure dmaengine: sprd: Fix the link-list pointer register configuration issue
2019-10-30	Merge branch 'hv_netvsc-fix-error-handling-in-netvsc_attach-set_features'	David S. Miller
	Haiyang Zhang says: ==================== hv_netvsc: fix error handling in netvsc_attach/set_features The error handling code path in these functions are not correct. This patch set fixes them. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30	hv_netvsc: Fix error handling in netvsc_attach()	Haiyang Zhang
	If rndis_filter_open() fails, we need to remove the rndis device created in earlier steps, before returning an error code. Otherwise, the retry of netvsc_attach() from its callers will fail and hang. Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30	hv_netvsc: Fix error handling in netvsc_set_features()	Haiyang Zhang
	When an error is returned by rndis_filter_set_offload_params(), we should still assign the unaffected features to ndev->features. Otherwise, these features will be missing. Fixes: d6792a5a0747 ("hv_netvsc: Add handler for LRO setting change") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30	cxgb4: fix panic when attaching to ULD fail	Vishal Kulkarni
	Release resources when attaching to ULD fail. Otherwise, data mismatch is seen between LLD and ULD later on, which lead to kernel panic when accessing resources that should not even exist in the first place. Fixes: 94cdb8bb993a ("cxgb4: Add support for dynamic allocation of resources for ULD") Signed-off-by: Shahjada Abul Husain <shahjada@chelsio.com> Signed-off-by: Vishal Kulkarni <vishal@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30	Merge branch 'Control-action-percpu-counters-allocation-by-netlink-flag'	David S. Miller
	Vlad Buslov says: ==================== Control action percpu counters allocation by netlink flag Currently, significant fraction of CPU time during TC filter allocation is spent in percpu allocator. Moreover, percpu allocator is protected with single global mutex which negates any potential to improve its performance by means of recent developments in TC filter update API that removed rtnl lock for some Qdiscs and classifiers. In order to significantly improve filter update rate and reduce memory usage we would like to allow users to skip percpu counters allocation for specific action if they don't expect high traffic rate hitting the action, which is a reasonable expectation for hardware-offloaded setup. In that case any potential gains to software fast-path performance gained by usage of percpu-allocated counters compared to regular integer counters protected by spinlock are not important, but amount of additional CPU and memory consumed by them is significant. In order to allow configuring action counters allocation type at runtime, implement following changes: - Implement helper functions to update the action counters and use them in affected actions instead of updating counters directly. This steps abstracts actions implementation from counter types that are being used for particular action instance at runtime. - Modify the new helpers to use percpu counters if they were allocated during action initialization and use regular counters otherwise. - Extend action UAPI TCA_ACT space with TCA_ACT_FLAGS field. Add TCA_ACT_FLAGS_NO_PERCPU_STATS action flag and update hardware-offloaded actions to not allocate percpu counters when the flag is set. With this changes users that prefer action update slow-path speed over software fast-path speed can dynamically request actions to skip percpu counters allocation without affecting other users. Now, lets look at actual performance gains provided by this change. Simple test is used to measure insertion rate - iproute2 TC is executed in parallel by xargs in batch mode, its total execution time is measured by shell builtin "time" command. The command runs 20 concurrent tc instances, each with its own batch file with 100k rules: $ time ls add* \| xargs -n 1 -P 20 sudo tc -b Two main rule profiles are tested. First is simple L2 flower classifier with single gact drop action. The configuration is chosen as worst case scenario because with single-action rules pressure on percpu allocator is minimized. Example rule: filter add dev ens1f0 protocol ip ingress prio 1 handle 1 flower skip_hw src_mac e4:11:0:0:0:0 dst_mac e4:12:0:0:0:0 action drop Second profile is typical real-world scenario that uses flower classifier with some L2-4 fields and two actions (tunnel_key+mirred). Example rule: filter add dev ens1f0_0 protocol ip ingress prio 1 handle 1 flower skip_hw src_mac e4:11:0:0:0:0 dst_mac e4:12:0:0:0:0 src_ip 192.168.111.1 dst_ip 192.168.111.2 ip_proto udp dst_port 1 src_port 1 action tunnel_key set id 1 src_ip 2.2.2.2 dst_ip 2.2.2.3 dst_port 4789 action mirred egress redirect dev vxlan1 Profile \| percpu \| no_percpu \| X improvement \| (k rules/sec) \| (k rules/sec) \| -------------------+---------------+---------------+--------------- Gact drop \| 203 \| 259 \| 1.28 tunnel_key+mirred \| 92 \| 204 \| 2.22 For simple drop action removing percpu allocation leads to ~25% insertion rate improvement. Perf profiles highlights the bottlenecks. Perf profile of run with percpu allocation (gact drop): + 89.11% 0.48% tc [kernel.vmlinux] [k] entry_SYSCALL_64 + 88.58% 0.04% tc [kernel.vmlinux] [k] do_syscall_64 + 87.50% 0.04% tc libc-2.29.so [.] __libc_sendmsg + 86.96% 0.04% tc [kernel.vmlinux] [k] __sys_sendmsg + 86.85% 0.01% tc [kernel.vmlinux] [k] ___sys_sendmsg + 86.60% 0.05% tc [kernel.vmlinux] [k] sock_sendmsg + 86.55% 0.12% tc [kernel.vmlinux] [k] netlink_sendmsg + 86.04% 0.13% tc [kernel.vmlinux] [k] netlink_unicast + 85.42% 0.03% tc [kernel.vmlinux] [k] netlink_rcv_skb + 84.68% 0.04% tc [kernel.vmlinux] [k] rtnetlink_rcv_msg + 84.56% 0.24% tc [kernel.vmlinux] [k] tc_new_tfilter + 75.73% 0.65% tc [cls_flower] [k] fl_change + 71.30% 0.03% tc [kernel.vmlinux] [k] tcf_exts_validate + 71.27% 0.13% tc [kernel.vmlinux] [k] tcf_action_init + 71.06% 0.01% tc [kernel.vmlinux] [k] tcf_action_init_1 + 70.41% 0.04% tc [act_gact] [k] tcf_gact_init + 53.59% 1.21% tc [kernel.vmlinux] [k] __mutex_lock.isra.0 + 52.34% 0.34% tc [kernel.vmlinux] [k] tcf_idr_create - 51.23% 2.17% tc [kernel.vmlinux] [k] pcpu_alloc - 49.05% pcpu_alloc + 39.35% __mutex_lock.isra.0 4.99% memset_erms + 2.16% pcpu_alloc_area + 2.17% __libc_sendmsg + 45.89% 44.33% tc [kernel.vmlinux] [k] osq_lock + 9.94% 0.04% tc [kernel.vmlinux] [k] tcf_idr_check_alloc + 7.76% 0.00% tc [kernel.vmlinux] [k] tcf_idr_insert + 6.50% 0.03% tc [kernel.vmlinux] [k] tfilter_notify + 6.24% 6.11% tc [kernel.vmlinux] [k] mutex_spin_on_owner + 5.73% 5.32% tc [kernel.vmlinux] [k] memset_erms + 5.31% 0.18% tc [kernel.vmlinux] [k] tcf_fill_node Here bottleneck is clearly in pcpu_alloc() function that takes more than half CPU time, which is mostly wasted busy-waiting for internal percpu allocator global lock. With percpu allocation removed (gact drop): + 87.50% 0.51% tc [kernel.vmlinux] [k] entry_SYSCALL_64 + 86.94% 0.07% tc [kernel.vmlinux] [k] do_syscall_64 + 85.75% 0.04% tc libc-2.29.so [.] __libc_sendmsg + 85.00% 0.07% tc [kernel.vmlinux] [k] __sys_sendmsg + 84.84% 0.07% tc [kernel.vmlinux] [k] ___sys_sendmsg + 84.59% 0.01% tc [kernel.vmlinux] [k] sock_sendmsg + 84.58% 0.14% tc [kernel.vmlinux] [k] netlink_sendmsg + 83.95% 0.12% tc [kernel.vmlinux] [k] netlink_unicast + 83.34% 0.01% tc [kernel.vmlinux] [k] netlink_rcv_skb + 82.39% 0.12% tc [kernel.vmlinux] [k] rtnetlink_rcv_msg + 82.16% 0.25% tc [kernel.vmlinux] [k] tc_new_tfilter + 75.13% 0.84% tc [cls_flower] [k] fl_change + 69.92% 0.05% tc [kernel.vmlinux] [k] tcf_exts_validate + 69.87% 0.11% tc [kernel.vmlinux] [k] tcf_action_init + 69.61% 0.02% tc [kernel.vmlinux] [k] tcf_action_init_1 - 68.80% 0.10% tc [act_gact] [k] tcf_gact_init - 68.70% tcf_gact_init + 36.08% tcf_idr_check_alloc + 31.88% tcf_idr_insert + 63.72% 0.58% tc [kernel.vmlinux] [k] __mutex_lock.isra.0 + 58.80% 56.68% tc [kernel.vmlinux] [k] osq_lock + 36.08% 0.04% tc [kernel.vmlinux] [k] tcf_idr_check_alloc + 31.88% 0.01% tc [kernel.vmlinux] [k] tcf_idr_insert The gact actions (like all other actions types) are inserted in single idr instance protected by global (per namespace) lock that becomes new bottleneck with such simple rule profile and prevents achieving 2x+ performance increase that can be expected by looking at profiling data for insertion action with percpu counter. Perf profile of run with percpu allocation (tunnel_key+mirred): + 91.95% 0.21% tc [kernel.vmlinux] [k] entry_SYSCALL_64 + 91.74% 0.06% tc [kernel.vmlinux] [k] do_syscall_64 + 90.74% 0.01% tc libc-2.29.so [.] __libc_sendmsg + 90.52% 0.01% tc [kernel.vmlinux] [k] __sys_sendmsg + 90.50% 0.04% tc [kernel.vmlinux] [k] ___sys_sendmsg + 90.41% 0.02% tc [kernel.vmlinux] [k] sock_sendmsg + 90.38% 0.04% tc [kernel.vmlinux] [k] netlink_sendmsg + 90.10% 0.06% tc [kernel.vmlinux] [k] netlink_unicast + 89.76% 0.01% tc [kernel.vmlinux] [k] netlink_rcv_skb + 89.28% 0.04% tc [kernel.vmlinux] [k] rtnetlink_rcv_msg + 89.15% 0.03% tc [kernel.vmlinux] [k] tc_new_tfilter + 83.41% 0.33% tc [cls_flower] [k] fl_change + 81.17% 0.04% tc [kernel.vmlinux] [k] tcf_exts_validate + 81.13% 0.06% tc [kernel.vmlinux] [k] tcf_action_init + 81.04% 0.04% tc [kernel.vmlinux] [k] tcf_action_init_1 - 73.59% 2.16% tc [kernel.vmlinux] [k] pcpu_alloc - 71.42% pcpu_alloc + 61.41% __mutex_lock.isra.0 5.02% memset_erms + 2.93% pcpu_alloc_area + 2.16% __libc_sendmsg + 63.58% 0.17% tc [kernel.vmlinux] [k] tcf_idr_create + 63.40% 0.60% tc [kernel.vmlinux] [k] __mutex_lock.isra.0 + 57.85% 56.38% tc [kernel.vmlinux] [k] osq_lock + 46.27% 0.13% tc [act_tunnel_key] [k] tunnel_key_init + 34.26% 0.02% tc [act_mirred] [k] tcf_mirred_init + 10.99% 0.00% tc [kernel.vmlinux] [k] dst_cache_init + 5.32% 5.11% tc [kernel.vmlinux] [k] memset_erms With two times more actions pressure on percpu allocator doubles, so now it takes ~74% of CPU execution time. With percpu allocation removed (tunnel_key+mirred): + 86.02% 0.50% tc [kernel.vmlinux] [k] entry_SYSCALL_64 + 85.51% 0.12% tc [kernel.vmlinux] [k] do_syscall_64 + 84.40% 0.03% tc libc-2.29.so [.] __libc_sendmsg + 83.84% 0.03% tc [kernel.vmlinux] [k] __sys_sendmsg + 83.72% 0.01% tc [kernel.vmlinux] [k] ___sys_sendmsg + 83.56% 0.01% tc [kernel.vmlinux] [k] sock_sendmsg + 83.50% 0.08% tc [kernel.vmlinux] [k] netlink_sendmsg + 83.02% 0.17% tc [kernel.vmlinux] [k] netlink_unicast + 82.48% 0.00% tc [kernel.vmlinux] [k] netlink_rcv_skb + 81.89% 0.11% tc [kernel.vmlinux] [k] rtnetlink_rcv_msg + 81.71% 0.25% tc [kernel.vmlinux] [k] tc_new_tfilter + 73.99% 0.63% tc [cls_flower] [k] fl_change + 69.72% 0.00% tc [kernel.vmlinux] [k] tcf_exts_validate + 69.72% 0.09% tc [kernel.vmlinux] [k] tcf_action_init + 69.53% 0.05% tc [kernel.vmlinux] [k] tcf_action_init_1 + 53.08% 0.91% tc [kernel.vmlinux] [k] __mutex_lock.isra.0 + 45.52% 43.99% tc [kernel.vmlinux] [k] osq_lock - 36.02% 0.21% tc [act_tunnel_key] [k] tunnel_key_init - 35.81% tunnel_key_init + 15.95% tcf_idr_check_alloc + 13.91% tcf_idr_insert - 4.70% dst_cache_init + 4.68% pcpu_alloc + 33.22% 0.04% tc [kernel.vmlinux] [k] tcf_idr_check_alloc + 32.34% 0.05% tc [act_mirred] [k] tcf_mirred_init + 28.24% 0.01% tc [kernel.vmlinux] [k] tcf_idr_insert + 7.79% 0.05% tc [kernel.vmlinux] [k] idr_alloc_u32 + 7.67% 7.35% tc [kernel.vmlinux] [k] idr_get_free + 6.46% 6.22% tc [kernel.vmlinux] [k] mutex_spin_on_owner + 5.11% 0.05% tc [kernel.vmlinux] [k] tfilter_notify With percpu allocation removed insertion rate is increased by ~120%. Such rule profile scales much better than simple single action because both types of actions were competing for single lock in percpu allocator, but not for action idr lock, which is per-action. Note that percpu allocator is still used by dst_cache in tunnel_key actions and consumes 4.68% CPU time. Dst_cache seems like good opportunity for further insertion rate optimization but is not addressed by this change. Another improvement provided by this change is significantly reduced memory usage. The test is implemented by sampling "used memory" value from "vmstat -s" command output. Following table includes memory usage measurements for same two configurations that were used for measuring insertion rate: Profile \| Mem per rule \| Mem per rule no_percpu \| Less memory used \| (KB) \| (KB) \| (KB) -------------------+--------------+------------------------+------------------ Gact drop \| 3.91 \| 2.51 \| 1.4 tunnel_key+mirred \| 6.73 \| 3.91 \| 2.8 Results indicate that memory usage of percpu allocator per action is ~1.4 KB. Note that any measurements of percpu allocator memory usage is inherently tied to particular setup since memory usage is linear to number of cores in system. It is to be expected that on current top of the line servers percpu allocator memory usage will be 2-5x more than on 24 CPUs setup that was used for testing. Setup details: 2x Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz, 32GB memory Patches applied on top of net-next branch: commit 2203cbf2c8b58a1e3bef98c47531d431d11639a0 (net-next) Author: Russell King <rmk+kernel@armlinux.org.uk> Date: Tue Oct 15 11:38:39 2019 +0100 net: sfp: move fwnode parsing into sfp-bus layer Changes V1 -> V2: - Include memory measurements. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30	tc-testing: implement tests for new fast_init action flag	Vlad Buslov
	Add basic tests to verify action creation with new fast_init flag for all actions that support the flag. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30	net: sched: update action implementations to support flags	Vlad Buslov
	Extend struct tc_action with new "tcfa_flags" field. Set the field in tcf_idr_create() function and provide new helper tcf_idr_create_from_flags() that derives 'cpustats' boolean from flags value. Update individual hardware-offloaded actions init() to pass their "flags" argument to new helper in order to skip percpu stats allocation when user requested it through flags. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30	net: sched: extend TCA_ACT space with TCA_ACT_FLAGS	Vlad Buslov
	Extend TCA_ACT space with nla_bitfield32 flags. Add TCA_ACT_FLAGS_NO_PERCPU_STATS as the only allowed flag. Parse the flags in tcf_action_init_1() and pass resulting value as additional argument to a_o->init(). Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30	net: sched: modify stats helper functions to support regular stats	Vlad Buslov
	Modify stats update helper functions introduced in previous patches in this series to fallback to regular tc_action->tcfa_{b\|q}stats if cpu stats are not allocated for the action argument. If regular non-percpu allocated counters are in use, then obtain action tcfa_lock while modifying them. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30	net: sched: don't expose action qstats to skb_tc_reinsert()	Vlad Buslov
	Previous commit introduced helper function for updating qstats and refactored set of actions to use the helpers, instead of modifying qstats directly. However, one of the affected action exposes its qstats to skb_tc_reinsert(), which then modifies it. Refactor skb_tc_reinsert() to return integer error code and don't increment overlimit qstats in case of error, and use the returned error code in tcf_mirred_act() to manually increment the overlimit counter with new helper function. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30	net: sched: extract qstats update code into functions	Vlad Buslov
	Extract common code that increments cpu_qstats counters into standalone act API functions. Change hardware offloaded actions that use percpu counter allocation to use the new functions instead of accessing cpu_qstats directly. This commit doesn't change functionality. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30	net: sched: extract bstats update code into function	Vlad Buslov
	Extract common code that increments cpu_bstats counter into standalone act API function. Change hardware offloaded actions that use percpu counter allocation to use the new function instead of incrementing cpu_bstats directly. This commit doesn't change functionality. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30	net: sched: extract common action counters update code into function	Vlad Buslov
	Currently, all implementations of tc_action_ops->stats_update() callback have almost exactly the same implementation of counters update code (besides gact which also updates drop counter). In order to simplify support for using both percpu-allocated and regular action counters depending on run-time flag in following patches, extract action counters update code into standalone function in act API. This commit doesn't change functionality. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-31	bpf: Fix bpf jit kallsym access	Alexei Starovoitov
	Jiri reported crash when JIT is on, but net.core.bpf_jit_kallsyms is off. bpf_prog_kallsyms_find() was skipping addr->bpf_prog resolution logic in oops and stack traces. That's incorrect. It should only skip addr->name resolution for 'cat /proc/kallsyms'. That's what bpf_jit_kallsyms and bpf_jit_harden protect. Fixes: 3dec541b2e63 ("bpf: Add support for BTF pointers to x86 JIT") Reported-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191030233019.1187404-1-ast@kernel.org
2019-10-30	net: qrtr: Simplify 'qrtr_tun_release()'	Christophe JAILLET
	Use 'skb_queue_purge()' instead of re-implementing it. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30	Merge branch '1GbE' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 1GbE Intel Wired LAN Driver Updates 2019-10-29 This series contains updates to e1000e, igb, ixgbe and i40e drivers. Sasha adds support for Intel client platforms Comet Lake and Tiger Lake to the e1000e driver. Also adds a fix for a compiler warning that was recently introduced, when CONFIG_PM_SLEEP is not defined, so wrap the code that requires this kernel configuration to be defined. Alex fixes a potential race condition between network configuration and power management for e1000e, which is similar to a past issue in the igb driver. Also provided a bit of code cleanup since the driver no longer checks for __E1000_DOWN. Josh Hunt adds UDP segmentation offload support for igb, ixgbe and i40e. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30	wimax: use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs fops	zhong jiang
	It is more clear to use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs file operation rather than DEFINE_SIMPLE_ATTRIBUTE. It is detected with the help of coccinelle. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30	net: dsa: add ethtool pause configuration support	Heiner Kallweit
	This patch adds glue logic to make pause settings per port configurable vie ethtool. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30	vxlan: drop "vxlan" parameter in vxlan_fdb_alloc()	Guillaume Nault
	This parameter has never been used. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30	net: phy: marvell: add downshift support for 88E1145	Heiner Kallweit
	Add downshift support for 88E1145, it uses the same downshift configuration registers as 88E1111. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>