linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2019-11-20	net: sfp: add support for module quirks	Russell King
	Add support for applying module quirks to the list of supported ethtool link modes. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20	tcp: warn if offset reach the maxlen limit when using snprintf	Hangbin Liu
	snprintf returns the number of chars that would be written, not number of chars that were actually written. As such, 'offs' may get larger than 'tbl.maxlen', causing the 'tbl.maxlen - offs' being < 0, and since the parameter is size_t, it would overflow. Since using scnprintf may hide the limit error, while the buffer is still enough now, let's just add a WARN_ON_ONCE in case it reach the limit in future. v2: Use WARN_ON_ONCE as Jiri and Eric suggested. Suggested-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20	ip_gre: Make none-tun-dst gre tunnel store tunnel info as metadat_dst in recv	wenxu
	Currently collect_md gre tunnel will store the tunnel info(metadata_dst) to skb_dst. And now the non-tun-dst gre tunnel already can add tunnel header through lwtunnel. When received a arp_request on the non-tun-dst gre tunnel. The packet of arp response will send through the non-tun-dst tunnel without tunnel info which will lead the arp response packet to be dropped. If the non-tun-dst gre tunnel also store the tunnel info as metadata_dst, The arp response packet will set the releted tunnel info in the iptunnel_metadata_reply. The following is the test script: ip netns add cl ip l add dev vethc type veth peer name eth0 netns cl ifconfig vethc 172.168.0.7/24 up ip l add dev tun1000 type gretap key 1000 ip link add user1000 type vrf table 1 ip l set user1000 up ip l set dev tun1000 master user1000 ifconfig tun1000 10.0.1.1/24 up ip netns exec cl ifconfig eth0 172.168.0.17/24 up ip netns exec cl ip l add dev tun type gretap local 172.168.0.17 remote 172.168.0.7 key 1000 ip netns exec cl ifconfig tun 10.0.1.7/24 up ip r r 10.0.1.7 encap ip id 1000 dst 172.168.0.17 key dev tun1000 table 1 With this patch ip netns exec cl ping 10.0.1.1 can success Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20	net-sysfs: fix netdev_queue_add_kobject() breakage	Eric Dumazet
	kobject_put() should only be called in error path. Fixes: b8eb718348b8 ("net-sysfs: Fix reference count leak in rx\|netdev_queue_add_kobject") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jouni Hogander <jouni.hogander@unikie.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21	KVM: PPC: Book3S HV: XIVE: Fix potential page leak on error path	Greg Kurz
	We need to check the host page size is big enough to accomodate the EQ. Let's do this before taking a reference on the EQ page to avoid a potential leak if the check fails. Cc: stable@vger.kernel.org # v5.2 Fixes: 13ce3297c576 ("KVM: PPC: Book3S HV: XIVE: Add controls for the EQ configuration") Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-11-21	KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one	Greg Kurz
	The EQ page is allocated by the guest and then passed to the hypervisor with the H_INT_SET_QUEUE_CONFIG hcall. A reference is taken on the page before handing it over to the HW. This reference is dropped either when the guest issues the H_INT_RESET hcall or when the KVM device is released. But, the guest can legitimately call H_INT_SET_QUEUE_CONFIG several times, either to reset the EQ (vCPU hot unplug) or to set a new EQ (guest reboot). In both cases the existing EQ page reference is leaked because we simply overwrite it in the XIVE queue structure without calling put_page(). This is especially visible when the guest memory is backed with huge pages: start a VM up to the guest userspace, either reboot it or unplug a vCPU, quit QEMU. The leak is observed by comparing the value of HugePages_Free in /proc/meminfo before and after the VM is run. Ideally we'd want the XIVE code to handle the EQ page de-allocation at the platform level. This isn't the case right now because the various XIVE drivers have different allocation needs. It could maybe worth introducing hooks for this purpose instead of exposing XIVE internals to the drivers, but this is certainly a huge work to be done later. In the meantime, for easier backport, fix both vCPU unplug and guest reboot leaks by introducing a wrapper around xive_native_configure_queue() that does the necessary cleanup. Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Cc: stable@vger.kernel.org # v5.2 Fixes: 13ce3297c576 ("KVM: PPC: Book3S HV: XIVE: Add controls for the EQ configuration") Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org> Tested-by: Lijun Pan <ljp@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-11-21	Merge tag 'drm-fixes-5.4-2019-11-20' of ↵	Dave Airlie
	git://people.freedesktop.org/~agd5f/linux into drm-fixes drm-fixes-5.4-2019-11-20: amdgpu: - Remove experimental flag for navi14 - Fix confusing power message failures on older VI parts - Hang fix for gfxoff when using the read register interface - Two stability regression fixes for Raven Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191120235130.23755-1-alexander.deucher@amd.com
2019-11-20	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next	David S. Miller
	Daniel Borkmann says: ==================== pull-request: bpf-next 2019-11-20 The following pull-request contains BPF updates for your net-next tree. We've added 81 non-merge commits during the last 17 day(s) which contain a total of 120 files changed, 4958 insertions(+), 1081 deletions(-). There are 3 trivial conflicts, resolve it by always taking the chunk from 196e8ca74886c433: <<<<<<< HEAD ======= void bpf_map_area_mmapable_alloc(u64 size, int numa_node); >>>>>>> 196e8ca74886c433dcfc64a809707074b936aaf5 <<<<<<< HEAD void bpf_map_area_alloc(u64 size, int numa_node) ======= static void __bpf_map_area_alloc(u64 size, int numa_node, bool mmapable) >>>>>>> 196e8ca74886c433dcfc64a809707074b936aaf5 <<<<<<< HEAD if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) { ======= / kmalloc()'ed memory can't be mmap()'ed */ if (!mmapable && size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) { >>>>>>> 196e8ca74886c433dcfc64a809707074b936aaf5 The main changes are: 1) Addition of BPF trampoline which works as a bridge between kernel functions, BPF programs and other BPF programs along with two new use cases: i) fentry/fexit BPF programs for tracing with practically zero overhead to call into BPF (as opposed to k[ret]probes) and ii) attachment of the former to networking related programs to see input/output of networking programs (covering xdpdump use case), from Alexei Starovoitov. 2) BPF array map mmap support and use in libbpf for global data maps; also a big batch of libbpf improvements, among others, support for reading bitfields in a relocatable manner (via libbpf's CO-RE helper API), from Andrii Nakryiko. 3) Extend s390x JIT with usage of relative long jumps and loads in order to lift the current 64/512k size limits on JITed BPF programs there, from Ilya Leoshkevich. 4) Add BPF audit support and emit messages upon successful prog load and unload in order to have a timeline of events, from Daniel Borkmann and Jiri Olsa. 5) Extension to libbpf and xdpsock sample programs to demo the shared umem mode (XDP_SHARED_UMEM) as well as RX-only and TX-only sockets, from Magnus Karlsson. 6) Several follow-up bug fixes for libbpf's auto-pinning code and a new API call named bpf_get_link_xdp_info() for retrieving the full set of prog IDs attached to XDP, from Toke Høiland-Jørgensen. 7) Add BTF support for array of int, array of struct and multidimensional arrays and enable it for skb->cb[] access in kfree_skb test, from Martin KaFai Lau. 8) Fix AF_XDP by using the correct number of channels from ethtool, from Luigi Rizzo. 9) Two fixes for BPF selftest to get rid of a hang in test_tc_tunnel and to avoid xdping to be run as standalone, from Jiri Benc. 10) Various BPF selftest fixes when run with latest LLVM trunk, from Yonghong Song. 11) Fix a memory leak in BPF fentry test run data, from Colin Ian King. 12) Various smaller misc cleanups and improvements mostly all over BPF selftests and samples, from Daniel T. Lee, Andre Guedes, Anders Roxell, Mao Wenan, Yue Haibing. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20	Revert "drm/amd/display: enable S/G for RAVEN chip"	Alex Deucher
	This reverts commit 1c4259159132ae4ceaf7c6db37a6cf76417f73d9. S/G display is not stable with the IOMMU enabled on some platforms. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=205523 Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2019-11-20	drm/amdgpu: disable gfxoff on original raven	Alex Deucher
	There are still combinations of sbios and firmware that are not stable. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=204689 Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2019-11-20	drm/amdgpu: disable gfxoff when using register read interface	Alex Deucher
	When gfxoff is enabled, accessing gfx registers via MMIO can lead to a hang. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=205497 Acked-by: Xiaojie Yuan <xiaojie.yuan@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2019-11-20	drm/amd/powerplay: correct fine grained dpm force level setting	Evan Quan
	For fine grained dpm, there is only two levels supported. However to reflect correctly the current clock frequency, there is an intermediate level faked. Thus on forcing level setting, we need to treat level 2 correctly as level 1. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Kevin Wang <kevin1.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-20	drm/amd/powerplay: issue no PPSMC_MSG_GetCurrPkgPwr on unsupported ASICs	Evan Quan
	Otherwise, the error message prompted will confuse user. Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2019-11-20	drm/amdgpu: remove experimental flag for Navi14	Alex Deucher
	5.4 and newer works fine with navi14. Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-20	block,bfq: Skip tracing hooks if possible	Dmitry Monakhov
	In most cases blk_tracing is not active, but bfq_log_bfqq macro generate pid_str unconditionally, which result in significant overhead. ## Test modprobe null_blk echo bfq > /sys/block/nullb0/queue/scheduler fio --name=t --ioengine=libaio --direct=1 --filename=/dev/nullb0 \ --runtime=30 --time_based=1 --rw=write --iodepth=128 --bs=4k # Results \| \| baseline \| w/ patch \| gain \| \| iops \| 113.19K \| 126.42K \| +11% \| Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Dmitry Monakhov <dmonakhov@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-20	x86/ioperm: Fix use of deprecated config option	Alexander Duyck
	The commit 111e7b15cf10 ("x86/ioperm: Extend IOPL config to control ioperm() as well") replaced X86_IOPL_EMULATION with X86_IOPL_IOPERM. However it appears that there was at least one spot missed as tss_update_io_bitmap() still had a reference to it contained in the code. The result of this is that it exposed a NULL pointer dereference as seen below with a linux-next next-20191120 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 5 PID: 1542 Comm: ovs-vswitchd Tainted: G W 5.4.0-rc8-next-20191120 #125 RIP: 0010:tss_update_io_bitmap+0x4e/0x180 Code: 10 31 c0 65 48 03 1d 69 54 5d 6d 65 48 8b 04 25 40 8c 01 00 48 8b 10 \ f7 c2 00 00 40 00 0f 84 8c 00 00 00 4c 8b a0 c0 22 00 00 <49> 8b 04 \ 24 48 39 43 68 74 2e 8b 53 70 41 39 54 24 0c 48 8d 7b 78 RSP: 0018:ffffb8888a0ebf08 EFLAGS: 00010006 RAX: ffff8a429811a680 RBX: ffff8a4c3f946000 RCX: 0000000000000011 RDX: 0000000000400080 RSI: 0000000000400080 RDI: 0000000000000000 RBP: ffffb8888a0ebf30 R08: 00007ffffb5d7ce0 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f68a9635c40(0000) GS:ffff8a4c3f940000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000103572a001 CR4: 00000000001606e0 Call Trace: ? syscall_slow_exit_work+0x39/0xdb do_syscall_64+0x1a5/0x200 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f68a7aff797 Fixes: 111e7b15cf10 ("x86/ioperm: Extend IOPL config to control ioperm() as well") Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191120222426.3060.18462.stgit@localhost.localdomain
2019-11-20	Revert "dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues"	Mike Snitzer
	This reverts commit a1b89132dc4f61071bdeaab92ea958e0953380a1. Revert required hand-patching due to subsequent changes that were applied since commit a1b89132dc4f61071bdeaab92ea958e0953380a1. Requires: ed0302e83098d ("dm crypt: make workqueue names device-specific") Cc: stable@vger.kernel.org Bug: https://bugzilla.kernel.org/show_bug.cgi?id=199857 Reported-by: Vito Caputo <vcaputo@pengaru.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-11-20	bpf: Switch bpf_map_{area_alloc,area_mmapable_alloc}() to u64 size	Daniel Borkmann
	Given we recently extended the original bpf_map_area_alloc() helper in commit fc9702273e2e ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY"), we need to apply the same logic as in ff1c08e1f74b ("bpf: Change size to u64 for bpf_map_{area_alloc, charge_init}()"). To avoid conflicts, extend it for bpf-next. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-11-20	bpf: Emit audit messages upon successful prog load and unload	Daniel Borkmann
	Allow for audit messages to be emitted upon BPF program load and unload for having a timeline of events. The load itself is in syscall context, so additional info about the process initiating the BPF prog creation can be logged and later directly correlated to the unload event. The only info really needed from BPF side is the globally unique prog ID where then audit user space tooling can query / dump all info needed about the specific BPF program right upon load event and enrich the record, thus these changes needed here can be kept small and non-intrusive to the core. Raw example output: # auditctl -D # auditctl -a always,exit -F arch=x86_64 -S bpf # ausearch --start recent -m 1334 [...] ---- time->Wed Nov 20 12:45:51 2019 type=PROCTITLE msg=audit(1574271951.590:8974): proctitle="./test_verifier" type=SYSCALL msg=audit(1574271951.590:8974): arch=c000003e syscall=321 success=yes exit=14 a0=5 a1=7ffe2d923e80 a2=78 a3=0 items=0 ppid=742 pid=949 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=2 comm="test_verifier" exe="/root/bpf-next/tools/testing/selftests/bpf/test_verifier" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null) type=UNKNOWN[1334] msg=audit(1574271951.590:8974): auid=0 uid=0 gid=0 ses=2 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=949 comm="test_verifier" exe="/root/bpf-next/tools/testing/selftests/bpf/test_verifier" prog-id=3260 event=LOAD ---- time->Wed Nov 20 12:45:51 2019 type=UNKNOWN[1334] msg=audit(1574271951.590:8975): prog-id=3260 event=UNLOAD ---- [...] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191120213816.8186-1-jolsa@kernel.org
2019-11-20	Merge tag 'mlx5-fixes-2019-11-20' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2019-11-20 This series introduces some fixes to mlx5 driver. Please pull and let me know if there is any problem. For -stable v4.9: ('net/mlx5e: Fix set vf link state error flow') For -stable v4.14 ('net/mlxfw: Verify FSM error code translation doesn't exceed array size') For -stable v4.19 ('net/mlx5: Fix auto group size calculation') For -stable v5.3 ('net/mlx5e: Fix error flow cleanup in mlx5e_tc_tun_create_header_ipv4/6') ('net/mlx5e: Do not use non-EXT link modes in EXT mode') ('net/mlx5: Update the list of the PCI supported devices') ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20	Merge branch 'r8169-smaller-improvements-to-firmware-handling'	David S. Miller
	Heiner Kallweit says: ==================== r8169: smaller improvements to firmware handling This series includes few smaller improvements to firmware handling. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20	r8169: add check for PHY_MDIO_CHG to rtl_nic_fw_data_ok	Heiner Kallweit
	Only values 0 and 1 are currently defined as parameters for PHY_MDIO_CHG. Instead of silently ignoring unknown values and misinterpreting the firmware code let's explicitly check. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20	r8169: use macro FIELD_SIZEOF in definition of FW_OPCODE_SIZE	Heiner Kallweit
	Using macro FIELD_SIZEOF makes this define easier understandable. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20	r8169: change mdelay to msleep in rtl_fw_write_firmware	Heiner Kallweit
	We're not in atomic context here, therefore switch to msleep. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20	r8152: Re-order napi_disable in rtl8152_close	Prashant Malani
	Both rtl_work_func_t() and rtl8152_close() call napi_disable(). Since the two calls aren't protected by a lock, if the close function starts executing before the work function, we can get into a situation where the napi_disable() function is called twice in succession (first by rtl8152_close(), then by set_carrier()). In such a situation, the second call would loop indefinitely, since rtl8152_close() doesn't call napi_enable() to clear the NAPI_STATE_SCHED bit. The rtl8152_close() function in turn issues a cancel_delayed_work_sync(), and so it would wait indefinitely for the rtl_work_func_t() to complete. Since rtl8152_close() is called by a process holding rtnl_lock() which is requested by other processes, this eventually leads to a system deadlock and crash. Re-order the napi_disable() call to occur after the work function disabling and urb cancellation calls are issued. Change-Id: I6ef0b703fc214998a037a68f722f784e1d07815e Reported-by: http://crbug.com/1017928 Signed-off-by: Prashant Malani <pmalani@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20	Merge branch 'qca_spi-fixes'	David S. Miller
	Stefan Wahren says: ==================== net: qca_spi: Fix receive and reset issues This small patch series fixes two major issues in the SPI driver for the QCA700x. It has been tested on a Charge Control C 300 (NXP i.MX6ULL + 2x QCA7000). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20	net: qca_spi: Move reset_count to struct qcaspi	Stefan Wahren
	The reset counter is specific for every QCA700x chip. So move this into the private driver struct. Otherwise we get unpredictable reset behavior in setups with multiple QCA700x chips. Fixes: 291ab06ecf67 (net: qualcomm: new Ethernet over SPI driver for QCA7000) Signed-off-by: Stefan Wahren <stefan.wahren@in-tech.com> Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20	net: qca_spi: fix receive buffer size check	Michael Heimpold
	When receiving many or larger packets, e.g. when doing a file download, it was observed that the read buffer size register reports up to 4 bytes more than the current define allows in the check. If this is the case, then no data transfer is initiated to receive the packets (and thus to empty the buffer) which results in a stall of the interface. These 4 bytes are a hardware generated frame length which is prepended to the actual frame, thus we have to respect it during our check. Fixes: 026b907d58c4 ("net: qca_spi: Add available buffer space verification") Signed-off-by: Michael Heimpold <michael.heimpold@in-tech.com> Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20	net: ipconfig: Wait for deferred device probes	Thomas Bogendoerfer
	If network device drives are using deferred probing, it was possible that waiting for devices to show up in ipconfig was already over, when the device eventually showed up. By calling wait_for_device_probe() we now make sure deferred probing is done before checking for available devices. Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20	vsock/vmci: make vmci_vsock_cb_host_called static	Mao Wenan
	When using make C=2 drivers/misc/vmw_vmci/vmci_driver.o to compile, below warning can be seen: drivers/misc/vmw_vmci/vmci_driver.c:33:6: warning: symbol 'vmci_vsock_cb_host_called' was not declared. Should it be static? This patch make symbol vmci_vsock_cb_host_called static. Fixes: b1bba80a4376 ("vsock/vmci: register vmci_transport only when VMCI guest/host are active") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Mao Wenan <maowenan@huawei.com> Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20	Merge branch 'ibmvnic-regression'	David S. Miller
	Juliet Kim says: ==================== Support both XIVE and XICS modes in ibmvnic This series aims to support both XICS and XIVE with avoiding a regression in behavior when a system runs in XICS mode. Patch 1 reverts commit 11d49ce9f7946dfed4dcf5dbde865c78058b50ab (“net/ibmvnic: Fix EOI when running in XIVE mode.”) Patch 2 Ignore H_FUNCTION return from H_EOI to tolerate XIVE mode ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20	net/ibmvnic: Ignore H_FUNCTION return from H_EOI to tolerate XIVE mode	Juliet Kim
	Reversion of commit 11d49ce9f7946dfed4dcf5dbde865c78058b50ab (“net/ibmvnic: Fix EOI when running in XIVE mode.”) leaves us calling H_EOI even in XIVE mode. That will fail with H_FUNCTION because H_EOI is not supported in that mode. That failure is harmless. Ignore it so we can use common code for both XICS and XIVE. Signed-off-by: Juliet Kim <julietk@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20	Revert "net/ibmvnic: Fix EOI when running in XIVE mode"	Juliet Kim
	This reverts commit 11d49ce9f7946dfed4dcf5dbde865c78058b50ab (“net/ibmvnic: Fix EOI when running in XIVE mode.”) since that has the unintended effect of changing the interrupt priority and emits warning when running in legacy XICS mode. Signed-off-by: Juliet Kim <julietk@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20	Merge branch 'page_pool-DMA-sync'	David S. Miller
	Lorenzo Bianconi says: ==================== add DMA-sync-for-device capability to page_pool API Introduce the possibility to sync DMA memory for device in the page_pool API. This feature allows to sync proper DMA size and not always full buffer (dma_sync_single_for_device can be very costly). Please note DMA-sync-for-CPU is still device driver responsibility. Relying on page_pool DMA sync mvneta driver improves XDP_DROP pps of about 170Kpps: - XDP_DROP DMA sync managed by mvneta driver: ~420Kpps - XDP_DROP DMA sync managed by page_pool API: ~585Kpps Do not change naming convention for the moment since the changes will hit other drivers as well. I will address it in another series. Changes since v4: - do not allow the driver to set max_len to 0 - convert PP_FLAG_DMA_MAP/PP_FLAG_DMA_SYNC_DEV to BIT() macro Changes since v3: - move dma_sync_for_device before putting the page in ptr_ring in __page_pool_recycle_into_ring since ptr_ring can be consumed concurrently. Simplify the code moving dma_sync_for_device before running __page_pool_recycle_direct/__page_pool_recycle_into_ring Changes since v2: - rely on PP_FLAG_DMA_SYNC_DEV flag instead of dma_sync Changes since v1: - rename sync in dma_sync - set dma_sync_size to 0xFFFFFFFF in page_pool_recycle_direct and page_pool_put_page routines - Improve documentation ==================== Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20	net: mvneta: get rid of huge dma sync in mvneta_rx_refill	Lorenzo Bianconi
	Get rid of costly dma_sync_single_for_device in mvneta_rx_refill since now the driver can let page_pool API to manage needed DMA sync with a proper size. - XDP_DROP DMA sync managed by mvneta driver: ~420Kpps - XDP_DROP DMA sync managed by page_pool API: ~585Kpps Tested-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20	net: page_pool: add the possibility to sync DMA memory for device	Lorenzo Bianconi
	Introduce the following parameters in order to add the possibility to sync DMA memory for device before putting allocated pages in the page_pool caches: - PP_FLAG_DMA_SYNC_DEV: if set in page_pool_params flags, all pages that the driver gets from page_pool will be DMA-synced-for-device according to the length provided by the device driver. Please note DMA-sync-for-CPU is still device driver responsibility - offset: DMA address offset where the DMA engine starts copying rx data - max_len: maximum DMA memory size page_pool is allowed to flush. This is currently used in __page_pool_alloc_pages_slow routine when pages are allocated from page allocator These parameters are supposed to be set by device drivers. This optimization reduces the length of the DMA-sync-for-device. The optimization is valid because pages are initially DMA-synced-for-device as defined via max_len. At RX time, the driver will perform a DMA-sync-for-CPU on the memory for the packet length. What is important is the memory occupied by packet payload, because this is the area CPU is allowed to read and modify. As we don't track cache-lines written into by the CPU, simply use the packet payload length as dma_sync_size at page_pool recycle time. This also take into account any tail-extend. Tested-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20	net: mvneta: rely on page_pool_recycle_direct in mvneta_run_xdp	Lorenzo Bianconi
	Rely on page_pool_recycle_direct and not on xdp_return_buff in mvneta_run_xdp. This is a preliminary patch to limit the dma sync len to the one strictly necessary Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20	net/mlxfw: Verify FSM error code translation doesn't exceed array size	Eran Ben Elisha
	Array mlxfw_fsm_state_err_str contains value to string translation, when values are provided by mlxfw_dev. If value is larger than MLXFW_FSM_STATE_ERR_MAX, return "unknown error" as expected instead of reading an address than exceed array size. Fixes: 410ed13cae39 ("Add the mlxfw module for Mellanox firmware flash process") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20	net/mlx5: Update the list of the PCI supported devices	Shani Shapp
	Add the upcoming ConnectX-6 LX device ID. Fixes: 85327a9c4150 ("net/mlx5: Update the list of the PCI supported devices") Signed-off-by: Shani Shapp <shanish@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20	net/mlx5: Fix auto group size calculation	Maor Gottlieb
	Once all the large flow groups (defined by the user when the flow table is created - max_num_groups) were created, then all the following new flow groups will have only one flow table entry, even though the flow table has place to larger groups. Fix the condition to prefer large flow group. Fixes: f0d22d187473 ("net/mlx5_core: Introduce flow steering autogrouped flow table") Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20	net/mlx5e: Add missing capability bit check for IP-in-IP	Marina Varshaver
	Device that doesn't support IP-in-IP offloads has to filter csum and gso offload support, otherwise kernel will conclude that device is capable of offloading csum and gso for IP-in-IP tunnels and that might result in IP-in-IP tunnel not functioning. Fixes: 25948b87dda2 ("net/mlx5e: Support TSO and TX checksum offloads for IP-in-IP") Signed-off-by: Marina Varshaver <marinav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20	net/mlx5e: Do not use non-EXT link modes in EXT mode	Eran Ben Elisha
	On some old Firmwares, connector type value was not supported, and value read from FW was 0. For those, driver used link mode in order to set connector type in link_ksetting. After FW exposed the connector type, driver translated the value to ethtool definitions. However, as 0 is a valid value, before returning PORT_OTHER, driver run the check of link mode in order to maintain backward compatibility. Cited patch added support to EXT mode. With both features (connector type and EXT link modes) ,if connector_type read from FW is 0 and EXT mode is set, driver mistakenly compare EXT link modes to non-EXT link mode. Fixed that by skipping this comparison if we are in EXT mode, as connector type value is valid in this scenario. Fixes: 6a897372417e ("net/mlx5: ethtool, Add ethtool support for 50Gbps per lane link modes") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20	net/mlx5e: Fix set vf link state error flow	Roi Dayan
	Before this commit the ndo always returned success. Fix that. Fixes: 1ab2068a4c66 ("net/mlx5: Implement vports admin state backup/restore") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20	net/mlx5: DR, Limit STE hash table enlarge based on bytemask	Alex Vesker
	When an ste hash table has too many collision we enlarge it to a bigger hash table (rehash). Rehashing collision improvement depends on the bytemask value. The more 1 bits we have in bytemask means better spreading in the table. Without this fix tables can grow in size without providing any improvement which can lead to memory depletion and failures. This patch will limit table rehash to reduce memory and improve the performance. Fixes: 41d07074154c ("net/mlx5: DR, Expose steering rule functionality") Signed-off-by: Alex Vesker <valex@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20	net/mlx5: DR, Skip rehash for tables with byte mask zero	Alex Vesker
	The byte mask fields affect on the hash index distribution, when the byte mask is zero, the hash calculation will always be equal to the same index. To avoid unneeded rehash of hash tables mark the table to skip rehash. This is needed by the next patch which will limit table rehash to reduce memory consumption. Fixes: 41d07074154c ("net/mlx5: DR, Expose steering rule functionality") Signed-off-by: Alex Vesker <valex@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20	net/mlx5: DR, Fix invalid EQ vector number on CQ creation	Alex Vesker
	When creating a CQ, the CPU id is used for the vector value. This would fail in-case the CPU id was higher than the maximum vector value. Fixes: 297cccebdc5a ("net/mlx5: DR, Expose an internal API to issue RDMA operations") Signed-off-by: Alex Vesker <valex@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20	net/mlx5e: Reorder mirrer action parsing to check for encap first	Vlad Buslov
	Mirred action parsing code in parse_tc_fdb_actions() first checks if out_dev has same parent id, and only verifies that there is a pending encap action that was parsed before. Recent change in vxlan module made function netdev_port_same_parent_id() to return true when called for mlx5 eswitch representor and vxlan device created explicitly on mlx5 representor device (vxlan devices created with "external" flag without explicitly specifying parent interface are not affected). With call to netdev_port_same_parent_id() returning true, incorrect code path is chosen and encap rules fail to offload because vxlan dev is not a valid eswitch forwarding dev. Dmesg log of error: [ 1784.389797] devices ens1f0_0 vxlan1 not on same switch HW, can't offload forwarding In order to fix the issue, rearrange conditional in parse_tc_fdb_actions() to check for pending encap action before checking if out_dev has the same parent id. Fixes: 0ce1822c2a08 ("vxlan: add adjacent link to limit depth level") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20	net/mlx5e: Fix ingress rate configuration for representors	Eli Cohen
	Current code uses the old method of prio encoding in flow_cls_common_offload. Fix to follow the changes introduced in commit ef01adae0e43 ("net: sched: use major priority number as hardware priority"). Fixes: fcb64c0f5640 ("net/mlx5: E-Switch, add ingress rate support") Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20	net/mlx5e: Fix error flow cleanup in mlx5e_tc_tun_create_header_ipv4/6	Eli Cohen
	Be sure to release the neighbour in case of failures after successful route lookup. Fixes: 101f4de9dd52 ("net/mlx5e: Move TC tunnel offloading code to separate source file") Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-20	net: sched: pie: enable timestamp based delay calculation	Gautam Ramakrishnan
	RFC 8033 suggests an alternative approach to calculate the queue delay in PIE by using a timestamp on every enqueued packet. This patch adds an implementation of that approach and sets it as the default method to calculate queue delay. The previous method (based on Little's law) to calculate queue delay is set as optional. Signed-off-by: Gautam Ramakrishnan <gautamramk@gmail.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Acked-by: Dave Taht <dave.taht@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>