git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2018-01-23	Merge branch '40GbE' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2018-01-23 This series contains updates to i40e and i40evf only. Pawel enables FlatNVM support on x722 devices by allowing nvmupdate tool to configure the preservation flags in the AdminQ command. Mitch fixes a potential divide by zero error when DCB is enabled and the firmware fails to configure the VSI, so check for this state. Fixed a bug where the driver could fail to adhere to ETS bandwidth allocations if 8 traffic classes were configured on the switch. Sudheer fixes a potential deadlock by avoiding to call flush_schedule_work() in i40evf_remove(), since cancel_work_sync() and cancel_delayed_work_sync() already cleans up necessary work items. Fixed an issue with the problematic detection and recovery from hung queues in the PF which was causing lost interrupts. This is done by triggering a software interrupt so that interrupts are forced on and if we are already in napi_poll and an interrupt fires, napi_poll will not be rescheduled and the interrupt is lost. Avinash fixes an issue in the VF where is was possible to issue a reset_task while the device is currently being removed. Michal fixes an issue occurring while calling i40e_led_set() with the blink parameter set to true, which was causing the activity LED instead of the link LED to blink for port identification. Shiraz changes the client interface to not call client close/open on netdev down/up events, since this causes a lot of thrash that is not needed. Instead, disable the PE TCP-ENA flag during a netdev down event and re-enable on a netdev up event, since this blocks all TCP traffic to the RDMA protocol engine. Alan fixes an issue which was causing a potential transmit hang by ignoring the PF link up message if the VF state is not yet in the RUNNING state. Amritha fixes the channel VSI recreation during the reset flow to reconfigure the transmit rings and the queue context associated with the channel VSI. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	vmxnet3: repair memory leak	Neil Horman
	with the introduction of commit b0eb57cb97e7837ebb746404c2c58c6f536f23fa, it appears that rq->buf_info is improperly handled. While it is heap allocated when an rx queue is setup, and freed when torn down, an old line of code in vmxnet3_rq_destroy was not properly removed, leading to rq->buf_info[0] being set to NULL prior to its being freed, causing a memory leak, which eventually exhausts the system on repeated create/destroy operations (for example, when the mtu of a vmxnet3 interface is changed frequently. Fix is pretty straight forward, just move the NULL set to after the free. Tested by myself with successful results Applies to net, and should likely be queued for stable, please Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-By: boyang@redhat.com CC: boyang@redhat.com CC: Shrikrishna Khare <skhare@vmware.com> CC: "VMware, Inc." <pv-drivers@vmware.com> CC: David S. Miller <davem@davemloft.net> Acked-by: Shrikrishna Khare <skhare@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL	Ben Hutchings
	Commit 513674b5a2c9 ("net: reevalulate autoflowlabel setting after sysctl setting") removed the initialisation of ipv6_pinfo::autoflowlabel and added a second flag to indicate whether this field or the net namespace default should be used. The getsockopt() handling for this case was not updated, so it currently returns 0 for all sockets for which IPV6_AUTOFLOWLABEL is not explicitly enabled. Fix it to return the effective value, whether that has been set at the socket or net namespace level. Fixes: 513674b5a2c9 ("net: reevalulate autoflowlabel setting after sysctl ...") Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	Merge branch 'act_csum-spinlock-remove'	David S. Miller
	Davide Caratti says: ==================== net/sched: remove spinlock from 'csum' action Similarly to what has been done earlier with other actions [1][2], this series tries to improve the performance of 'csum' tc action, removing a spinlock in the data path. Patch 1 lets act_csum use per-CPU counters; patch 2 removes spin_{,un}lock_bh() calls from the act() method. test procedure (using pktgen from https://github.com/netoptimizer): # ip link add name eth1 type dummy # ip link set dev eth1 up # tc qdisc add dev eth1 root handle 1: prio # for a in pass drop; do > tc filter del dev eth1 parent 1: pref 10 matchall action csum udp > tc filter add dev eth1 parent 1: pref 10 matchall action csum udp $a > for n in 2 4; do > ./pktgen_bench_xmit_mode_queue_xmit.sh -v -s 64 -t $n -n 1000000 -i eth1 > done > done test results: \| \| before patch \| after patch $a \| $n \| avg. pps/thread \| avg. pps/thread -----+----+-----------------+---------------- pass \| 2 \| 1671463 ± 4% \| 1920789 ± 3% pass \| 4 \| 648797 ± 1% \| 738190 ± 1% drop \| 2 \| 3212692 ± 2% \| 3719811 ± 2% drop \| 4 \| 1078824 ± 1% \| 1328099 ± 1% references: [1] https://www.spinics.net/lists/netdev/msg334760.html [2] https://www.spinics.net/lists/netdev/msg465862.html v3 changes: - use rtnl_dereference() in place of rcu_dereference() in tcf_csum_dump() v2 changes: - add 'drop' test, it produces more contentions - use RCU-protected struct to store 'action' and 'update_flags', to avoid reading the values from subsequent configurations ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	net/sched: act_csum: don't use spinlock in the fast path	Davide Caratti
	use RCU instead of spin_{,unlock}_bh() to protect concurrent read/write on act_csum configuration, to reduce the effects of contention in the data path when multiple readers are present. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	net/sched: act_csum: use per-core statistics	Davide Caratti
	use per-CPU counters, like other TC actions do, instead of maintaining one set of stats across all cores. This allows updating act_csum stats without the need of protecting them using spin_{,un}lock_bh() invocations. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	pppoe: take ->needed_headroom of lower device into account on xmit	Guillaume Nault
	In pppoe_sendmsg(), reserving dev->hard_header_len bytes of headroom was probably fine before the introduction of ->needed_headroom in commit f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom"). But now, virtual devices typically advertise the size of their overhead in dev->needed_headroom, so we must also take it into account in skb_reserve(). Allocation size of skb is also updated to take dev->needed_tailroom into account and replace the arbitrary 32 bytes with the real size of a PPPoE header. This issue was discovered by syzbot, who connected a pppoe socket to a gre device which had dev->header_ops->create == ipgre_header and dev->hard_header_len == 0. Therefore, PPPoE didn't reserve any headroom, and dev_hard_header() crashed when ipgre_header() tried to prepend its header to skb->data. skbuff: skb_under_panic: text:000000001d390b3a len:31 put:24 head:00000000d8ed776f data:000000008150e823 tail:0x7 end:0xc0 dev:gre0 ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:104! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 3670 Comm: syzkaller801466 Not tainted 4.15.0-rc7-next-20180115+ #97 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:skb_panic+0x162/0x1f0 net/core/skbuff.c:100 RSP: 0018:ffff8801d9bd7840 EFLAGS: 00010282 RAX: 0000000000000083 RBX: ffff8801d4f083c0 RCX: 0000000000000000 RDX: 0000000000000083 RSI: 1ffff1003b37ae92 RDI: ffffed003b37aefc RBP: ffff8801d9bd78a8 R08: 1ffff1003b37ae8a R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff86200de0 R13: ffffffff84a981ad R14: 0000000000000018 R15: ffff8801d2d34180 FS: 00000000019c4880(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000208bc000 CR3: 00000001d9111001 CR4: 00000000001606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: skb_under_panic net/core/skbuff.c:114 [inline] skb_push+0xce/0xf0 net/core/skbuff.c:1714 ipgre_header+0x6d/0x4e0 net/ipv4/ip_gre.c:879 dev_hard_header include/linux/netdevice.h:2723 [inline] pppoe_sendmsg+0x58e/0x8b0 drivers/net/ppp/pppoe.c:890 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg+0xca/0x110 net/socket.c:640 sock_write_iter+0x31a/0x5d0 net/socket.c:909 call_write_iter include/linux/fs.h:1775 [inline] do_iter_readv_writev+0x525/0x7f0 fs/read_write.c:653 do_iter_write+0x154/0x540 fs/read_write.c:932 vfs_writev+0x18a/0x340 fs/read_write.c:977 do_writev+0xfc/0x2a0 fs/read_write.c:1012 SYSC_writev fs/read_write.c:1085 [inline] SyS_writev+0x27/0x30 fs/read_write.c:1082 entry_SYSCALL_64_fastpath+0x29/0xa0 Admittedly PPPoE shouldn't be allowed to run on non Ethernet-like interfaces, but reserving space for ->needed_headroom is a more fundamental issue that needs to be addressed first. Same problem exists for __pppoe_xmit(), which also needs to take dev->needed_headroom into account in skb_cow_head(). Fixes: f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom") Reported-by: syzbot+ed0838d0fa4c4f2b528e20286e6dc63effc7c14d@syzkaller.appspotmail.com Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	net: link_watch: mark bonding link events urgent	Roopa Prabhu
	It takes 1sec for bond link down notification to hit user-space when all slaves of the bond go down. 1sec is too long for protocol daemons in user-space relying on bond notification to recover (eg: multichassis lag implementations in user-space). Since the link event code already marks team device port link events as urgent, this patch moves the code to cover all lag ports and master. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24	cxl: Remove support for "Processing accelerators" class	Frederic Barrat
	The cxl driver currently declares in its table of supported PCI devices the class "Processing accelerators". Therefore it may be called to probe for opencapi devices, which generates errors, as the config space of a cxl device is not compatible with opencapi. So remove support for the generic class, as we now have (at least) two drivers for devices of the same class. Most cxl devices are FPGAs with a PSL which will show a known device ID of 0x477. Other devices are really supported by the cxlflash driver and are already listed in the table. So removing the class is expected to go unnoticed. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-24	ocxl: Add Makefile and Kconfig	Frederic Barrat
	OCXL_BASE triggers the platform support needed by the driver. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-24	ocxl: Add trace points	Frederic Barrat
	Define a few trace points so that we can use the standard tracing mechanism for debug and/or monitoring. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-24	ocxl: Add a kernel API for other opencapi drivers	Frederic Barrat
	Some of the functions done by the generic driver should also be needed by other opencapi drivers: attaching a context to an adapter, translation fault handling, AFU interrupt allocation... So to avoid code duplication, the driver provides a kernel API that other drivers can use, similar to calling a in-kernel library. It is still a bit theoretical, for lack of real hardware, and will likely need adjustements down the road. But we used the cxlflash driver as a guinea pig. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-24	ocxl: Add AFU interrupt support	Frederic Barrat
	Add user APIs through ioctl to allocate, free, and be notified of an AFU interrupt. For opencapi, an AFU can trigger an interrupt on the host by sending a specific command targeting a 64-bit object handle. On POWER9, this is implemented by mapping a special page in the address space of a process and a write to that page will trigger an interrupt. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-24	ocxl: Driver code for 'generic' opencapi devices	Frederic Barrat
	Add an ocxl driver to handle generic opencapi devices. Of course, it's not meant to be the only opencapi driver, any device is free to implement its own. But if a host application only needs basic services like attaching to an opencapi adapter, have translation faults handled or allocate AFU interrupts, it should suffice. The AFU config space must follow the opencapi specification and use the expected vendor/device ID to be seen by the generic driver. The driver exposes the device AFUs as a char device in /dev/ocxl/ Note that the driver currently doesn't handle memory attached to the opencapi device. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Alastair D'Silva <alastair@d-silva.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-24	powerpc/powernv: Capture actag information for the device	Frederic Barrat
	In the opencapi protocol, host memory contexts are referenced by a 'actag'. During setup, a driver must tell the device how many actags it can used, and what values are acceptable. On POWER9, the NPU can handle 64 actags per link, so they must be shared between all the PCI functions of the link. To get a global picture of how many actags are used by each AFU of every function, we capture some data at the end of PCI enumeration, so that actags can be shared fairly if needed. This is not powernv specific per say, but rather a consequence of the opencapi configuration specification being quite general. The number of available actags on POWER9 makes it more likely to be hit. This is somewhat mitigated by the fact that existing AFUs are coded by requesting a reasonable count of actags and existing devices carry only one AFU. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-24	powerpc/powernv: Add platform-specific services for opencapi	Frederic Barrat
	Implement a few platform-specific calls which can be used by drivers: - provide the Transaction Layer capabilities of the host, so that the driver can find some common ground and configure the device and host appropriately. - provide the hw interrupt to be used for translation faults raised by the NPU - map/unmap some NPU mmio registers to get the fault context when the NPU raises an address translation fault The rest are wrappers around the previously-introduced opal calls. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-24	powerpc/powernv: Add opal calls for opencapi	Frederic Barrat
	Add opal calls to interact with the NPU: OPAL_NPU_SPA_SETUP: set the Shared Process Area (SPA) The SPA is a table containing one entry (Process Element) per memory context which can be accessed by the opencapi device. OPAL_NPU_SPA_CLEAR_CACHE: clear the context cache The NPU keeps a cache of recently accessed memory contexts. When a Process Element is removed from the SPA, the cache for the link must be cleared. OPAL_NPU_TL_SET: configure the Transaction Layer The Transaction Layer specification defines several templates for messages to be exchanged on the link. During link setup, the host and device must negotiate what templates are supported on both sides and at what rates those messages can be sent. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-24	powerpc/powernv: Set correct configuration space size for opencapi devices	Andrew Donnellan
	The configuration space for opencapi devices doesn't have a PCI Express capability, therefore confusing linux in thinking it's of an old PCI type with a 256-byte configuration space size, instead of the desired 4k. So add a PCI fixup to declare the correct size. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-24	powerpc/powernv: Introduce new PHB type for opencapi links	Frederic Barrat
	The NPU was already abstracted by opal as a virtual PHB for nvlink, but it helps to be able to differentiate between a nvlink or opencapi PHB, as it's not completely transparent to linux. In particular, PE assignment differs and we'll also need the information in later patches. So rename existing PNV_PHB_NPU type to PNV_PHB_NPU_NVLINK and add a new type PNV_PHB_NPU_OCAPI. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-23	Merge branch '10GbE' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 10GbE Intel Wired LAN Driver Updates 2018-01-23 This series contains updates to ixgbe only. Shannon Nelson provides an implementation of the ipsec hardware offload feature for the ixgbe driver for these devices: x540, x550, 82599. The ixgbe NICs support ipsec offload for 1024 Rx and 1024 Tx Security Associations (SAs), using up to 128 inbound IP addresses, and using the rfc4106(gcm(aes)) encryption. This code does not yet support checksum offload, or TSO in conjunction with the ipsec offload - those will be added in the future. This code shows improvements in both packet throughput and CPU utilization. For example, here are some quicky numbers that show the magnitude of the performance gain on a single run of "iperf -c <dest>" with the ipsec offload on both ends of a point-to-point connection: 9.4 Gbps - normal case 7.6 Gbps - ipsec with offload 343 Mbps - ipsec no offload ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	Merge branch 'GEHC-Bx50-Switch-Support'	David S. Miller
	Sebastian Reichel says: ==================== GEHC Bx50 Switch Support This adds support for the internal switch found in GE Healthcare B450v3, B650v3 and B850v3. All devices use a GPIO bitbanged MDIO bus to communicate with the switch and a PCIe based network card for exchanging network data. The cpu network data link requires, that the switch's internal phy interface is enabled, so support for that is added by the first patch in this series. The patch series is based on v4.15-rc8. Changes since PATCHv4: * Introduce dsa_port_link_(un)register_of and mark the fixed variant static. * Update patch description to describe the phy<->phy connection from i210 to the Marvell switch Changes since PATCHv3: * Enable the phy in dsa_port_setup() instead of abusing the fixed link setup function Changes since PATCHv2: * Add phy nodes to switch in bx50.dtsi and reference them from switch ports * Enable cpu-port's phy based on 'phy-handle' instead of 'phy-mode' Changes since PATCHv1: * Use 'marvell,mv88e6085' instead of introducing compatible string for mv88e6240. * Fix indention of DT nodes * Only enable 'cpu' phy, if explicitly set to "internal". ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	ARM: dts: imx6q-b450v3: Add switch port configuration	Sebastian Reichel
	This adds support for the Marvell switch and names the network ports according to the labels, that can be found next to the connectors. The switch is connected to the host system using a PCI based network card. The PCI bus configuration has been written using the following information: root@b450v3# lspci -tv -[0000:00]---00.0-[01]----00.0 Intel Corporation I210 Gigabit Network Connection root@b450v3# lspci -nn 00:00.0 PCI bridge [0604]: Synopsys, Inc. Device [16c3:abcd] (rev 01) 01:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03) Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	ARM: dts: imx6q-b650v3: Add switch port configuration	Sebastian Reichel
	This adds support for the Marvell switch and names the network ports according to the labels, that can be found next to the connectors. The switch is connected to the host system using a PCI based network card. The PCI bus configuration has been written using the following information: root@b650v3# lspci -tv -[0000:00]---00.0-[01]----00.0 Intel Corporation I210 Gigabit Network Connection root@b650v3# lspci -nn 00:00.0 PCI bridge [0604]: Synopsys, Inc. Device [16c3:abcd] (rev 01) 01:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03) Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	ARM: dts: imx6q-b850v3: Add switch port configuration	Sebastian Reichel
	This adds support for the Marvell switch and names the network ports according to the labels, that can be found next to the connectors ("ID", "IX", "ePort 1", "ePort 2"). The switch is connected to the host system using a PCI based network card. The PCI bus configuration has been written using the following information: root@b850v3# lspci -tv -[0000:00]---00.0-[01]----00.0-[02-05]--+-01.0-[03]----00.0 Intel Corporation I210 Gigabit Network Connection +-02.0-[04]----00.0 Intel Corporation I210 Gigabit Network Connection \-03.0-[05]-- root@b850v3# lspci -nn 00:00.0 PCI bridge [0604]: Synopsys, Inc. Device [16c3:abcd] (rev 01) 01:00.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8605 PCI Express 4-port Gen2 Switch [10b5:8605] (rev ab) 02:01.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8605 PCI Express 4-port Gen2 Switch [10b5:8605] (rev ab) 02:02.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8605 PCI Express 4-port Gen2 Switch [10b5:8605] (rev ab) 02:03.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8605 PCI Express 4-port Gen2 Switch [10b5:8605] (rev ab) 03:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03) 04:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03) Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	ARM: dts: imx6q-bx50v3: Add internal switch	Sebastian Reichel
	B850v3, B650v3 and B450v3 all have a GPIO bit banged MDIO bus to communicate with a Marvell switch. On all devices the switch is connected to a PCI based network card, which needs to be referenced by DT, so this also adds the common PCI root node. Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	net: dsa: Support internal phy on 'cpu' port	Sebastian Reichel
	This adds support for enabling the internal PHY for a 'cpu' port. It has been tested on GE B850v3, B650v3 and B450v3, which have a built-in MV88E6240 switch hardwired to a PCIe based network card. On these machines the internal PHY of the i210 network card and the Marvell switch are connected to each other and must be enabled for properly using the switch. While the i210 PHY will be enabled when the network interface is enabled, the switch's port is not exposed as network interface. Additionally the mv88e6xxx driver resets the chip during probe, so the PHY is disabled without this patch. Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	vsprintf: Do not have bprintf dereference pointers	Steven Rostedt (VMware)
	When trace_printk() was introduced, it was discussed that making it be as low overhead as possible, that the processing of the format string should be delayed until it is read. That is, a "trace_printk()" should not convert the %d into numbers and so on, but instead, save the fmt string and all the args in the buffer at the time of recording. When the trace_printk() data is read, it would then parse the format string and do the conversions of the saved arguments in the tracing buffer. The code to perform this was added to vsprintf where vbin_printf() would save the arguments of a specified format string in a buffer, then bstr_printf() could be used to convert the buffer with the same format string into the final output, as if vsprintf() was called in one go. The issue arises when dereferenced pointers are used. The problem is that something like %*pbl which reads a bitmask, will save the pointer to the bitmask in the buffer. Then the reading of the buffer via bstr_printf() will then look at the pointer to process the final output. Obviously the value of that pointer could have changed since the time it was recorded to the time the buffer is read. Worse yet, the bitmask could be unmapped, and the reading of the trace buffer could actually cause a kernel oops. Another problem is that user space tools such as perf and trace-cmd do not have access to the contents of these pointers, and they become useless when the tracing buffer is extracted. Instead of having vbin_printf() simply save the pointer in the buffer for later processing, have it perform the formatting at the time bin_printf() is called. This will fix the issue of dereferencing pointers at a later time, and has the extra benefit of having user space tools understand these values. Since perf and trace-cmd already can handle %p[sSfF] via saving kallsyms, their pointers are saved and not processed during vbin_printf(). If they were converted, it would break perf and trace-cmd, as they would not know how to deal with the conversion. Link: http://lkml.kernel.org/r/20171228204025.14a71d8f@gandalf.local.home Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23	ftrace: Mark function tracer test functions noinline/noclone	Andi Kleen
	The ftrace function tracer self tests calls some functions to verify the get traced. This relies on them not being inlined. Previously this was ensured by putting them into another file, but with LTO the compiler can inline across files, which makes the tests fail. Mark these functions as noinline and noclone. Link: http://lkml.kernel.org/r/20171221233732.31896-1-andi@firstfloor.org Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23	trace_uprobe: Display correct offset in uprobe_events	Ravi Bangoria
	Recently, how the pointers being printed with %p has been changed by commit ad67b74d2469 ("printk: hash addresses printed with %p"). This is causing a regression while showing offset in the uprobe_events file. Instead of %p, use %px to display offset. Before patch: # perf probe -vv -x /tmp/a.out main Opening /sys/kernel/debug/tracing//uprobe_events write=1 Writing event: p:probe_a/main /tmp/a.out:0x58c # cat /sys/kernel/debug/tracing/uprobe_events p:probe_a/main /tmp/a.out:0x0000000049a0f352 After patch: # cat /sys/kernel/debug/tracing/uprobe_events p:probe_a/main /tmp/a.out:0x000000000000058c Link: http://lkml.kernel.org/r/20180106054246.15375-1-ravi.bangoria@linux.vnet.ibm.com Cc: stable@vger.kernel.org Fixes: ad67b74d2469 ("printk: hash addresses printed with %p") Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23	tracing: Make sure the parsed string always terminates with '\0'	Changbin Du
	Always mark the parsed string with a terminated nul '\0' character. This removes the need for the users to have to append the '\0' before using the parsed string. Link: http://lkml.kernel.org/r/1516093350-12045-4-git-send-email-changbin.du@intel.com Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Changbin Du <changbin.du@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23	tracing: Clear parser->idx if only spaces are read	Changbin Du
	If only spaces were read while parsing the next string, then parser->idx should be cleared in order to make trace_parser_loaded() return false. Link: http://lkml.kernel.org/r/1516093350-12045-3-git-send-email-changbin.du@intel.com Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Changbin Du <changbin.du@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23	tracing: Detect the string nul character when parsing user input string	Changbin Du
	User space can pass in a C nul character '\0' along with its input. The function trace_get_user() will try to process it as a normal character, and that will fail to parse. open("/sys/kernel/debug/tracing//set_ftrace_pid", O_WRONLY\|O_TRUNC) = 3 write(3, " \0", 2) = -1 EINVAL (Invalid argument) while parse can handle spaces, so below works. $ echo "" > set_ftrace_pid $ echo " " > set_ftrace_pid $ echo -n " " > set_ftrace_pid Have the parser stop on '\0' and cease any further parsing. Only process the characters up to the nul '\0' character and do not process it. Link: http://lkml.kernel.org/r/1516093350-12045-2-git-send-email-changbin.du@intel.com Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Changbin Du <changbin.du@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23	tracing: Update stack trace skipping for ORC unwinder	Steven Rostedt (VMware)
	With the addition of ORC unwinder and FRAME POINTER unwinder, the stack trace skipping requirements have changed. I went through the tracing stack trace dumps with ORC and with frame pointers and recalculated the proper values. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23	ftrace, orc, x86: Handle ftrace dynamically allocated trampolines	Steven Rostedt (VMware)
	The function tracer can create a dynamically allocated trampoline that is called by the function mcount or fentry hook that is used to call the function callback that is registered. The problem is that the orc undwinder will bail if it encounters one of these trampolines. This breaks the stack trace of function callbacks, which include the stack tracer and setting the stack trace for individual functions. Since these dynamic trampolines are basically copies of the static ftrace trampolines defined in ftrace_*.S, we do not need to create new orc entries for the dynamic trampolines. Finding the return address on the stack will be identical as the functions that were copied to create the dynamic trampolines. When encountering a ftrace dynamic trampoline, we can just use the orc entry of the ftrace static function that was copied for that trampoline. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23	PCI: Add pci_enable_atomic_ops_to_root()	Jay Cornwall
	The Atomic Operations feature (PCIe r4.0, sec 6.15) allows atomic transctions to be requested by, routed through and completed by PCIe components. Routing and completion do not require software support. Component support for each is detectable via the DEVCAP2 register. A Requester may use AtomicOps only if its PCI_EXP_DEVCTL2_ATOMIC_REQ is set. This should be set only if the Completer and all intermediate routing elements support AtomicOps. A concrete example is the AMD Fiji-class GPU (which is capable of making AtomicOp requests), below a PLX 8747 switch (advertising AtomicOp routing) with a Haswell host bridge (advertising AtomicOp completion support). Add pci_enable_atomic_ops_to_root() for per-device control over AtomicOp requests. This checks to be sure the Root Port supports completion of the desired AtomicOp sizes and the path to the Root Port supports routing the AtomicOps. Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> [bhelgaas: changelog, comments, whitespace] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-01-23	Merge tag 'pci-v4.15-fixes-3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fix from Bjorn Helgaas: "Fix AMD regression due to not re-enabling the big window on resume (Christian König)" * tag 'pci-v4.15-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: x86/PCI: Enable AMD 64-bit window on resume
2018-01-23	PCI: Expose ari_enabled in sysfs	Stuart Hayes
	Some multifunction PCI devices with more than 8 functions use "alternative routing-ID interpretation" (ARI), which means the 8-bit device/function number field will be interpreted as 8 bits specifying the function number (the device number is 0 implicitly), rather than the upper 5 bits specifying the device number and the lower 3 bits specifying the function number. The kernel can enable and use this. Expose in a sysfs attribute whether the kernel has enabled ARI, so that a program in userspace won't have to parse PCI devices and PCI configuration space to figure out if it is enabled. This will allow better predictable network naming using PCI function numbers without using PCI bus or device numbers, which is desirable because bus and device numbers can change with system configuration but function numbers will not. Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-01-23	PCI: pciehp: Assume NoCompl+ for Thunderbolt ports	Lukas Wunner
	Certain Thunderbolt 1 controllers claim to support Command Completed events (value of 0b in the No Command Completed Support field of the Slot Capabilities register) but in reality they neither set the Command Completed bit in the Slot Status register nor signal a Command Completed interrupt: 8086:1513 CV82524 [Light Ridge 4C 2010] 8086:151a DSL2310 [Eagle Ridge 2C 2011] 8086:151b CVL2510 [Light Peak 2C 2010] 8086:1547 DSL3510 [Cactus Ridge 4C 2012] 8086:1548 DSL3310 [Cactus Ridge 2C 2012] 8086:1549 DSL2210 [Port Ridge 1C 2011] All known newer chips (Redwood Ridge and onwards) set No Command Completed Support, indicating that they do not support Command Completed events. The user-visible impact is that after unplugging such a device, 2 seconds elapse until pciehp is unbound. That's because on ->remove, pcie_write_cmd() is called via pcie_disable_notification() and every call to pcie_write_cmd() takes 2 seconds (1 second for each invocation of pcie_wait_cmd()): [ 337.942727] pciehp 0000:0a:00.0:pcie204: Timeout on hotplug command 0x1038 (issued 21176 msec ago) [ 340.014735] pciehp 0000:0a:00.0:pcie204: Timeout on hotplug command 0x0000 (issued 2072 msec ago) That by itself has always been unpleasant, but the situation has become worse with commit cc27b735ad3a ("PCI/portdrv: Turn off PCIe services during shutdown"): Now pciehp is unbound on ->shutdown. Because Thunderbolt controllers typically have 4 hotplug ports, every reboot and shutdown is now delayed by 8 seconds, plus another 2 seconds for every attached Thunderbolt 1 device. Thunderbolt hotplug slots are not physical slots that one inserts cards into, but rather logical hotplug slots implemented in silicon. Devices appear beyond those logical slots once a PCI tunnel is established on top of the Thunderbolt Converged I/O switch. One would expect commands written to the Slot Control register to be executed immediately by the silicon, so for simplicity we always assume NoCompl+ for Thunderbolt ports. Fixes: cc27b735ad3a ("PCI/portdrv: Turn off PCIe services during shutdown") Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: stable@vger.kernel.org # v4.12+ Cc: Sinan Kaya <okaya@codeaurora.org> Cc: Yehezkel Bernat <yehezkel.bernat@intel.com> Cc: Michael Jamet <michael.jamet@intel.com> Cc: Andreas Noever <andreas.noever@gmail.com>
2018-01-23	arm64: Turn on KPTI only on CPUs that need it	Jayachandran C
	Whitelist Broadcom Vulcan/Cavium ThunderX2 processors in unmap_kernel_at_el0(). These CPUs are not vulnerable to CVE-2017-5754 and do not need KPTI when KASLR is off. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Jayachandran C <jnair@caviumnetworks.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-23	arm64: Branch predictor hardening for Cavium ThunderX2	Jayachandran C
	Use PSCI based mitigation for speculative execution attacks targeting the branch predictor. We use the same mechanism as the one used for Cortex-A CPUs, we expect the PSCI version call to have a side effect of clearing the BTBs. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Jayachandran C <jnair@caviumnetworks.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-01-23	Merge tag 'nfs-rdma-for-4.16-1' of ↵	Trond Myklebust
	git://git.linux-nfs.org/projects/anna/linux-nfs NFS-over-RDMA client updates for Linux 4.16 New features: - xprtrdma tracepoints Bugfixes and cleanups: - Fix memory leak if rpcrdma_buffer_create() fails - Fix allocating extra rpcrdma_reps for the backchannel - Remove various unused and redundant variables and lock cycles - Fix IPv6 support in xprt_rdma_set_port() - Fix memory leak by calling buf_free for callback replies - Fix "bytes registered" accounting - Fix kernel-doc comments - SUNRPC tracepoint cleanups for consistent information - Optimizations for __rpc_execute()
2018-01-23	i40e: Fix channel addition in reset flow	Amritha Nambiar
	Fix recreating the channel VSIs during the reset flow to reconfigure the Tx rings and the queue context associated with the channel VSI. Also update the next_base_queue for the VSI while rebuilding the channel VSIs after a reset. Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23	i40evf: ignore link up if not running	Alan Brady
	If we receive the link status message from PF with link up before queues are actually enabled, it will trigger a TX hang. This fixes the issue by ignoring a link up message if the VF state is not yet in RUNNING state. Signed-off-by: Alan Brady <alan.brady@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23	i40e: Delete an error message for a failed memory allocation in ↵	Markus Elfring
	i40e_init_interrupt_scheme() Omit an extra message for a memory allocation failure in this function. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23	i40e: Disable iWARP VSI PETCP_ENA flag on netdev down events	Shiraz Saleem
	Client close is overloaded to handle both un-registration and netdev down event. On netdev down, i40iw client close is called which unregisters the RDMA dev and this is too destructive since the netdev is still registered. Do not call client close/open on netdev down/up events. Instead disable the PE TCP_ENA flag during a netdev down event. This blocks all TCP traffic to the RDMA Protocol Engine. On netdev up, re-enable the flag. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23	i40e: simplify pointer dereferences	Mitch Williams
	Now that i40e_vsi_config_tc() has the pf and hw variable defined, use them, instead of dereferencing vsi->back. Much easier to read. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23	i40e: check for invalid DCB config	Mitch Williams
	The driver (and the entire netdev layer for that matter) assumes that TC0 will always be present in our DCB configuration. Unfortunately, this isn't always the case. Rather than fail to configure the VSI, let's go ahead and try to make it work, even though DCB will end up being disabled by the kernel. If the driver fails to configure DCB, the driver queries what's valid, then writes that back to the hardware, always forcing TC0. This fixes a bug where the driver could fail to adhere to ETS BW allocations if 8 TCs were configured on the switch. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23	i40e/i40evf: Detect and recover hung queue scenario	Sudheer Mogilappagari
	In VFs, there is a known issue which can cause writebacks to not occur when interrupts are disabled and there are less than 4 descriptors resulting in TX timeout. Timeout can also occur due to lost interrupt. The current implementation for detecting and recovering from hung queues in the PF is problematic because it actually actively encourages lost interrupts. By triggering a SW interrupt, interrupts are forced on. If we are already in napi_poll and an interrupt fires, napi_poll will not be rescheduled and the interrupt is effectively lost; thereby potentially causing hung queues. This patch checks whether packets are being processed between every watchdog cycle and determine potential hung queue and fires triggers SW interrupt only for that particular queue. Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23	i40e: Fix for blinking activity instead of link LEDs	Michal Kuchta
	This fix solves an issue occurring while calling i40e_led_set function from the driver with "blink" parameter set as TRUE. This call resulted in Activity LED blinking instead of Link LED, which may lead to errors in physically identifying the port, since Activity LED may be blinking for different reasons as well. Signed-off-by: Michal Kuchta <michal.kuchta@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23	i40evf: Don't schedule reset_task when device is being removed	Avinash Dayanand
	When a host disables and enables a PF device, all the associated VFs are removed and added back in. It also generates a PFR which in turn resets all the connected VFs. This behaviour is different from that of Linux guest on Linux host. Hence we end up in a situation where there's a PFR and device removal at the same time. And watchdog doesn't have a clue about this and schedules a reset_task. This patch adds code to send signal to reset_task that the device is currently being removed. Signed-off-by: Avinash Dayanand <avinash.dayanand@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>