linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2018-01-23	netdevsim: add extack support for TC eBPF offload	Quentin Monnet
	Use the recently added extack support for TC eBPF filters in netdevsim. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	Merge branch '40GbE' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2018-01-23 This series contains updates to i40e and i40evf only. Pawel enables FlatNVM support on x722 devices by allowing nvmupdate tool to configure the preservation flags in the AdminQ command. Mitch fixes a potential divide by zero error when DCB is enabled and the firmware fails to configure the VSI, so check for this state. Fixed a bug where the driver could fail to adhere to ETS bandwidth allocations if 8 traffic classes were configured on the switch. Sudheer fixes a potential deadlock by avoiding to call flush_schedule_work() in i40evf_remove(), since cancel_work_sync() and cancel_delayed_work_sync() already cleans up necessary work items. Fixed an issue with the problematic detection and recovery from hung queues in the PF which was causing lost interrupts. This is done by triggering a software interrupt so that interrupts are forced on and if we are already in napi_poll and an interrupt fires, napi_poll will not be rescheduled and the interrupt is lost. Avinash fixes an issue in the VF where is was possible to issue a reset_task while the device is currently being removed. Michal fixes an issue occurring while calling i40e_led_set() with the blink parameter set to true, which was causing the activity LED instead of the link LED to blink for port identification. Shiraz changes the client interface to not call client close/open on netdev down/up events, since this causes a lot of thrash that is not needed. Instead, disable the PE TCP-ENA flag during a netdev down event and re-enable on a netdev up event, since this blocks all TCP traffic to the RDMA protocol engine. Alan fixes an issue which was causing a potential transmit hang by ignoring the PF link up message if the VF state is not yet in the RUNNING state. Amritha fixes the channel VSI recreation during the reset flow to reconfigure the transmit rings and the queue context associated with the channel VSI. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	vmxnet3: repair memory leak	Neil Horman
	with the introduction of commit b0eb57cb97e7837ebb746404c2c58c6f536f23fa, it appears that rq->buf_info is improperly handled. While it is heap allocated when an rx queue is setup, and freed when torn down, an old line of code in vmxnet3_rq_destroy was not properly removed, leading to rq->buf_info[0] being set to NULL prior to its being freed, causing a memory leak, which eventually exhausts the system on repeated create/destroy operations (for example, when the mtu of a vmxnet3 interface is changed frequently. Fix is pretty straight forward, just move the NULL set to after the free. Tested by myself with successful results Applies to net, and should likely be queued for stable, please Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-By: boyang@redhat.com CC: boyang@redhat.com CC: Shrikrishna Khare <skhare@vmware.com> CC: "VMware, Inc." <pv-drivers@vmware.com> CC: David S. Miller <davem@davemloft.net> Acked-by: Shrikrishna Khare <skhare@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL	Ben Hutchings
	Commit 513674b5a2c9 ("net: reevalulate autoflowlabel setting after sysctl setting") removed the initialisation of ipv6_pinfo::autoflowlabel and added a second flag to indicate whether this field or the net namespace default should be used. The getsockopt() handling for this case was not updated, so it currently returns 0 for all sockets for which IPV6_AUTOFLOWLABEL is not explicitly enabled. Fix it to return the effective value, whether that has been set at the socket or net namespace level. Fixes: 513674b5a2c9 ("net: reevalulate autoflowlabel setting after sysctl ...") Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	Merge branch 'act_csum-spinlock-remove'	David S. Miller
	Davide Caratti says: ==================== net/sched: remove spinlock from 'csum' action Similarly to what has been done earlier with other actions [1][2], this series tries to improve the performance of 'csum' tc action, removing a spinlock in the data path. Patch 1 lets act_csum use per-CPU counters; patch 2 removes spin_{,un}lock_bh() calls from the act() method. test procedure (using pktgen from https://github.com/netoptimizer): # ip link add name eth1 type dummy # ip link set dev eth1 up # tc qdisc add dev eth1 root handle 1: prio # for a in pass drop; do > tc filter del dev eth1 parent 1: pref 10 matchall action csum udp > tc filter add dev eth1 parent 1: pref 10 matchall action csum udp $a > for n in 2 4; do > ./pktgen_bench_xmit_mode_queue_xmit.sh -v -s 64 -t $n -n 1000000 -i eth1 > done > done test results: \| \| before patch \| after patch $a \| $n \| avg. pps/thread \| avg. pps/thread -----+----+-----------------+---------------- pass \| 2 \| 1671463 ± 4% \| 1920789 ± 3% pass \| 4 \| 648797 ± 1% \| 738190 ± 1% drop \| 2 \| 3212692 ± 2% \| 3719811 ± 2% drop \| 4 \| 1078824 ± 1% \| 1328099 ± 1% references: [1] https://www.spinics.net/lists/netdev/msg334760.html [2] https://www.spinics.net/lists/netdev/msg465862.html v3 changes: - use rtnl_dereference() in place of rcu_dereference() in tcf_csum_dump() v2 changes: - add 'drop' test, it produces more contentions - use RCU-protected struct to store 'action' and 'update_flags', to avoid reading the values from subsequent configurations ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	net/sched: act_csum: don't use spinlock in the fast path	Davide Caratti
	use RCU instead of spin_{,unlock}_bh() to protect concurrent read/write on act_csum configuration, to reduce the effects of contention in the data path when multiple readers are present. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	net/sched: act_csum: use per-core statistics	Davide Caratti
	use per-CPU counters, like other TC actions do, instead of maintaining one set of stats across all cores. This allows updating act_csum stats without the need of protecting them using spin_{,un}lock_bh() invocations. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	pppoe: take ->needed_headroom of lower device into account on xmit	Guillaume Nault
	In pppoe_sendmsg(), reserving dev->hard_header_len bytes of headroom was probably fine before the introduction of ->needed_headroom in commit f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom"). But now, virtual devices typically advertise the size of their overhead in dev->needed_headroom, so we must also take it into account in skb_reserve(). Allocation size of skb is also updated to take dev->needed_tailroom into account and replace the arbitrary 32 bytes with the real size of a PPPoE header. This issue was discovered by syzbot, who connected a pppoe socket to a gre device which had dev->header_ops->create == ipgre_header and dev->hard_header_len == 0. Therefore, PPPoE didn't reserve any headroom, and dev_hard_header() crashed when ipgre_header() tried to prepend its header to skb->data. skbuff: skb_under_panic: text:000000001d390b3a len:31 put:24 head:00000000d8ed776f data:000000008150e823 tail:0x7 end:0xc0 dev:gre0 ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:104! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 3670 Comm: syzkaller801466 Not tainted 4.15.0-rc7-next-20180115+ #97 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:skb_panic+0x162/0x1f0 net/core/skbuff.c:100 RSP: 0018:ffff8801d9bd7840 EFLAGS: 00010282 RAX: 0000000000000083 RBX: ffff8801d4f083c0 RCX: 0000000000000000 RDX: 0000000000000083 RSI: 1ffff1003b37ae92 RDI: ffffed003b37aefc RBP: ffff8801d9bd78a8 R08: 1ffff1003b37ae8a R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff86200de0 R13: ffffffff84a981ad R14: 0000000000000018 R15: ffff8801d2d34180 FS: 00000000019c4880(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000208bc000 CR3: 00000001d9111001 CR4: 00000000001606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: skb_under_panic net/core/skbuff.c:114 [inline] skb_push+0xce/0xf0 net/core/skbuff.c:1714 ipgre_header+0x6d/0x4e0 net/ipv4/ip_gre.c:879 dev_hard_header include/linux/netdevice.h:2723 [inline] pppoe_sendmsg+0x58e/0x8b0 drivers/net/ppp/pppoe.c:890 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg+0xca/0x110 net/socket.c:640 sock_write_iter+0x31a/0x5d0 net/socket.c:909 call_write_iter include/linux/fs.h:1775 [inline] do_iter_readv_writev+0x525/0x7f0 fs/read_write.c:653 do_iter_write+0x154/0x540 fs/read_write.c:932 vfs_writev+0x18a/0x340 fs/read_write.c:977 do_writev+0xfc/0x2a0 fs/read_write.c:1012 SYSC_writev fs/read_write.c:1085 [inline] SyS_writev+0x27/0x30 fs/read_write.c:1082 entry_SYSCALL_64_fastpath+0x29/0xa0 Admittedly PPPoE shouldn't be allowed to run on non Ethernet-like interfaces, but reserving space for ->needed_headroom is a more fundamental issue that needs to be addressed first. Same problem exists for __pppoe_xmit(), which also needs to take dev->needed_headroom into account in skb_cow_head(). Fixes: f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom") Reported-by: syzbot+ed0838d0fa4c4f2b528e20286e6dc63effc7c14d@syzkaller.appspotmail.com Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	net: link_watch: mark bonding link events urgent	Roopa Prabhu
	It takes 1sec for bond link down notification to hit user-space when all slaves of the bond go down. 1sec is too long for protocol daemons in user-space relying on bond notification to recover (eg: multichassis lag implementations in user-space). Since the link event code already marks team device port link events as urgent, this patch moves the code to cover all lag ports and master. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	Merge branch '10GbE' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 10GbE Intel Wired LAN Driver Updates 2018-01-23 This series contains updates to ixgbe only. Shannon Nelson provides an implementation of the ipsec hardware offload feature for the ixgbe driver for these devices: x540, x550, 82599. The ixgbe NICs support ipsec offload for 1024 Rx and 1024 Tx Security Associations (SAs), using up to 128 inbound IP addresses, and using the rfc4106(gcm(aes)) encryption. This code does not yet support checksum offload, or TSO in conjunction with the ipsec offload - those will be added in the future. This code shows improvements in both packet throughput and CPU utilization. For example, here are some quicky numbers that show the magnitude of the performance gain on a single run of "iperf -c <dest>" with the ipsec offload on both ends of a point-to-point connection: 9.4 Gbps - normal case 7.6 Gbps - ipsec with offload 343 Mbps - ipsec no offload ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	Merge branch 'GEHC-Bx50-Switch-Support'	David S. Miller
	Sebastian Reichel says: ==================== GEHC Bx50 Switch Support This adds support for the internal switch found in GE Healthcare B450v3, B650v3 and B850v3. All devices use a GPIO bitbanged MDIO bus to communicate with the switch and a PCIe based network card for exchanging network data. The cpu network data link requires, that the switch's internal phy interface is enabled, so support for that is added by the first patch in this series. The patch series is based on v4.15-rc8. Changes since PATCHv4: * Introduce dsa_port_link_(un)register_of and mark the fixed variant static. * Update patch description to describe the phy<->phy connection from i210 to the Marvell switch Changes since PATCHv3: * Enable the phy in dsa_port_setup() instead of abusing the fixed link setup function Changes since PATCHv2: * Add phy nodes to switch in bx50.dtsi and reference them from switch ports * Enable cpu-port's phy based on 'phy-handle' instead of 'phy-mode' Changes since PATCHv1: * Use 'marvell,mv88e6085' instead of introducing compatible string for mv88e6240. * Fix indention of DT nodes * Only enable 'cpu' phy, if explicitly set to "internal". ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	ARM: dts: imx6q-b450v3: Add switch port configuration	Sebastian Reichel
	This adds support for the Marvell switch and names the network ports according to the labels, that can be found next to the connectors. The switch is connected to the host system using a PCI based network card. The PCI bus configuration has been written using the following information: root@b450v3# lspci -tv -[0000:00]---00.0-[01]----00.0 Intel Corporation I210 Gigabit Network Connection root@b450v3# lspci -nn 00:00.0 PCI bridge [0604]: Synopsys, Inc. Device [16c3:abcd] (rev 01) 01:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03) Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	ARM: dts: imx6q-b650v3: Add switch port configuration	Sebastian Reichel
	This adds support for the Marvell switch and names the network ports according to the labels, that can be found next to the connectors. The switch is connected to the host system using a PCI based network card. The PCI bus configuration has been written using the following information: root@b650v3# lspci -tv -[0000:00]---00.0-[01]----00.0 Intel Corporation I210 Gigabit Network Connection root@b650v3# lspci -nn 00:00.0 PCI bridge [0604]: Synopsys, Inc. Device [16c3:abcd] (rev 01) 01:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03) Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	ARM: dts: imx6q-b850v3: Add switch port configuration	Sebastian Reichel
	This adds support for the Marvell switch and names the network ports according to the labels, that can be found next to the connectors ("ID", "IX", "ePort 1", "ePort 2"). The switch is connected to the host system using a PCI based network card. The PCI bus configuration has been written using the following information: root@b850v3# lspci -tv -[0000:00]---00.0-[01]----00.0-[02-05]--+-01.0-[03]----00.0 Intel Corporation I210 Gigabit Network Connection +-02.0-[04]----00.0 Intel Corporation I210 Gigabit Network Connection \-03.0-[05]-- root@b850v3# lspci -nn 00:00.0 PCI bridge [0604]: Synopsys, Inc. Device [16c3:abcd] (rev 01) 01:00.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8605 PCI Express 4-port Gen2 Switch [10b5:8605] (rev ab) 02:01.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8605 PCI Express 4-port Gen2 Switch [10b5:8605] (rev ab) 02:02.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8605 PCI Express 4-port Gen2 Switch [10b5:8605] (rev ab) 02:03.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8605 PCI Express 4-port Gen2 Switch [10b5:8605] (rev ab) 03:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03) 04:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03) Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	ARM: dts: imx6q-bx50v3: Add internal switch	Sebastian Reichel
	B850v3, B650v3 and B450v3 all have a GPIO bit banged MDIO bus to communicate with a Marvell switch. On all devices the switch is connected to a PCI based network card, which needs to be referenced by DT, so this also adds the common PCI root node. Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	net: dsa: Support internal phy on 'cpu' port	Sebastian Reichel
	This adds support for enabling the internal PHY for a 'cpu' port. It has been tested on GE B850v3, B650v3 and B450v3, which have a built-in MV88E6240 switch hardwired to a PCIe based network card. On these machines the internal PHY of the i210 network card and the Marvell switch are connected to each other and must be enabled for properly using the switch. While the i210 PHY will be enabled when the network interface is enabled, the switch's port is not exposed as network interface. Additionally the mv88e6xxx driver resets the chip during probe, so the PHY is disabled without this patch. Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	tracing: Update stack trace skipping for ORC unwinder	Steven Rostedt (VMware)
	With the addition of ORC unwinder and FRAME POINTER unwinder, the stack trace skipping requirements have changed. I went through the tracing stack trace dumps with ORC and with frame pointers and recalculated the proper values. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23	ftrace, orc, x86: Handle ftrace dynamically allocated trampolines	Steven Rostedt (VMware)
	The function tracer can create a dynamically allocated trampoline that is called by the function mcount or fentry hook that is used to call the function callback that is registered. The problem is that the orc undwinder will bail if it encounters one of these trampolines. This breaks the stack trace of function callbacks, which include the stack tracer and setting the stack trace for individual functions. Since these dynamic trampolines are basically copies of the static ftrace trampolines defined in ftrace_*.S, we do not need to create new orc entries for the dynamic trampolines. Finding the return address on the stack will be identical as the functions that were copied to create the dynamic trampolines. When encountering a ftrace dynamic trampoline, we can just use the orc entry of the ftrace static function that was copied for that trampoline. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23	Merge tag 'pci-v4.15-fixes-3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fix from Bjorn Helgaas: "Fix AMD regression due to not re-enabling the big window on resume (Christian König)" * tag 'pci-v4.15-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: x86/PCI: Enable AMD 64-bit window on resume
2018-01-23	i40e: Fix channel addition in reset flow	Amritha Nambiar
	Fix recreating the channel VSIs during the reset flow to reconfigure the Tx rings and the queue context associated with the channel VSI. Also update the next_base_queue for the VSI while rebuilding the channel VSIs after a reset. Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23	i40evf: ignore link up if not running	Alan Brady
	If we receive the link status message from PF with link up before queues are actually enabled, it will trigger a TX hang. This fixes the issue by ignoring a link up message if the VF state is not yet in RUNNING state. Signed-off-by: Alan Brady <alan.brady@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23	i40e: Delete an error message for a failed memory allocation in ↵	Markus Elfring
	i40e_init_interrupt_scheme() Omit an extra message for a memory allocation failure in this function. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23	i40e: Disable iWARP VSI PETCP_ENA flag on netdev down events	Shiraz Saleem
	Client close is overloaded to handle both un-registration and netdev down event. On netdev down, i40iw client close is called which unregisters the RDMA dev and this is too destructive since the netdev is still registered. Do not call client close/open on netdev down/up events. Instead disable the PE TCP_ENA flag during a netdev down event. This blocks all TCP traffic to the RDMA Protocol Engine. On netdev up, re-enable the flag. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23	i40e: simplify pointer dereferences	Mitch Williams
	Now that i40e_vsi_config_tc() has the pf and hw variable defined, use them, instead of dereferencing vsi->back. Much easier to read. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23	i40e: check for invalid DCB config	Mitch Williams
	The driver (and the entire netdev layer for that matter) assumes that TC0 will always be present in our DCB configuration. Unfortunately, this isn't always the case. Rather than fail to configure the VSI, let's go ahead and try to make it work, even though DCB will end up being disabled by the kernel. If the driver fails to configure DCB, the driver queries what's valid, then writes that back to the hardware, always forcing TC0. This fixes a bug where the driver could fail to adhere to ETS BW allocations if 8 TCs were configured on the switch. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23	i40e/i40evf: Detect and recover hung queue scenario	Sudheer Mogilappagari
	In VFs, there is a known issue which can cause writebacks to not occur when interrupts are disabled and there are less than 4 descriptors resulting in TX timeout. Timeout can also occur due to lost interrupt. The current implementation for detecting and recovering from hung queues in the PF is problematic because it actually actively encourages lost interrupts. By triggering a SW interrupt, interrupts are forced on. If we are already in napi_poll and an interrupt fires, napi_poll will not be rescheduled and the interrupt is effectively lost; thereby potentially causing hung queues. This patch checks whether packets are being processed between every watchdog cycle and determine potential hung queue and fires triggers SW interrupt only for that particular queue. Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23	i40e: Fix for blinking activity instead of link LEDs	Michal Kuchta
	This fix solves an issue occurring while calling i40e_led_set function from the driver with "blink" parameter set as TRUE. This call resulted in Activity LED blinking instead of Link LED, which may lead to errors in physically identifying the port, since Activity LED may be blinking for different reasons as well. Signed-off-by: Michal Kuchta <michal.kuchta@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23	i40evf: Don't schedule reset_task when device is being removed	Avinash Dayanand
	When a host disables and enables a PF device, all the associated VFs are removed and added back in. It also generates a PFR which in turn resets all the connected VFs. This behaviour is different from that of Linux guest on Linux host. Hence we end up in a situation where there's a PFR and device removal at the same time. And watchdog doesn't have a clue about this and schedules a reset_task. This patch adds code to send signal to reset_task that the device is currently being removed. Signed-off-by: Avinash Dayanand <avinash.dayanand@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23	i40evf: remove flush_scheduled_work call in i40evf_remove	Sudheer Mogilappagari
	flush_schedule_work blocks until completion of all scheduled work items in global work-queue. This can cause deadlock in some cases. i40evf_remove() cleans up necessary work items with cancel_delayed_work_sync and cancel_work_sync. This fix removes flush_schedule_work call inside i40evf_remove(). Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23	i40e: avoid divide by zero	Mitch Williams
	In some weird circumstances with DCB enabled, the firmware can fail to configure the VSI, leaving us with zero traffic classes. Check for this state when we configure RSS to avoid a panic. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23	i40e/i40evf: Enable NVMUpdate to retrieve AdminQ and add preservation flags ↵	Pawel Jablonski
	for NVM update This patch adds new I40E_NVMUPD_GET_AQ_EVENT state to allow retrieval of AdminQ events as a result of AdminQ commands sent to firmware. Add preservation flags support on X722 devices for NVM update AdminQ function wrapper. Add new parameter and handling to nvmupdate admin queue function intended to allow nvmupdate tool to configure the preservation flags in the AdminQ command. This is required to implement FlatNVM on X722 devices. Signed-off-by: Pawel Jablonski <pawel.jablonski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller
	en_rx_am.c was deleted in 'net-next' but had a bug fixed in it in 'net'. The esp{4,6}_offload.c conflicts were overlapping changes. The 'out' label is removed so we just return ERR_PTR(-EINVAL) directly. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23	x86/ftrace: Fix ORC unwinding from ftrace handlers	Josh Poimboeuf
	Steven Rostedt discovered that the ftrace stack tracer is broken when it's used with the ORC unwinder. The problem is that objtool is instructed by the Makefile to ignore the ftrace_64.S code, so it doesn't generate any ORC data for it. Fix it by making the asm code objtool-friendly: - Objtool doesn't like the fact that save_mcount_regs pushes RBP at the beginning, but it's never restored (directly, at least). So just skip the original RBP push, which is only needed for frame pointers anyway. - Annotate some functions as normal callable functions with ENTRY/ENDPROC. - Add an empty unwind hint to return_to_handler(). The return address isn't on the stack, so there's nothing ORC can do there. It will just punt in the unlikely case it tries to unwind from that code. With all that fixed, remove the OBJECT_FILES_NON_STANDARD Makefile annotation so objtool can read the file. Link: http://lkml.kernel.org/r/20180123040746.ih4ep3tk4pbjvg7c@treble Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-23	bpf: test_maps: cleanup sockmaps when test ends	Prashant Bhole
	Bug: BPF programs and maps related to sockmaps test exist in memory even after test_maps ends. This patch fixes it as a short term workaround (sockmap kernel side needs real fixing) by empyting sockmaps when test ends. Fixes: 6f6d33f3b3d0f ("bpf: selftests add sockmap tests") Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> [ daniel: Note on workaround. ] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-23	ixgbe: register ipsec offload with the xfrm subsystem	Shannon Nelson
	With all the support code in place we can now link in the ipsec offload operations and set the ESP feature flag for the XFRM subsystem to see. Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23	ixgbe: ipsec offload stats	Shannon Nelson
	Add a simple statistic to count the ipsec offloads. Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23	ixgbe: process the Tx ipsec offload	Shannon Nelson
	If the skb has a security association referenced in the skb, then set up the Tx descriptor with the ipsec offload bits. While we're here, we fix an oddly named field in the context descriptor struct. Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23	ixgbe: process the Rx ipsec offload	Shannon Nelson
	If the chip sees and decrypts an ipsec offload, set up the skb sp pointer with the ralated SA info. Since the chip is rude enough to keep to itself the table index it used for the decryption, we have to do our own table lookup, using the hash for speed. Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23	selftests/bpf: fix test_dev_cgroup	Alexei Starovoitov
	The test incorrectly doing mkdir /mnt/cgroup-test-work-dirtest-bpf-based-device-cgroup instead of mkdir /mnt/cgroup-test-work-dir/test-bpf-based-device-cgroup somehow such mkdir succeeds and new directory appears: /mnt/cgroup-test-work-dir/cgroup-test-work-dirtest-bpf-based-device-cgroup Later cleanup via nftw("/mnt/cgroup-test-work-dir", ...); doesn't walk this directory. "rmdir /mnt/cgroup-test-work-dir" succeeds, but bpf program and dangling cgroup stays in memory. That's a separate issue on a cgroup side. For now fix the test. Fixes: 37f1ba0909df ("selftests/bpf: add a test for device cgroup controller") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-23	ixgbe: restore offloaded SAs after a reset	Shannon Nelson
	On a chip reset most of the table contents are lost, so must be restored. This scans the driver's ipsec tables and restores both the filled and empty table slots to their pre-reset values. Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23	selftests/bpf: speedup test_maps	Alexei Starovoitov
	test_hashmap_walk takes very long time on debug kernel with kasan on. Reduce the number of iterations in this test without sacrificing test coverage. Also add printfs as progress indicator. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-23	ixgbe: add ipsec offload add and remove SA	Shannon Nelson
	Add the functions for setting up and removing offloaded SAs (Security Associations) with the x540 hardware. We set up the callback structure but we don't yet set the hardware feature bit to be sure the XFRM service won't actually try to use us for an offload yet. The software tables are made up to mimic the hardware tables to make it easier to track what's in the hardware, and the SA table index is used for the XFRM offload handle. However, there is a hashing field in the Rx SA tracking that will be used to facilitate faster table searches in the Rx fast path. Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23	ixgbe: add ipsec data structures	Shannon Nelson
	Set up the data structures to be used by the ipsec offload. Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23	ixgbe: add ipsec engine start and stop routines	Shannon Nelson
	Add in the code for running and stopping the hardware ipsec encryption/decryption engine. It is good to keep the engine off when not in use in order to save on the power draw. Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23	ixgbe: add ipsec register access routines	Shannon Nelson
	Add a few routines to make access to the ipsec registers just a little easier, and throw in the beginnings of an initialization. Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	Linus Torvalds
	Pull networking fixes from David Miller: 1) Fix divide by zero in mlx5, from Talut Batheesh. 2) Guard against invalid GSO packets coming from untrusted guests and arriving in qdisc_pkt_len_init(), from Eric Dumazet. 3) Similarly add such protection to the various protocol GSO handlers. From Willem de Bruijn. 4) Fix regression added to IGMP source address checking for IGMPv3 reports, from Felix Feitkau. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: tls: Correct length of scatterlist in tls_sw_sendpage be2net: restore properly promisc mode after queues reconfiguration net: igmp: fix source address check for IGMPv3 reports gso: validate gso_type in GSO handlers net: qdisc_pkt_len_init() should be more robust ibmvnic: Allocate and request vpd in init_resources ibmvnic: Revert to previous mtu when unsupported value requested ibmvnic: Modify buffer size and number of queues on failover rds: tcp: compute m_ack_seq as offset from ->write_seq usbnet: silence an unnecessary warning cxgb4: fix endianness for vlan value in cxgb4_tc_flower cxgb4: set filter type to 1 for ETH_P_IPV6 net/mlx5e: Fix fixpoint divide exception in mlx5e_am_stats_compare
2018-01-23	ixgbe: clean up ipsec defines	Shannon Nelson
	Clean up the ipsec/macsec descriptor bit definitions to match the rest of the defines and file organization. Also recognise the bit-definition overlap in the error mask macro. Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23	tools/bpf: fix a test failure in selftests prog test_verifier	Yonghong Song
	Commit 111e6b45315c ("selftests/bpf: make test_verifier run most programs") enables tools/testing/selftests/bpf/test_verifier unit cases to run via bpf_prog_test_run command. With the latest code base, test_verifier had one test case failure: ... #473/p check deducing bounds from const, 2 FAIL retval 1 != 0 0: (b7) r0 = 1 1: (75) if r0 s>= 0x1 goto pc+1 R0=inv1 R1=ctx(id=0,off=0,imm=0) R10=fp0,call_-1 2: (95) exit from 1 to 3: R0=inv1 R1=ctx(id=0,off=0,imm=0) R10=fp0,call_-1 3: (d5) if r0 s<= 0x1 goto pc+1 R0=inv1 R1=ctx(id=0,off=0,imm=0) R10=fp0,call_-1 4: (95) exit from 3 to 5: R0=inv1 R1=ctx(id=0,off=0,imm=0) R10=fp0,call_-1 5: (1f) r1 -= r0 6: (95) exit processed 7 insns (limit 131072), stack depth 0 ... The test case does not set return value in the test structure and hence the return value from the prog run is assumed to be 0. However, the actual return value is 1. As a result, the test failed. The fix is to correctly set the return value in the test structure. Fixes: 111e6b45315c ("selftests/bpf: make test_verifier run most programs") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-23	bpf: fix incorrect kmalloc usage in lpm_trie MAP_GET_NEXT_KEY rcu region	Yonghong Song
	In commit b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map"), the implemented MAP_GET_NEXT_KEY callback function is guarded with rcu read lock. In the function body, "kmalloc(size, GFP_USER \| __GFP_NOWARN)" is used which may sleep and violate rcu read lock region requirements. This patch fixed the issue by using GFP_ATOMIC instead to avoid blocking kmalloc. Tested with CONFIG_DEBUG_ATOMIC_SLEEP=y as suggested by Eric Dumazet. Fixes: b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map") Signed-off-by: Yonghong Song <yhs@fb.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-23	sctp: reset ret in again path in sctp_for_each_transport	Xin Long
	Commit 97a6ec4ac021 ("rhashtable: Change rhashtable_walk_start to return void") only initialized ret for the first time, when going to again path, the next tsp could be NULL. Without resetting ret, cb_done would be called with tsp as NULL. A kernel crash was caused by this when running sctpdiag testcase in sctp-tests. Note that this issue doesn't affect net.git yet. Fixes: 97a6ec4ac021 ("rhashtable: Change rhashtable_walk_start to return void") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>