summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-05-16scsi: lpfc: Fix NVME I+T not registering NVME as a supported FC4 typeJames Smart
When the driver send the RPA command, it does not send supported FC4 Type NVME to the management server. Encode NVME (type x28) in the AttribEntry in the RPA command. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-05-16scsi: lpfc: Added recovery logic for running out of NVMET IO context resourcesJames Smart
Previous logic would just drop the IO. Added logic to queue the IO to wait for an IO context resource from an IO thats already in progress. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-05-16scsi: lpfc: Separate NVMET RQ buffer posting from IO resources SGL/iocbq/contextJames Smart
Currently IO resources are mapped 1 to 1 with RQ buffers posted Added logic to separate RQE buffers from IO op resources (sgl/iocbq/context). During initialization, the driver will determine how many SGLs it will allocate for NVMET (based on what the firmware reports) and associate a NVMET IOCBq and NVMET context structure with each one. Now that hdr/data buffers are immediately reposted back to the RQ, 512 RQEs for each MRQ is sufficient. Also, since NVMET data buffers are now 128 bytes, lpfc_nvmet_mrq_post is not necessary anymore as we will always post the max (512) buffers per NVMET MRQ. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-05-16scsi: lpfc: Separate NVMET data buffer pool fir ELS/CT.James Smart
Using 2048 byte buffer and onle 128 bytes is needed. Create nee LFPC_NVMET_DATA_BUF_SIZE define to use for NVMET RQ/MRQs. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-05-16scsi: lpfc: Fix NMI watchdog assertions when running nvmet IOPS testsJames Smart
After running IOPS test for 30 second we get kernel:NMI watchdog: Watchdog detected hard LOCKUP on cpu 0 The driver is speend too much time in its ISR. In ISR EQ and CQ processing routines, if we hit the entry_repost numbers of EQE/CQEs just break out of the routine as opposed to hitting the doorbell with NOARM and continue processing. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-05-16scsi: lpfc: Fix NVMEI driver not decrementing counter causing bad rport state.James Smart
During driver boot, a latency in the NVMET driver side causes the incoming NVMEI PRLI to get rejected by the NVMET driver. When this happens, the NVMEI driver runs out of PRLI retries. Bouncing the link does not fix the situation. If the NVMEI driver decides, on PRLI completion failures, to retry the PRLI, always decrement the fc4_prli_sent counter. This allows the PRLI completion to resolve to UNMAPPED when NVMET rejects the PRLI. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-05-16scsi: lpfc: Fix nvmet RQ resource needs for large block writes.James Smart
Large block writes to the nvme target were failing because the default number of RQs posted was insufficient. Expand the NVMET RQs to 2048 RQEs and ensure a minimum of 512 RQEs are posted, no matter how many MRQs are configured. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-05-16scsi: lpfc: Adding additional stats counters for nvme.James Smart
More debug messages added for nvme statistics. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-05-16scsi: lpfc: Fix system crash when port is reset.James Smart
The driver panic when using the els_wq during port reset. Check for NULL els_wq before dereferencing. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-05-16scsi: lpfc: Fix used-RPI accounting problem.James Smart
With 255 vports created a link trasition can casue a crash. When going through discovery after a link bounce the driver is using rpis before the cmd FCOE_POST_HDR_TEMPLATES completes. By doing that the next rpi bumps the rpi range out of the boundary. The fix it to increment the next_rpi only when the FCOE_POST_HDR_TEMPLATE succeeds. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-05-16scsi: libfc: fix incorrect variable assignmentGustavo A. R. Silva
Previous assignment was causing the use of the uninitialized variable _explan_ inside fc_seq_ls_rjt() function, which in this particular case is being called by fc_seq_els_rsp_send(). [mkp: fixed typo] Addresses-Coverity-ID: 1398125 Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-05-16scsi: sd: Ignore sync cache failures when not supportedDerek Basehore
Some external hard drives don't support the sync command even though the hard drive has write cache enabled. In this case, upon suspend request, sync cache failures are ignored if the error code in the sense header is ILLEGAL_REQUEST. There's not much we can do for these drives, so we shouldn't fail to suspend for this error case. The drive may stay powered if that's the setup for the port it's plugged into. Signed-off-by: Derek Basehore <dbasehore@chromium.org> Signed-off-by: Thierry Escande <thierry.escande@collabora.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-05-17drm/nouveau/fifo/gk104-: Silence a locking warningDan Carpenter
Presumably we can never actually hit this return, but static checkers complain that we should unlock before we return. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2017-05-17drm/nouveau/secboot: plug memory leak in ls_ucode_img_load_gr() error pathChristophe JAILLET
The last goto looks spurious because it releases less resources than the previous one. Also free 'img->sig' if 'ls_ucode_img_build()' fails. Fixes: 9d896f3e41a6 ("drm/nouveau/secboot: abstract LS firmware loading functions") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2017-05-17drm/nouveau: Fix drm poll_helper handlingPeter Ujfalusi
Commit cae9ff036eea effectively disabled the drm poll_helper by checking the wrong flag to see if the driver should enable the poll or not: mode_config.poll_enabled is only set to true by poll_init and it is not indicating if the poll is enabled or not. nouveau_display_create() will initialize the poll and going to disable it right away. After poll_init() the mode_config.poll_enabled will be true, but the poll itself is disabled. To avoid the race caused by calling the poll_enable() from different paths, this patch will enable the poll from one place, in the nouveau_display_hpd_work(). In case the pm_runtime is disabled we will enable the poll in nouveau_drm_load() once. Fixes: cae9ff036eea ("drm/nouveau: Don't enabling polling twice on runtime resume") Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Reviewed-by: Lyude <lyude@redhat.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2017-05-16i2c: mv64xxx: don't override deferred probing when getting irqThomas Petazzoni
There is no reason to use platform_get_irq() for non-DT probing and irq_of_parse_and_map() for DT probing. Indeed, platform_get_irq() works fine for both. In addition, using platform_get_irq() properly returns -EPROBE_DEFER when the interrupt controller is not yet available, so instead of inventing our own error code (-ENXIO), return the one provided by platform_get_irq(). Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2017-05-16uio: fix incorrect memory leak cleanupSuman Anna
Commit 75f0aef6220d ("uio: fix memory leak") has fixed up some memory leaks during the failure paths of the addition of uio attributes, but still is not correct entirely. A kobject_uevent() failure still needs a kobject_put() and the kobject container structure allocation failure before the kobject_init() doesn't need a kobject_put(). Fix this properly. Fixes: 75f0aef6220d ("uio: fix memory leak") Signed-off-by: Suman Anna <s-anna@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-16misc: pci_endpoint_test: select CRC32Tobias Regnery
There is the following link error with CONFIG_PCI_ENDPOINT_TEST=y and CONFIG_CRC32=m: drivers/built-in.o: In function 'pci_endpoint_test_ioctl': pci_endpoint_test.c:(.text+0xf1251): undefined reference to 'crc32_le' pci_endpoint_test.c:(.text+0xf1322): undefined reference to 'crc32_le' pci_endpoint_test.c:(.text+0xf13b2): undefined reference to 'crc32_le' pci_endpoint_test.c:(.text+0xf141e): undefined reference to 'crc32_le' Fix this by selecting CRC32 in the PCI_ENDPOINT_TEST kconfig entry. Fixes: 2c156ac71c6b ("misc: Add host side PCI driver for PCI test function device") Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-16char: lp: fix possible integer overflow in lp_setup()Willy Tarreau
The lp_setup() code doesn't apply any bounds checking when passing "lp=none", and only in this case, resulting in an overflow of the parport_nr[] array. All versions in Git history are affected. Reported-By: Roee Hay <roee.hay@hcl.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: stable@vger.kernel.org Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-16Merge tag 'pstore-v4.12-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore fix from Kees Cook: "Fix bad EFI vars iterator usage" * tag 'pstore-v4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: efi-pstore: Fix read iter after pstore API refactor
2017-05-16liquidio: use pcie_flr instead of duplicating itChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16net: phy: Remove residual magic from PHY driversAndrew Lunn
commit fa8cddaf903c ("net phylib: Remove unnecessary condition check in phy") removed the only place where the PHY flag PHY_HAS_MAGICANEG was checked. But it left the flag being set in the drivers. Remove the flag. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16bnx2x: Remove open coded carrier checkLeon Romanovsky
There is inline function to test if carrier present, so it makes open-coded solution redundant. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16mlx5e: add CONFIG_INET dependencyArnd Bergmann
We now reference the arp_tbl, which requires IPv4 support to be enabled in the kernel, otherwise we get a link error: drivers/net/built-in.o: In function `mlx5e_tc_update_neigh_used_value': (.text+0x16afec): undefined reference to `arp_tbl' drivers/net/built-in.o: In function `mlx5e_rep_neigh_init': en_rep.c:(.text+0x16c16d): undefined reference to `arp_tbl' drivers/net/built-in.o: In function `mlx5e_rep_netevent_event': en_rep.c:(.text+0x16cbb5): undefined reference to `arp_tbl' This adds a Kconfig dependency for it. Fixes: 232c001398ae ("net/mlx5e: Add support to neighbour update flow") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16tcp: internal implementation for pacingEric Dumazet
BBR congestion control depends on pacing, and pacing is currently handled by sch_fq packet scheduler for performance reasons, and also because implemening pacing with FQ was convenient to truly avoid bursts. However there are many cases where this packet scheduler constraint is not practical. - Many linux hosts are not focusing on handling thousands of TCP flows in the most efficient way. - Some routers use fq_codel or other AQM, but still would like to use BBR for the few TCP flows they initiate/terminate. This patch implements an automatic fallback to internal pacing. Pacing is requested either by BBR or use of SO_MAX_PACING_RATE option. If sch_fq happens to be in the egress path, pacing is delegated to the qdisc, otherwise pacing is done by TCP itself. One advantage of pacing from TCP stack is to get more precise rtt estimations, and less work done from TX completion, since TCP Small queue limits are not generally hit. Setups with single TX queue but many cpus might even benefit from this. Note that unlike sch_fq, we do not take into account header sizes. Taking care of these headers would add additional complexity for no practical differences in behavior. Some performance numbers using 800 TCP_STREAM flows rate limited to ~48 Mbit per second on 40Gbit NIC. If MQ+pfifo_fast is used on the NIC : $ sar -n DEV 1 5 | grep eth 14:48:44 eth0 725743.00 2932134.00 46776.76 4335184.68 0.00 0.00 1.00 14:48:45 eth0 725349.00 2932112.00 46751.86 4335158.90 0.00 0.00 0.00 14:48:46 eth0 725101.00 2931153.00 46735.07 4333748.63 0.00 0.00 0.00 14:48:47 eth0 725099.00 2931161.00 46735.11 4333760.44 0.00 0.00 1.00 14:48:48 eth0 725160.00 2931731.00 46738.88 4334606.07 0.00 0.00 0.00 Average: eth0 725290.40 2931658.20 46747.54 4334491.74 0.00 0.00 0.40 $ vmstat 1 5 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 4 0 0 259825920 45644 2708324 0 0 21 2 247 98 0 0 100 0 0 4 0 0 259823744 45644 2708356 0 0 0 0 2400825 159843 0 19 81 0 0 0 0 0 259824208 45644 2708072 0 0 0 0 2407351 159929 0 19 81 0 0 1 0 0 259824592 45644 2708128 0 0 0 0 2405183 160386 0 19 80 0 0 1 0 0 259824272 45644 2707868 0 0 0 32 2396361 158037 0 19 81 0 0 Now use MQ+FQ : lpaa23:~# echo fq >/proc/sys/net/core/default_qdisc lpaa23:~# tc qdisc replace dev eth0 root mq $ sar -n DEV 1 5 | grep eth 14:49:57 eth0 678614.00 2727930.00 43739.13 4033279.14 0.00 0.00 0.00 14:49:58 eth0 677620.00 2723971.00 43674.69 4027429.62 0.00 0.00 1.00 14:49:59 eth0 676396.00 2719050.00 43596.83 4020125.02 0.00 0.00 0.00 14:50:00 eth0 675197.00 2714173.00 43518.62 4012938.90 0.00 0.00 1.00 14:50:01 eth0 676388.00 2719063.00 43595.47 4020171.64 0.00 0.00 0.00 Average: eth0 676843.00 2720837.40 43624.95 4022788.86 0.00 0.00 0.40 $ vmstat 1 5 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 2 0 0 259832240 46008 2710912 0 0 21 2 223 192 0 1 99 0 0 1 0 0 259832896 46008 2710744 0 0 0 0 1702206 198078 0 17 82 0 0 0 0 0 259830272 46008 2710596 0 0 0 0 1696340 197756 1 17 83 0 0 4 0 0 259829168 46024 2710584 0 0 16 0 1688472 197158 1 17 82 0 0 3 0 0 259830224 46024 2710408 0 0 0 0 1692450 197212 0 18 82 0 0 As expected, number of interrupts per second is very different. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Van Jacobson <vanj@google.com> Cc: Jerry Chu <hkchu@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16Merge branch 'udp-scalability-improvements'David S. Miller
Paolo Abeni says: ==================== udp: scalability improvements This patch series implement an idea suggested by Eric Dumazet to reduce the contention of the udp sk_receive_queue lock when the socket is under flood. An ancillary queue is added to the udp socket, and the socket always tries first to read packets from such queue. If it's empty, we splice the content from sk_receive_queue into the ancillary queue. The first patch introduces some helpers to keep the udp code small, and the following two implement the ancillary queue strategy. The code is split to hopefully help the reviewing process. The measured overall gain under udp flood is up to the 30% depending on the numa layout and the number of ingress queue used by the relevant nic. The performance numbers have been gathered using pktgen as sender, with 64 bytes packets, random src port on a host b2b connected via a 10Gbs link with the dut. The receiver used the udp_sink program by Jesper [1] and an h/w l4 rx hash on the ingress nic, so that the number of ingress nic rx queues hit by the udp traffic could be controlled via ethtool -L. The udp_sink program was bound to the first idle cpu, to get more stable numbers. On a single numa node receiver: nic rx queues vanilla patched kernel 1 1820 kpps 1900 kpps 2 1950 kpps 2500 kpps 16 1670 kpps 2120 kpps When using a single nic rx queue, busy polling was also enabled, elsewhere, in the above scenario, the bh processing becomes the bottle-neck and this produces large artifacts in the measured performances (e.g. improving the udp sink run time, decreases the overall tput, since more action from the scheduler comes into play). [1] https://github.com/netoptimizer/network-testing/blob/master/src/udp_sink.c v1 -> v2: Patches 1/3 and 2/3 are unchanged, in patch 3/3 the rx_queue_lock_held param of udp_rmem_release() is now a bool. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16udp: keep the sk_receive_queue held when splicingPaolo Abeni
On packet reception, when we are forced to splice the sk_receive_queue, we can keep the related lock held, so that we can avoid re-acquiring it, if fwd memory scheduling is required. v1 -> v2: the rx_queue_lock_held param in udp_rmem_release() is now a bool Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16udp: use a separate rx queue for packet receptionPaolo Abeni
under udp flood the sk_receive_queue spinlock is heavily contended. This patch try to reduce the contention on such lock adding a second receive queue to the udp sockets; recvmsg() looks first in such queue and, only if empty, tries to fetch the data from sk_receive_queue. The latter is spliced into the newly added queue every time the receive path has to acquire the sk_receive_queue lock. The accounting of forward allocated memory is still protected with the sk_receive_queue lock, so udp_rmem_release() needs to acquire both locks when the forward deficit is flushed. On specific scenarios we can end up acquiring and releasing the sk_receive_queue lock multiple times; that will be covered by the next patch Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16net/sock: factor out dequeue/peek with offset codePaolo Abeni
And update __sk_queue_drop_skb() to work on the specified queue. This will help the udp protocol to use an additional private rx queue in a later patch. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16xfs: only return detailed fsmap info if the caller has CAP_SYS_ADMINDarrick J. Wong
There were a number of handwaving complaints that one could "possibly" use inode numbers and extent maps to fingerprint a filesystem hosting multiple containers and somehow use the information to guess at the contents of other containers and attack them. Despite the total lack of any demonstration that this is actually possible, it's easier to restrict access now and broaden it later, so use the rmapbt fsmap backends only if the caller has CAP_SYS_ADMIN. Unprivileged users will just have to make do with only getting the free space and static metadata placement information. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
2017-05-16KVM: x86: lower default for halt_poll_nsPaolo Bonzini
In some fio benchmarks, halt_poll_ns=400000 caused CPU utilization to increase heavily even in cases where the performance improvement was small. In particular, bandwidth divided by CPU usage was as much as 60% lower. To some extent this is the expected effect of the patch, and the additional CPU utilization is only visible when running the benchmarks. However, halving the threshold also halves the extra CPU utilization (from +30-130% to +20-70%) and has no negative effect on performance. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-05-16dm bufio: make the parameter "retain_bytes" unsigned longMikulas Patocka
Change the type of the parameter "retain_bytes" from unsigned to unsigned long, so that on 64-bit machines the user can set more than 4GiB of data to be retained. Also, change the type of the variable "count" in the function "__evict_old_buffers" to unsigned long. The assignment "count = c->n_buffers[LIST_CLEAN] + c->n_buffers[LIST_DIRTY];" could result in unsigned long to unsigned overflow and that could result in buffers not being freed when they should. While at it, avoid division in get_retain_buffers(). Division is slow, we can change it to shift because we have precalculated the log2 of block size. Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-05-16net: Improve handling of failures on link and route dumpsDavid Ahern
In general, rtnetlink dumps do not anticipate failure to dump a single object (e.g., link or route) on a single pass. As both route and link objects have grown via more attributes, that is no longer a given. netlink dumps can handle a failure if the dump function returns an error; specifically, netlink_dump adds the return code to the response if it is <= 0 so userspace is notified of the failure. The missing piece is the rtnetlink dump functions returning the error. Fix route and link dump functions to return the errors if no object is added to an skb (detected by skb->len != 0). IPv6 route dumps (rt6_dump_route) already return the error; this patch updates IPv4 and link dumps. Other dump functions may need to be ajusted as well. Reported-by: Jan Moskyto Matejka <mq@ucw.cz> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16net/smc: Add warning about remote memory exposureChristoph Hellwig
The driver explicitly bypasses APIs to register all memory once a connection is made, and thus allows remote access to memory. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Leon Romanovsky <leon@kernel.org> Acked-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16smc: switch to usage of IB_PD_UNSAFE_GLOBAL_RKEYUrsula Braun
Currently, SMC enables remote access to physical memory when a user has successfully configured and established an SMC-connection until ten minutes after the last SMC connection is closed. Because this is considered a security risk, drivers are supposed to use IB_PD_UNSAFE_GLOBAL_RKEY in such a case. This patch changes the current SMC code to use IB_PD_UNSAFE_GLOBAL_RKEY. This improves user awareness, but does not remove the security risk itself. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16efi-pstore: Fix read iter after pstore API refactorKees Cook
During the internal pstore API refactoring, the EFI vars read entry was accidentally made to update a stack variable instead of the pstore private data pointer. This corrects the problem (and removes the now needless argument). Fixes: 125cc42baf8a ("pstore: Replace arguments for read() API") Signed-off-by: Kees Cook <keescook@chromium.org>
2017-05-16Merge branch 'nfp-LSO-checksum-and-XDP-datapath-updates'David S. Miller
Jakub Kicinski says: ==================== nfp: LSO, checksum and XDP datapath updates This series introduces a number of refinements to standard features like LSO and checksum offload. Three major features are support for CHECKSUM_COMPLETE, refinement of TSO handling and another small speed up for XDP TX. This series also switches from depending on some app FW<>driver ABI versions to heavier use of capabilities. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16nfp: eliminate an if statement in calculation of completed framesJakub Kicinski
Given that our rings are always a power of 2, we can simplify the calculation of number of completed TX descriptors by using masking instead of if statement based on whether the index have wrapped or not. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16nfp: add a helper for wrapping descriptor indexJakub Kicinski
We have a number of places where we calculate the descriptor index based on a value which may have overflown. Create a macro for masking with the ring size. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16nfp: complete the XDP TX ring only when it's fullJakub Kicinski
Since XDP TX ring holds "spare" RX buffers anyway, we don't have to rush the completion. We can wait until ring fills up completely before trying to reclaim buffers. If RX poll has ended an no buffer has been queued for XDP TX we have no guarantee we will see another interrupt, so run the reclaim there as well, to make sure TX statistics won't become stale. This should help us reclaim more buffers per single queue controller register read. Note that the XDP completion is very trivial, it only adds up the sizes of transmitted frames for statistics so the latency spike should be acceptable. In case user sets the ring sizes to something crazy, limit the completion to 2k entries. The check if the ring is empty at the beginning of xdp_complete() is no longer needed - the callers will perform it. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16nfp: add CHECKSUM_COMPLETE supportJakub Kicinski
Introduce NFP_NET_CFG_CTRL_CSUM_COMPLETE capability and implement parsing of CHECKSUM_COMPLETE metadata. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Edwin Peer <edwin.peer@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16nfp: version independent support for chained RSS metadataEdwin Peer
ABI version 4 introduced metadata chaining. Using the ABI version to signal metadata chaining precludes firmware that advertises new capabilities which rely on prepended metadata from working on older kernels. Capability bits are thus better suited to signalling the chained metadata format. A new version of the RSS capability is introduced to distinguish between the differing metadata formats for ABI versions other than 4. Signed-off-by: Edwin Peer <edwin.peer@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16nfp: don't assume RSS and IRQ moderation are always enabledJakub Kicinski
Even if capability for RSS and IRQ moderation are present we may have not initialized them for control vNIC. Depend on selected features mask (ctrl) rather than capabilities (cap) to determine which features should be enabled. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16nfp: support LSO2 capabilityEdwin Peer
Firmware advertising the LSO2 capability exploits driver provided L3 and L4 offsets in order to avoid parsing packet headers in the TX path. The vlan field in struct nfp_net_tx_desc is repurposed, making TXVLAN a mutually exclusive configuration to LSO2. Signed-off-by: Edwin Peer <edwin.peer@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16nfp: rename l4_offset in struct nfp_net_tx_desc to lso_hdrlenEdwin Peer
The l4_offset field referred to by NFD is confusingly named. It is not the offset of the L4 transport header, but rather the L4 payload. The LSO2 capability supported by alternative device firmware requires the actual L4 offset, thus the rename seems prudent. Signed-off-by: Edwin Peer <edwin.peer@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16nfp: don't enable TSO on the device when disabledJakub Kicinski
We advertise TSO to the stack but leave it disabled by default. Make sure it's not only disabled in the netdev features but also on the device itself. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16Merge branch 'i2c-mux/for-current' of https://github.com/peda-r/i2c-mux into ↵Wolfram Sang
i2c/for-current Pull bugfixes from the i2c mux subsubsystem: This fixes an old bug in resource cleanup on failure in i2c-mux-reg and a new log spamming bug from this merge window in the i2c-mux core.
2017-05-16ipmr: vrf: Find VIFs using the actual deviceThomas Winter
The skb->dev that is passed into ip_mr_input is the loX device for VRFs. When we lookup a vif for this dev, none is found as we do not create vifs for loopbacks. Instead lookup a vif for the actual device that the packet was received on, eg the vlan. Signed-off-by: Thomas Winter <Thomas.Winter@alliedtelesis.co.nz> cc: David Ahern <dsa@cumulusnetworks.com> cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> cc: roopa <roopa@cumulusnetworks.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16tcp: eliminate negative reordering in tcp_clean_rtx_queueSoheil Hassas Yeganeh
tcp_ack() can call tcp_fragment() which may dededuct the value tp->fackets_out when MSS changes. When prior_fackets is larger than tp->fackets_out, tcp_clean_rtx_queue() can invoke tcp_update_reordering() with negative values. This results in absurd tp->reodering values higher than sysctl_tcp_max_reordering. Note that tcp_update_reordering indeeds sets tp->reordering to min(sysctl_tcp_max_reordering, metric), but because the comparison is signed, a negative metric always wins. Fixes: c7caf8d3ed7a ("[TCP]: Fix reord detection due to snd_una covered holes") Reported-by: Rebecca Isaacs <risaacs@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Martin Schwidefsky: - convert the debug feature to refcount_t - reduce the copy size for strncpy_from_user - 8 bug fixes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/virtio: change virtio_feature_desc:features type to __le32 s390: convert debug_info.ref_count from atomic_t to refcount_t s390: move _text symbol to address higher than zero s390/qdio: increase string buffer size s390/ccwgroup: increase string buffer size s390/topology: let topology_mnest_limit() return unsigned char s390/uaccess: use sane length for __strncpy_from_user() s390/uprobes: fix compile for !KPROBES s390/ftrace: fix compile for !MODULES s390/cputime: fix incorrect system time