summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-03-25gtp: support SGSN-side tunnelsJonas Bonn
The GTP-tunnel driver is explicitly GGSN-side as it searches for PDP contexts based on the incoming packets _destination_ address. If we want to place ourselves on the SGSN side of the tunnel, then we want to be identifying PDP contexts based on _source_ address. Let it be noted that in a "real" configuration this module would never be used: the SGSN normally does not see IP packets as input. The justification for this functionality is for PGW load-testing applications where the input to the SGSN is locally generally IP traffic. This patch adds a "role" argument at GTP-link creation time to specify whether we are on the GGSN or SGSN side of the tunnel; this flag is then used to determine which part of the IP packet to use in determining the PDP context. Signed-off-by: Jonas Bonn <jonas@southpole.se> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Harald Welte <laforge@gnumonks.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-25gtp: rename SGSN netlink attributeJonas Bonn
This is a mostly cosmetic rename of the SGSN netlink attribute to the GTP link. The justification for this is that we will be making the module support decapsulation of "downstream" SGSN packets, in which case the netlink parameter actually refers to the upstream GGSN peer. Renaming the parameter makes the relationship clearer. The legacy name is maintained as a define in the header file in order to not break existing code. Signed-off-by: Jonas Bonn <jonas@southpole.se> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Harald Welte <laforge@gnumonks.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-25net: hns: avoid gcc-7.0.1 warning for uninitialized dataArnd Bergmann
hns_dsaf_set_mac_key() calls dsaf_set_field() on an uninitialized field, which will then change only a few of its bits, causing a warning with the latest gcc: hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_set_mac_uc_entry': hisilicon/hns/hns_dsaf_reg.h:1046:12: error: 'mac_key.low.bits.port_vlan' may be used uninitialized in this function [-Werror=maybe-uninitialized] (origin) &= (~(mask)); \ ^~ hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_set_mac_mc_entry': hisilicon/hns/hns_dsaf_reg.h:1046:12: error: 'mac_key.low.bits.port_vlan' may be used uninitialized in this function [-Werror=maybe-uninitialized] hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_add_mac_mc_port': hisilicon/hns/hns_dsaf_reg.h:1046:12: error: 'mac_key.low.bits.port_vlan' may be used uninitialized in this function [-Werror=maybe-uninitialized] hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_del_mac_entry': hisilicon/hns/hns_dsaf_reg.h:1046:12: error: 'mac_key.low.bits.port_vlan' may be used uninitialized in this function [-Werror=maybe-uninitialized] hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_rm_mac_addr': hisilicon/hns/hns_dsaf_reg.h:1046:12: error: 'mac_key.low.bits.port_vlan' may be used uninitialized in this function [-Werror=maybe-uninitialized] hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_del_mac_mc_port': hisilicon/hns/hns_dsaf_reg.h:1046:12: error: 'mac_key.low.bits.port_vlan' may be used uninitialized in this function [-Werror=maybe-uninitialized] hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_get_mac_uc_entry': hisilicon/hns/hns_dsaf_reg.h:1046:12: error: 'mac_key.low.bits.port_vlan' may be used uninitialized in this function [-Werror=maybe-uninitialized] hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_get_mac_mc_entry': hisilicon/hns/hns_dsaf_reg.h:1046:12: error: 'mac_key.low.bits.port_vlan' may be used uninitialized in this function [-Werror=maybe-uninitialized] The code is actually correct since we always set all 16 bits of the port_vlan field, but gcc correctly points out that the first access does contain uninitialized data. This initializes the field to zero first before setting the individual bits. Fixes: 5483bfcb169c ("net: hns: modify tcam table and set mac key") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-25net: hns: fix uninitialized data useArnd Bergmann
When dev_dbg() is enabled, we print uninitialized data, as gcc-7.0.1 now points out: ethernet/hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_set_promisc_tcam': ethernet/hisilicon/hns/hns_dsaf_main.c:2947:75: error: 'tbl_tcam_data.low.val' may be used uninitialized in this function [-Werror=maybe-uninitialized] ethernet/hisilicon/hns/hns_dsaf_main.c:2947:75: error: 'tbl_tcam_data.high.val' may be used uninitialized in this function [-Werror=maybe-uninitialized] We also pass the data into hns_dsaf_tcam_mc_cfg(), which might later use it (not sure about that), so it seems safer to just always initialize the tbl_tcam_data structure. Fixes: 1f5fa2dd1cfa ("net: hns: fix for promisc mode in HNS driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-25Merge branch 'qmap-mux'David S. Miller
Daniele Palmas says: ==================== net: usb: qmi_wwan: add qmap mux protocol support This patch adds support for qmap mux protocol available in recent Qualcomm based modems. The qmap mux protocol can be used for multiplexing data packets in order to have multiple ip streams through the same physical device. Two new sysfs files are added for adding/removing the qmap mux based interfaces (named qmimux): /sys/class/net/<iface>/qmi/add_mux /sys/class/net/<iface>/qmi/del_mux Main patch author is Bjørn Mork <bjorn@mork.no> An userspace implementation of the qmi requests needed to support multiple ip streams is already available (namely libqmi since version 1.18.0). The qmap mux feature has been recently implemented in Codeaurora gobinet out-of-kernel driver that was the inspiration for this development. Tests have been performed with Telit LE922A6 (PID 0x1040) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-25Documentation: ABI: testing: sysfs-class-net-qmi: add new qmap mux files ↵Daniele Palmas
description This patch updates the documentation related to the new files added for qmap mux support. Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-25net: usb: qmi_wwan: add qmap mux protocol supportDaniele Palmas
This patch adds support for qmap mux protocol available in recent Qualcomm based modems. The qmap mux protocol can be used for multiplexing data packets in order to have multiple ip streams through the same physical device. Two new sysfs files are added for adding/removing the qmap mux based interfaces (named qmimux): - /sys/class/net/<iface>/qmi/add_mux - /sys/class/net/<iface>/qmi/del_mux Main patch author is Bjørn Mork <bjorn@mork.no> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-25mlxsw: spectrum_kvdl: Cosmetic kvdl allocator API changeArkadi Sharshevsky
Currently the return allocated index and err value are multiplexed. This patch changes the API to decouple the ret value from the allocated index. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-25Merge tag 'fscrypt-for-linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt Pull fscrypto fixes from Ted Ts'o: "A code cleanup and bugfix for fs/crypto" * tag 'fscrypt-for-linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt: fscrypt: eliminate ->prepare_context() operation fscrypt: remove broken support for detecting keyring key revocation
2017-03-25Merge tag 'hwmon-for-linus-v4.11-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: - bug fixes in asus_atk0110, it87 and max31790 drivers - added missing API definition to hwmon core * tag 'hwmon-for-linus-v4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (asus_atk0110) fix uninitialized data access hwmon: Add missing HWMON_T_ALARM hwmon: (it87) Avoid registering the same chip on both SIO addresses hwmon: (max31790) Set correct PWM value
2017-03-25Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma fixes from Doug Ledford: "This has been a slow -rc cycle for the RDMA subsystem. We really haven't had a lot of rc fixes come in. This pull request is the first of this entire rc cycle and it has all of the suitable fixes so far and it's still only about 20 patches. The fix for the minor breakage cause by the dma mapping patchset is in here, as well as a couple other potential oops fixes, but the rest is more minor. Summary: - fix for dma_ops change in this kernel, resolving the s390, powerpc, and IOMMU operation - a few other oops fixes - the rest are all minor fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: IB/qib: fix false-postive maybe-uninitialized warning RDMA/iser: Fix possible mr leak on device removal event IB/device: Convert ib-comp-wq to be CPU-bound IB/cq: Don't process more than the given budget IB/rxe: increment msn only when completing a request uapi: fix rdma/mlx5-abi.h userspace compilation errors IB/core: Restore I/O MMU, s390 and powerpc support IB/rxe: Update documentation link RDMA/ocrdma: fix a type issue in ocrdma_put_pd_num() IB/rxe: double free on error RDMA/vmw_pvrdma: Activate device on ethernet link up RDMA/vmw_pvrdma: Dont hardcode QP header page RDMA/vmw_pvrdma: Cleanup unused variables infiniband: Fix alignment of mmap cookies to support VIPT caching IB/core: Protect against self-requeue of a cq work item i40iw: Receive netdev events post INET_NOTIFIER state
2017-03-25Merge branch 'stable-4.11' of git://git.infradead.org/users/pcmoore/auditLinus Torvalds
Pull audit fix from Paul Moore: "We've got an audit fix, and unfortunately it is big. While I'm not excited that we need to be sending you something this large during the -rcX phase, it does fix some very real, and very tangled, problems relating to locking, backlog queues, and the audit daemon connection. This code has passed our testsuite without problem and it has held up to my ad-hoc stress tests (arguably better than the existing code), please consider pulling this as fix for the next v4.11-rcX tag" * 'stable-4.11' of git://git.infradead.org/users/pcmoore/audit: audit: fix auditd/kernel connection state tracking
2017-03-25ext4: fix two spelling nitsTheodore Ts'o
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-03-25ext4: lock the xattr block before checksuming itTheodore Ts'o
We must lock the xattr block before calculating or verifying the checksum in order to avoid spurious checksum failures. https://bugzilla.kernel.org/show_bug.cgi?id=193661 Reported-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2017-03-25Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "A handful of Sunxi and Rockchip clk driver fixes and a core framework one where we need to copy a string because we can't guarantee it isn't freed sometime later" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: sunxi-ng: fix recalc_rate formula of NKMP clocks clk: sunxi-ng: Fix div/mult settings for osc12M on A64 clk: rockchip: Make uartpll a child of the gpll on rk3036 clk: rockchip: add "," to mux_pll_src_apll_dpll_gpll_usb480m_p on rk3036 clk: core: Copy connection id dt-bindings: arm: update Armada CP110 system controller binding clk: sunxi-ng: sun6i: Fix enable bit offset for hdmi-ddc module clock clk: sunxi: ccu-sun5i needs nkmp clk: sunxi-ng: mp: Adjust parent rate for pre-dividers
2017-03-25iio: bmg160: reset chip when probingQuentin Schulz
The gyroscope chip might need to be reset to be used. Without the chip being reset, the driver stopped at the first regmap_read (to get the CHIP_ID) and failed to probe. The datasheet of the gyroscope says that a minimum wait of 30ms after the reset has to be done. This patch has been checked on a BMX055 and the datasheet of the BMG160 and the BMI055 give the same reset register and bits. Signed-off-by: Quentin Schulz <quentin.schulz@free-electrons.com> Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <jic23@kernel.org>
2017-03-25iio: cros_ec_sensors: Fix return value to get raw and calibbias data.Enric Balletbo i Serra
The cros_ec_sensors_read function must return the type of value on all cases. This was always true except for RAW and CALIBBIAS data which returned an error or 0. This patch just fixes the mistake I introduced when submitting the series. Fixes: commit c14dca07a31d (iio: cros_ec_sensors: add ChromeOS EC Contiguous Sensors driver) Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org>
2017-03-24uapi: add missing install of userio.hNaohiro Aota
While commit 5523662edd4f ("Input: add userio module") added userio.h under the uapi/ directory, it forgot to add the header file to Kbuild. Thus, the file was missing from header installation. Signed-off-by: Naohiro Aota <naota@elisp.net> Reviewed-by: Lyude Paul <thatslyude@gmail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2017-03-24bpf: improve verifier packet range checksAlexei Starovoitov
llvm can optimize the 'if (ptr > data_end)' checks to be in the order slightly different than the original C code which will confuse verifier. Like: if (ptr + 16 > data_end) return TC_ACT_SHOT; // may be followed by if (ptr + 14 > data_end) return TC_ACT_SHOT; while llvm can see that 'ptr' is valid for all 16 bytes, the verifier could not. Fix verifier logic to account for such case and add a test. Reported-by: Huapeng Zhou <hzhou@fb.com> Fixes: 969bf05eb3ce ("bpf: direct packet access") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24ping: implement proper lockingEric Dumazet
We got a report of yet another bug in ping http://www.openwall.com/lists/oss-security/2017/03/24/6 ->disconnect() is not called with socket lock held. Fix this by acquiring ping rwlock earlier. Thanks to Daniel, Alexander and Andrey for letting us know this problem. Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Daniel Jiang <danieljiang0415@gmail.com> Reported-by: Solar Designer <solar@openwall.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24Merge branch 'epoll-busypoll'David S. Miller
Alexander Duyck says: ==================== Add busy poll support for epoll This patch set adds support for using busy polling with epoll. The main idea behind this is that we record the NAPI ID for the last event that is moved onto the ready list for the epoll context and then when we no longer have any events on the ready list we begin polling with that ID. If the busy polling does not yield any events then we will reset the NAPI ID to 0 and wait until a new event is added to the ready list with a valid NAPI ID before we will resume busy polling. Most of the changes in this set authored by me are meant to be cleanup or fixes for various things. For example, I am trying to make it so that we don't perform hash look-ups for the NAPI instance when we are only working with sender_cpu and the like. At the heart of this set is the last 3 patches which enable epoll support and add support for obtaining the NAPI ID of a given socket. With these it becomes possible for an application to make use of epoll and get optimal busy poll utilization by stacking multiple sockets with the same NAPI ID on the same epoll context. v1: The first version of this series only allowed epoll to busy poll if all of the sockets with a NAPI ID shared the same NAPI ID. I feel we were too strict with this requirement, so I changed the behavior for v2. v2: The second version was pretty much a full rewrite of the first set. The main changes consisted of pulling apart several patches to better address the need to clean up a few items and to make the code easier to review. In the set however I went a bit overboard and was trying to fix an issue that would only occur with 500+ years of uptime, and in the process limited the range for busy_poll/busy_read unnecessarily. v3: Split off the code for limiting busy_poll and busy_read into a separate patch for net. Updated patch that changed busy loop time tracking so that it uses "local_clock() >> 10" as we originally did. Tweaked "Change return type.." patch by moving declaration of "work" inside the loop where is was accessed and always reset to 0. Added "Acked-by" for patches that received acks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24net: Introduce SO_INCOMING_NAPI_IDSridhar Samudrala
This socket option returns the NAPI ID associated with the queue on which the last frame is received. This information can be used by the apps to split the incoming flows among the threads based on the Rx queue on which they are received. If the NAPI ID actually represents a sender_cpu then the value is ignored and 0 is returned. Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24epoll: Add busy poll support to epoll with socket fds.Sridhar Samudrala
This patch adds busy poll support to epoll. The implementation is meant to be opportunistic in that it will take the NAPI ID from the last socket that is added to the ready list that contains a valid NAPI ID and it will use that for busy polling until the ready list goes empty. Once the ready list goes empty the NAPI ID is reset and busy polling is disabled until a new socket is added to the ready list. In addition when we insert a new socket into the epoll we record the NAPI ID and assume we are going to receive events on it. If that doesn't occur it will be evicted as the active NAPI ID and we will resume normal behavior. An application can use SO_INCOMING_CPU or SO_REUSEPORT_ATTACH_C/EBPF socket options to spread the incoming connections to specific worker threads based on the incoming queue. This enables epoll for each worker thread to have only sockets that receive packets from a single queue. So when an application calls epoll_wait() and there are no events available to report, busy polling is done on the associated queue to pull the packets. Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24net: Commonize busy polling code to focus on napi_id instead of socketSridhar Samudrala
Move the core functionality in sk_busy_loop() to napi_busy_loop() and make it independent of sk. This enables re-using this function in epoll busy loop implementation. Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24net: Track start of busy loop instead of when it should endAlexander Duyck
This patch flips the logic we were using to determine if the busy polling has timed out. The main motivation for this is that we will need to support two different possible timeout values in the future and by recording the start time rather than when we would want to end we can focus on making the end_time specific to the task be it epoll or socket based polling. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24net: Change return type of sk_busy_loop from bool to voidAlexander Duyck
checking the return value of sk_busy_loop. As there are only a few consumers of that data, and the data being checked for can be replaced with a check for !skb_queue_empty() we might as well just pull the code out of sk_busy_loop and place it in the spots that actually need it. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24net: Only define skb_mark_napi_id in one spot instead of twoAlexander Duyck
Instead of defining two versions of skb_mark_napi_id I think it is more readable to just match the format of the sk_mark_napi_id functions and just wrap the contents of the function instead of defining two versions of the function. This way we can save a few lines of code since we only need 2 of the ifdef/endif but needed 5 for the extra function declaration. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24tcp: Record Rx hash and NAPI ID in tcp_child_processAlexander Duyck
While working on some recent busy poll changes we found that child sockets were being instantiated without NAPI ID being set. In our first attempt to fix it, it was suggested that we should just pull programming the NAPI ID into the function itself since all callers will need to have it set. In addition to the NAPI ID change I have dropped the code that was populating the Rx hash since it was actually being populated in tcp_get_cookie_sock. Reported-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24net: Busy polling should ignore sender CPUsAlexander Duyck
This patch is a cleanup/fix for NAPI IDs following the changes that made it so that sender_cpu and napi_id were doing a better job of sharing the same location in the sk_buff. One issue I found is that we weren't validating the napi_id as being valid before we started trying to setup the busy polling. This change corrects that by using the MIN_NAPI_ID value that is now used in both allocating the NAPI IDs, as well as validating them. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24IB/qib: fix false-postive maybe-uninitialized warningArnd Bergmann
aarch64-linux-gcc-7 complains about code it doesn't fully understand: drivers/infiniband/hw/qib/qib_iba7322.c: In function 'qib_7322_txchk_change': include/asm-generic/bitops/non-atomic.h:105:35: error: 'shadow' may be used uninitialized in this function [-Werror=maybe-uninitialized] The code is right, and despite trying hard, I could not come up with a version that I liked better than just adding a fake initialization here to shut up the warning. Fixes: f931551bafe1 ("IB/qib: Add new qib driver for QLogic PCIe InfiniBand adapters") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-03-24RDMA/iser: Fix possible mr leak on device removal eventSagi Grimberg
When the rdma device is removed, we must cleanup all the rdma resources within the DEVICE_REMOVAL event handler to let the device teardown gracefully. When this happens with live I/O, some memory regions are occupied. Thus, track them too and dereg all the mr's. We are safe with mr access by iscsi_iser_cleanup_task. Reported-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-03-24i40e: Do not enable NAPI on q_vectors that have no ringsAlexander Duyck
When testing the epoll w/ busy poll code I found that I could get into a state where the i40e driver had q_vectors w/ active NAPI that had no rings. This was resulting in a divide by zero error. To correct it I am updating the driver code so that we only support NAPI on q_vectors that have 1 or more rings allocated to them. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24secure_seq: downgrade to per-host timestamp offsetsFlorian Westphal
Unfortunately too many devices (not under our control) use tcp_tw_recycle=1, which depends on timestamps being identical of the same saddr. Although tcp_tw_recycle got removed in net-next we can't make such end hosts disappear so downgrade to per-host timestamp offsets. Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Reported-by: Yvan Vanrossomme <yvan@vanrossomme.net> Fixes: 95a22caee396c ("tcp: randomize tcp timestamp offsets for each connection") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24IB/device: Convert ib-comp-wq to be CPU-boundSagi Grimberg
This workqueue is used by our storage target mode ULPs via the new CQ API. Recent observations when working with very high-end flash storage devices reveal that UNBOUND workqueue threads can migrate between cpu cores and even numa nodes (although some numa locality is accounted for). While this attribute can be useful in some workloads, it does not fit in very nicely with the normal run-to-completion model we usually use in our target-mode ULPs and the block-mq irq<->cpu affinity facilities. The whole block-mq concept is that the completion will land on the same cpu where the submission was performed. The fact that our submitter thread is migrating cpus can break this locality. We assume that as a target mode ULP, we will serve multiple initiators/clients and we can spread the load enough without having to use unbound kworkers. Also, while we're at it, expose this workqueue via sysfs which is harmless and can be useful for debug. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>-- Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-03-24IB/cq: Don't process more than the given budgetSagi Grimberg
The caller might not want this overhead. Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-03-24Merge branch 'mlx5-xdp-perf-optimizations'David S. Miller
Saeed Mahameed says: ==================== Mellanox mlx5e XDP performance optimization This series provides some preformancee optimizations for mlx5e driver, especially for XDP TX flows. 1st patch is a simple change of rmb to dma_rmb in CQE fetch routine which shows a huge gain for both RX and TX packet rates. 2nd patch removes write combining logic from the driver TX handler and simplifies the TX logic while improving TX CPU utilization. All other patches combined provide some refactoring to the driver TX flows to allow some significant XDP TX improvements. More details and performance numbers per patch can be found in each patch commit message compared to the preceding patch. Overall performance improvemnets System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz Test case Baseline Now improvement --------------------------------------------------------------- TX packets (24 threads) 45Mpps 54Mpps 20% TC stack Drop (1 core) 3.45Mpps 3.6Mpps 5% XDP Drop (1 core) 14Mpps 16.9Mpps 20% XDP TX (1 core) 10.4Mpps 13.7Mpps 31% ==================== Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24net/mlx5e: Different SQ typesSaeed Mahameed
Different SQ types (tx, xdp, ico) are growing apart, we separate them and remove unwanted parts in each one of them, to simplify data path and utilize data cache. Remove DB union from SQ structures since it is not needed anymore as we now have different SQ data type for each SQ. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24net/mlx5e: Generalize SQ create/modify/destroy functionsSaeed Mahameed
In the next patches we will introduce different SQ types, and we would want to reuse those functions, in this patch we make them agnostic to SQ type (txq, xdp, ico). Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24net/mlx5e: Proper names for SQ/RQ/CQ functionsSaeed Mahameed
Rename mlx5e_{create,destroy}_{sq,rq,cq} to mlx5e_{alloc,free}_{sq,rq,cq}. Rename mlx5e_{enable,disable}_{sq,rq,cq} to mlx5e_{create,destroy}_{sq,rq,cq}. mlx5e_{enable,disable}_{sq,rq,cq} used to actually create/destroy the SQ in FW, so we rename them to align the functions names with FW semantics. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24net/mlx5e: Generalize tx helper functions for different SQ typesSaeed Mahameed
In the next patches we will introduce different SQ types, for that we here generalize some TX helper functions to work with more basic SQ parameters, in order to re-use them for the different SQ types. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24net/mlx5e: Optimize XDP frame xmitSaeed Mahameed
XDP SQ has a fixed size WQE (MLX5E_XDP_TX_WQEBBS = 1) and only posts one kind of WQE (MLX5_OPCODE_SEND), Also we initialize SQ descriptors static fields once on open_xdpsq, rather than every time on critical path. Optimize the code in light of those facts and add a prefetch of the TX descriptor first thing in the xdp xmit function. Performance improvement: System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz Test case Before Now improvement --------------------------------------------------------------- XDP TX (1 core) 13Mpps 13.7Mpps 5% Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24net/mlx5e: Poll XDP TX CQ before RX CQSaeed Mahameed
Handle XDP TX completions before handling RX packets, to make sure more free space is available for XDP TX packets a moment before handling RX packets. Performance improvement: System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz Test case Before Now improvement --------------------------------------------------------------- XDP Drop (1 core) 16.9Mpps 16.9Mpps No change XDP TX (1 core) 12Mpps 13Mpps 8% Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24net/mlx5e: Move XDP SQ instance into RQSaeed Mahameed
To save many rq->channel->sq dereferences in fast-path. And rename it to xdpsq. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24net/mlx5e: Move mlx5e_rq struct declarationSaeed Mahameed
Move struct mlx5e_rq and friends to appear after mlx5e_sq declaration in en.h. We will need this for next patch to move the mlx5e_sq instance into mlx5e_rq struct for XDP SQs. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24net/mlx5e: Move XDP completion functions to rx fileSaeed Mahameed
XDP code belongs to RX path, move mlx5e_poll_xdp_tx_cq and mlx5e_free_xdp_tx_descs to en_rx.c. Rename them to mlx5e_poll_xdpsq_cq and mlx5e_free_xdpsq_descs. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24net/mlx5e: Single bfreg (UAR) for all mlx5e SQs and netdevsSaeed Mahameed
One is sufficient since Blue Flame is not supported anymore. This will also come in handy for switchdev mode to save resources, since VF representors will use same single UAR as well for their own SQs. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24net/mlx5e: Xmit, no write combiningSaeed Mahameed
mlx5e netdev Blue Flame (write combining) support demands a lot of overhead for a little latency gain for some special cases, this overhead is hurting the common case. Here we remove xmit Blue Flame support by creating all bfregs with no write combining for all SQs, and we remove a lot of BF logic and conditions from xmit data path. Simplify mlx5e_tx_notify_hw (doorbell function) by removing BF related code and by removing one memory barrier needed for WC mapped SQ doorbell buffers, which no longer exist. Performance improvement: System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz Test case Before Now improvement --------------------------------------------------------------- TX packets (24 threads) 50Mpps 54Mpps 8% Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24net/mlx5e: Use dma_rmb rather than rmb in CQE fetch routineSaeed Mahameed
Use dma_rmb in mlx5e_get_cqe rather than aggressive rmb (at least on some architectures), this should help improve the performance on such CPU archs where dma_rmb is optimized. Performance improvement: System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz Test case Baseline Now improvement --------------------------------------------------------------- TX packets (24 threads) 45Mpps 50Mpps 11% TC stack Drop (1 core) 3.45Mpps 3.6Mpps 5% XDP Drop (1 core) 14Mpps 16.9Mpps 20% XDP TX (1 core) 10.4Mpps 12Mpps 15% Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24IB/rxe: increment msn only when completing a requestDavid Marchand
According to C9-147, MSN should only be incremented when the last packet of a multi packet request has been received. "Logically, the requester associates a sequential Send Sequence Number (SSN) with each WQE posted to the send queue. The SSN bears a one- to-one relationship to the MSN returned by the responder in each re- sponse packet. Therefore, when the requester receives a response, it in- terprets the MSN as representing the SSN of the most recent request completed by the responder to determine which send WQE(s) can be completed." Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: David Marchand <david.marchand@6wind.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-03-24uapi: fix rdma/mlx5-abi.h userspace compilation errorsDmitry V. Levin
Consistently use types from linux/types.h to fix the following rdma/mlx5-abi.h userspace compilation errors: /usr/include/rdma/mlx5-abi.h:69:25: error: 'u64' undeclared here (not in a function) MLX5_LIB_CAP_4K_UAR = (u64)1 << 0, /usr/include/rdma/mlx5-abi.h:69:29: error: expected ',' or '}' before numeric constant MLX5_LIB_CAP_4K_UAR = (u64)1 << 0, Include <linux/if_ether.h> to fix the following rdma/mlx5-abi.h userspace compilation error: /usr/include/rdma/mlx5-abi.h:286:12: error: 'ETH_ALEN' undeclared here (not in a function) __u8 dmac[ETH_ALEN]; Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Doug Ledford <dledford@redhat.com>