summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2019-02-08cfg80211: parse multi-bssid only if HW supports itSara Sharon
Parsing and exposing nontransmitted APs is problematic when underlying HW doesn't support it. Do it only if driver indicated support. Allow HE restriction as well, since the HE spec defined the exact manner that Multiple BSSID set should behave. APs that not support the HE spec will have less predictable Multiple BSSID set support/behavior Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-08cfg80211: Move Multiple BSS info to struct cfg80211_bss to be visibleSara Sharon
Previously the transmitted BSS and the non-trasmitted BSS list were defined in struct cfg80211_internal_bss. Move them to struct cfg80211_bss since mac80211 needs this info. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-08cfg80211: Properly track transmitting and non-transmitting BSSSara Sharon
When holding data of the non-transmitting BSS, we need to keep the transmitting BSS data on. Otherwise it will be released, and release the non-transmitting BSS with it. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-08cfg80211: use for_each_element() for multi-bssid parsingJohannes Berg
Use the new for_each_element() helper here, we cannot use for_each_subelement() since we have a fixed 1 byte before the subelements start. While at it, also fix le16_to_cpup() to be get_unaligned_le16() since we don't know anything about alignment. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-08cfg80211: Parsing of Multiple BSSID information in scanningPeng Xu
This extends cfg80211 BSS table processing to be able to parse Multiple BSSID element from Beacon and Probe Response frames and to update the BSS profiles in internal database for non-transmitted BSSs. Signed-off-by: Peng Xu <pxu@codeaurora.org> Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-08mac80211: move the bss update from elements to an helperSara Sharon
This will allow iterating over multiple BSSs inside cfg80211_bss, in case of multiple BSSID. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-08mac80211: pass bssids to elements parsing functionSara Sharon
In multiple BSSID, we have nested IEs inside the multiple BSSID IE, that override the external ones for that specific BSS. As preparation for supporting that, pass 2 BSSIDs to the parse function, the transmitter, and the selected BSSID, so it can know which IEs to choose. If the selected BSSID is NULL, the outer ones will be applied. Change ieee80211_bss_info_update to parse elements itself, instead of receiving them parsed, so we have the relevant bss entry in hand. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-08nl80211: use for_each_element() in validate_ie_attr()Johannes Berg
This makes for much simpler code, simply walk through all the elements and check that the last one found ends with the end of the data. This works because if any element is malformed the walk is aborted, we end up with a mismatch. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-08cfg80211: add various struct element finding helpersJohannes Berg
We currently have a number of helpers to find elements that just return a u8 *, change those to return a struct element and add inlines to deal with the u8 * compatibility. Note that the match behaviour is changed to start the natch at the data, so conversion from _ie_match to _elem_match need to be done carefully. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-08mac80211: use element iteration macro in parsingJohannes Berg
Instead of open-coding the element walk, use the new macro. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-08cfg80211: add and use strongly typed element iteration macrosJohannes Berg
Rather than always iterating elements from frames with pure u8 pointers, add a type "struct element" that encapsulates the id/datalen/data format of them. Then, add the element iteration macros * for_each_element * for_each_element_id * for_each_element_extid which take, as their first 'argument', such a structure and iterate through a given u8 array interpreting it as elements. While at it and since we'll need it, also add * for_each_subelement * for_each_subelement_id * for_each_subelement_extid which instead of taking data/length just take an outer element and use its data/datalen. Also add for_each_element_completed() to determine if any of the loops above completed, i.e. it was able to parse all of the elements successfully and no data remained. Use for_each_element_id() in cfg80211_find_ie_match() as the first user of this. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-07net/smc: original socket family in inet_sock_diagKarsten Graul
Commit ed75986f4aae ("net/smc: ipv6 support for smc_diag.c") changed the value of the diag_family field. The idea was to indicate the family of the IP address in the inet_diag_sockid field. But the change makes it impossible to distinguish an inet_sock_diag response message from SMC sock_diag response. This patch restores the original behaviour and sends AF_SMC as value of the diag_family field. Fixes: ed75986f4aae ("net/smc: ipv6 support for smc_diag.c") Reported-by: Eugene Syromiatnikov <esyr@redhat.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07net/smc: move code to clear the conn->lgr fieldKarsten Graul
The lgr field of an smc_connection is set in smc_conn_create() and should be cleared in smc_conn_free() for consistency reasons, so move the responsible code. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07net/smc: use client and server LGR pending locks for SMC-RHans Wippel
If SMC client and server connections are both established at the same time, smc_connect_rdma() cannot send a CLC confirm message while smc_listen_work() is waiting for one due to lock contention. This can result in timeouts in smc_clc_wait_msg() and failed SMC connections. In case of SMC-R, there are two types of LGRs (client and server LGRs) which can be protected by separate locks. So, this patch splits the LGR pending lock into two separate locks for client and server to avoid the locking issue for SMC-R. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07net/smc: unlock LGR pending lock earlier for SMC-DHans Wippel
If SMC client and server connections are both established at the same time, smc_connect_ism() cannot send a CLC confirm message while smc_listen_work() is waiting for one due to lock contention. This can result in timeouts in smc_clc_wait_msg() and failed SMC connections. In case of SMC-D, the LGR pending lock is not needed while smc_listen_work() is waiting for the CLC confirm message. So, this patch releases the lock earlier for SMC-D to avoid the locking issue. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07net/smc: use smc_curs_copy() for SMC-DUrsula Braun
SMC already provides a wrapper for atomic64 calls to be architecture independent. Use this wrapper for SMC-D as well. Reported-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07net/smc: postpone release of clcsockUrsula Braun
According to RFC7609 (http://www.rfc-editor.org/info/rfc7609) first the SMC-R connection is shut down and then the normal TCP connection FIN processing drives cleanup of the internal TCP connection. The unconditional release of the clcsock during active socket closing has to be postponed if the peer has not yet signalled socket closing. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach()Hangbin Liu
If we disabled IPv6 from the kernel command line (ipv6.disable=1), we should not call ip6_err_gen_icmpv6_unreach(). This: ip link add sit1 type sit local 192.0.2.1 remote 192.0.2.2 ttl 1 ip link set sit1 up ip addr add 198.51.100.1/24 dev sit1 ping 198.51.100.2 if IPv6 is disabled at boot time, will crash the kernel. v2: there's no need to use in6_dev_get(), use __in6_dev_get() instead, as we only need to check that idev exists and we are under rcu_read_lock() (from netif_receive_skb_internal()). Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: ca15a078bd90 ("sit: generate icmpv6 error when receiving icmpv4 error") Cc: Oussama Ghorbel <ghorbel@pivasoftware.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07devlink: Add health dump {get,clear} commandsEran Ben Elisha
Add devlink health dump commands, in order to run an dump operation over a specific reporter. The supported operations are dump_get in order to get last saved dump (if not exist, dump now) and dump_clear to clear last saved dump. It is expected from driver's callback for diagnose command to fill it via the devlink fmsg API. Devlink will parse it and convert it to netlink nla API in order to pass it to the user. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07devlink: Add health diagnose commandEran Ben Elisha
Add devlink health diagnose command, in order to run a diagnose operation over a specific reporter. It is expected from driver's callback for diagnose command to fill it via the devlink fmsg API. Devlink will parse it and convert it to netlink nla API in order to pass it to the user. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07devlink: Add health recover commandEran Ben Elisha
Add devlink health recover command to the uapi, in order to allow the user to execute a recover operation over a specific reporter. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07devlink: Add health set commandEran Ben Elisha
Add devlink health set command, in order to set configuration parameters for a specific reporter. Supported parameters are: - graceful_period: Time interval between auto recoveries (in msec) - auto_recover: Determines if the devlink shall execute recover upon receiving error for the reporter Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07devlink: Add health get commandEran Ben Elisha
Add devlink health get command to provide reporter/s data for user space. Add the ability to get data per reporter or dump data from all available reporters. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07devlink: Add health report functionalityEran Ben Elisha
Upon error discover, every driver can report it to the devlink health mechanism via devlink_health_report function, using the appropriate reporter registered to it. Driver can pass error specific context which will be delivered to it as part of the dump / recovery callbacks. Once an error is reported, devlink health will do the following actions: * A log is being send to the kernel trace events buffer * Health status and statistics are being updated for the reporter instance * Object dump is being taken and stored at the reporter instance (as long as there is no other dump which is already stored) * Auto recovery attempt is being done. Depends on: - Auto Recovery configuration - Grace period vs. Time since last recover Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07devlink: Add health reporter create/destroy functionalityEran Ben Elisha
Devlink health reporter is an instance for reporting, diagnosing and recovering from run time errors discovered by the reporters. Define it's data structure and supported operations. In addition, expose devlink API to create and destroy a reporter. Each devlink instance will hold it's own reporters list. As part of the allocation, driver shall provide a set of callbacks which will be used by devlink in order to handle health reports and user commands related to this reporter. In addition, driver is entitled to provide some priv pointer, which can be fetched from the reporter by devlink_health_reporter_priv function. For each reporter, devlink will hold a metadata of statistics, dump msg and status. For passing dumps and diagnose data to the user-space, it will use devlink fmsg API. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07devlink: Add devlink formatted message (fmsg) APIEran Ben Elisha
Devlink fmsg is a mechanism to pass descriptors between drivers and devlink, in json-like format. The API allows the driver to add nested attributes such as object, object pair and value array, in addition to attributes such as name and value. Driver can use this API to fill the fmsg context in a format which will be translated by the devlink to the netlink message later. There is no memory allocation in advance (other than the initial list head), and it dynamically allocates messages descriptors and add them to the list on the fly. When it needs to send the data using SKBs to the netlink layer, it fragments the data between different SKBs. In order to do this fragmentation, it uses virtual nests attributes, to avoid actual nesting use which cannot be divided between different SKBs. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06Merge branch 'for_net-next-5.1/rds-tos-v4' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux Santosh Shilimkar says: ==================== rds: add tos support RDS applications make use of tos to classify database traffic. This feature has been used in shipping products from 2.6.32 based kernels. Its tied with RDS v4.1 protocol version and the compatibility gets negotiated as part of connections setup. Patchset keeps full backward compatibility using existing connection negotiation scheme. Currently the feature is exploited by RDMA transport and for TCP transport the user tos values are mapped to same default class (0). For RDMA transports, RDMA CM service type API is used to set up different SL(service lanes) and the IB fabric is configured for tos mapping using Subnet Manager(SL to VL mappings). Similarly for ROCE fabric, user priority is mapped with different DSCP code points which are associated with different switch queues in the fabric. The original code was developed by Bang Nguyen in downstream kernel back in 2.6.32 kernel days and it has evolved significantly over period of time. Thanks to Yanjun for doing testing with various combinations of host like v3.1<->v4.1, v4.1.<->v3.1, v4.1 upstream to shipping v4.1 etc etc ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2019-02-07 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add a riscv64 JIT for BPF, from Björn. 2) Implement BTF deduplication algorithm for libbpf which takes BTF type information containing duplicate per-compilation unit information and reduces it to an equivalent set of BTF types with no duplication and without loss of information, from Andrii. 3) Offloaded and native BPF XDP programs can coexist today, enable also offloaded and generic ones as well, from Jakub. 4) Expose various BTF related helper functions in libbpf as API which are in particular helpful for JITed programs, from Yonghong. 5) Fix the recently added JMP32 code emission in s390x JIT, from Heiko. 6) Fix BPF kselftests' tcp_{server,client}.py to be able to run inside a network namespace, also add a fix for libbpf to get libbpf_print() working, from Stanislav. 7) Fixes for bpftool documentation, from Prashant. 8) Type cleanup in BPF kselftests' test_maps.c to silence a gcc8 warning, from Breno. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07y2038: syscalls: rename y2038 compat syscallsArnd Bergmann
A lot of system calls that pass a time_t somewhere have an implementation using a COMPAT_SYSCALL_DEFINEx() on 64-bit architectures, and have been reworked so that this implementation can now be used on 32-bit architectures as well. The missing step is to redefine them using the regular SYSCALL_DEFINEx() to get them out of the compat namespace and make it possible to build them on 32-bit architectures. Any system call that ends in 'time' gets a '32' suffix on its name for that version, while the others get a '_time32' suffix, to distinguish them from the normal version, which takes a 64-bit time argument in the future. In this step, only 64-bit architectures are changed, doing this rename first lets us avoid touching the 32-bit architectures twice. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-02-06net: Get rid of SWITCHDEV_ATTR_ID_PORT_PARENT_IDFlorian Fainelli
Now that we have a dedicated NDO for getting a port's parent ID, get rid of SWITCHDEV_ATTR_ID_PORT_PARENT_ID and convert all callers to use the NDO exclusively. This is a preliminary change to getting rid of switchdev_ops eventually. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06net: dsa: Implement ndo_get_port_parent_id()Florian Fainelli
DSA implements SWITCHDEV_ATTR_ID_PORT_PARENT_ID and we want to get rid of switchdev_ops eventually, ease that migration by implementing a ndo_get_port_parent_id() function which returns what switchdev_port_attr_get() would do. Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06net: Introduce ndo_get_port_parent_id()Florian Fainelli
In preparation for getting rid of switchdev_ops, create a dedicated NDO operation for getting the port's parent identifier. There are essentially two classes of drivers that need to implement getting the port's parent ID which are VF/PF drivers with a built-in switch, and pure switchdev drivers such as mlxsw, ocelot, dsa etc. We introduce a helper function: dev_get_port_parent_id() which supports recursion into the lower devices to obtain the first port's parent ID. Convert the bridge, core and ipv4 multicast routing code to check for such ndo_get_port_parent_id() and call the helper function when valid before falling back to switchdev_port_attr_get(). This will allow us to convert all relevant drivers in one go instead of having to implement both switchdev_port_attr_get() and ndo_get_port_parent_id() operations, then get rid of switchdev_port_attr_get(). Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06net: dsa: Fix NULL checking in dsa_slave_set_eee()Dan Carpenter
This function can't succeed if dp->pl is NULL. It will Oops inside the call to return phylink_ethtool_get_eee(dp->pl, e); Fixes: 1be52e97ed3e ("dsa: slave: eee: Allow ports to use phylink") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06svcrdma: Remove syslog warnings in work completion handlersChuck Lever
These can result in a lot of log noise, and are able to be triggered by client misbehavior. Since there are trace points in these handlers now, there's no need to spam the log. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-02-06svcrdma: Squelch compiler warning when SUNRPC_DEBUG is disabledChuck Lever
CC [M] net/sunrpc/xprtrdma/svc_rdma_transport.o linux/net/sunrpc/xprtrdma/svc_rdma_transport.c: In function ‘svc_rdma_accept’: linux/net/sunrpc/xprtrdma/svc_rdma_transport.c:452:19: warning: variable ‘sap’ set but not used [-Wunused-but-set-variable] struct sockaddr *sap; ^ Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-02-06svcrdma: Use struct_size() in kmalloc()Gustavo A. R. Silva
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; struct boo entry[]; }; instance = kmalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL); This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-02-06svcrpc: fix unlikely races preventing queueing of socketsJ. Bruce Fields
In the rpc server, When something happens that might be reason to wake up a thread to do something, what we do is - modify xpt_flags, sk_sock->flags, xpt_reserved, or xpt_nr_rqsts to indicate the new situation - call svc_xprt_enqueue() to decide whether to wake up a thread. svc_xprt_enqueue may require multiple conditions to be true before queueing up a thread to handle the xprt. In the SMP case, one of the other CPU's may have set another required condition, and in that case, although both CPUs run svc_xprt_enqueue(), it's possible that neither call sees the writes done by the other CPU in time, and neither one recognizes that all the required conditions have been set. A socket could therefore be ignored indefinitely. Add memory barries to ensure that any svc_xprt_enqueue() call will always see the conditions changed by other CPUs before deciding to ignore a socket. I've never seen this race reported. In the unlikely event it happens, another event will usually come along and the problem will fix itself. So I don't think this is worth backporting to stable. Chuck tried this patch and said "I don't see any performance regressions, but my server has only a single last-level CPU cache." Tested-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-02-06svcrpc: svc_xprt_has_something_to_do seems a little longJ. Bruce Fields
The long name seemed cute till I wanted to refer to it somewhere else. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-02-06SUNRPC: Don't allow compiler optimisation of svc_xprt_release_slot()Trond Myklebust
Use READ_ONCE() to tell the compiler to not optimse away the read of xprt->xpt_flags in svc_xprt_release_slot(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-02-06svcrdma: Remove max_sge check at connect timeChuck Lever
Two and a half years ago, the client was changed to use gathered Send for larger inline messages, in commit 655fec6987b ("xprtrdma: Use gathered Send for large inline messages"). Several fixes were required because there are a few in-kernel device drivers whose max_sge is 3, and these were broken by the change. Apparently my memory is going, because some time later, I submitted commit 25fd86eca11c ("svcrdma: Don't overrun the SGE array in svc_rdma_send_ctxt"), and after that, commit f3c1fd0ee294 ("svcrdma: Reduce max_send_sges"). These too incorrectly assumed in-kernel device drivers would have more than a few Send SGEs available. The fix for the server side is not the same. This is because the fundamental problem on the server is that, whether or not the client has provisioned a chunk for the RPC reply, the server must squeeze even the most complex RPC replies into a single RDMA Send. Failing in the send path because of Send SGE exhaustion should never be an option. Therefore, instead of failing when the send path runs out of SGEs, switch to using a bounce buffer mechanism to handle RPC replies that are too complex for the device to send directly. That allows us to remove the max_sge check to enable drivers with small max_sge to work again. Reported-by: Don Dutile <ddutile@redhat.com> Fixes: 25fd86eca11c ("svcrdma: Don't overrun the SGE array in ...") Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-02-06devlink: add hardware errors tracing facilityNir Dotan
Define a tracepoint and allow user to trace messages in case of an hardware error code for hardware associated with devlink instance. Signed-off-by: Nir Dotan <nird@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06rxrpc: bad unlock balance in rxrpc_recvmsgEric Dumazet
When either "goto wait_interrupted;" or "goto wait_error;" paths are taken, socket lock has already been released. This patch fixes following syzbot splat : WARNING: bad unlock balance detected! 5.0.0-rc4+ #59 Not tainted ------------------------------------- syz-executor223/8256 is trying to release lock (sk_lock-AF_RXRPC) at: [<ffffffff86651353>] rxrpc_recvmsg+0x6d3/0x3099 net/rxrpc/recvmsg.c:598 but there are no more locks to release! other info that might help us debug this: 1 lock held by syz-executor223/8256: #0: 00000000fa9ed0f4 (slock-AF_RXRPC){+...}, at: spin_lock_bh include/linux/spinlock.h:334 [inline] #0: 00000000fa9ed0f4 (slock-AF_RXRPC){+...}, at: release_sock+0x20/0x1c0 net/core/sock.c:2798 stack backtrace: CPU: 1 PID: 8256 Comm: syz-executor223 Not tainted 5.0.0-rc4+ #59 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 print_unlock_imbalance_bug kernel/locking/lockdep.c:3391 [inline] print_unlock_imbalance_bug.cold+0x114/0x123 kernel/locking/lockdep.c:3368 __lock_release kernel/locking/lockdep.c:3601 [inline] lock_release+0x67e/0xa00 kernel/locking/lockdep.c:3860 sock_release_ownership include/net/sock.h:1471 [inline] release_sock+0x183/0x1c0 net/core/sock.c:2808 rxrpc_recvmsg+0x6d3/0x3099 net/rxrpc/recvmsg.c:598 sock_recvmsg_nosec net/socket.c:794 [inline] sock_recvmsg net/socket.c:801 [inline] sock_recvmsg+0xd0/0x110 net/socket.c:797 __sys_recvfrom+0x1ff/0x350 net/socket.c:1845 __do_sys_recvfrom net/socket.c:1863 [inline] __se_sys_recvfrom net/socket.c:1859 [inline] __x64_sys_recvfrom+0xe1/0x1a0 net/socket.c:1859 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x446379 Code: e8 2c b3 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 2b 09 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fe5da89fd98 EFLAGS: 00000246 ORIG_RAX: 000000000000002d RAX: ffffffffffffffda RBX: 00000000006dbc28 RCX: 0000000000446379 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 00000000006dbc20 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dbc2c R13: 0000000000000000 R14: 0000000000000000 R15: 20c49ba5e353f7cf Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Howells <dhowells@redhat.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06batman-adv: fix memory leak in in batadv_dat_put_dhcpMartin Weinelt
batadv_dat_put_dhcp is creating a new ARP packet via batadv_dat_arp_create_reply and tries to forward it via batadv_dat_send_data to different peers in the DHT. The original skb is not consumed by batadv_dat_send_data and thus has to be consumed by the caller. Fixes: b61ec31c8575 ("batman-adv: Snoop DHCPACKs for DAT") Signed-off-by: Martin Weinelt <martin@linuxlounge.net> [sven@narfation.org: add commit message] Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2019-02-06ethtool: add ethtool_rx_flow_spec to flow_rule structure translatorPablo Neira Ayuso
This patch adds a function to translate the ethtool_rx_flow_spec structure to the flow_rule representation. This allows us to reuse code from the driver side given that both flower and ethtool_rx_flow interfaces use the same representation. This patch also includes support for the flow type flags FLOW_EXT, FLOW_MAC_EXT and FLOW_RSS. The ethtool_rx_flow_spec_input wrapper structure is used to convey the rss_context field, that is away from the ethtool_rx_flow_spec structure, and the ethtool_rx_flow_spec structure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06cls_flower: don't expose TC actions to drivers anymorePablo Neira Ayuso
Now that drivers have been converted to use the flow action infrastructure, remove this field from the tc_cls_flower_offload structure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06flow_offload: add statistics retrieval infrastructure and use itPablo Neira Ayuso
This patch provides the flow_stats structure that acts as container for tc_cls_flower_offload, then we can use to restore the statistics on the existing TC actions. Hence, tcf_exts_stats_update() is not used from drivers anymore. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06cls_api: add translator to flow_action representationPablo Neira Ayuso
This patch implements a new function to translate from native TC action to the new flow_action representation. Moreover, this patch also updates cls_flower to use this new function. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06flow_offload: add flow action infrastructurePablo Neira Ayuso
This new infrastructure defines the nic actions that you can perform from existing network drivers. This infrastructure allows us to avoid a direct dependency with the native software TC action representation. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06flow_offload: add flow_rule and flow_match structures and use themPablo Neira Ayuso
This patch wraps the dissector key and mask - that flower uses to represent the matching side - around the flow_match structure. To avoid a follow up patch that would edit the same LoCs in the drivers, this patch also wraps this new flow match structure around the flow rule object. This new structure will also contain the flow actions in follow up patches. This introduces two new interfaces: bool flow_rule_match_key(rule, dissector_id) that returns true if a given matching key is set on, and: flow_rule_match_XYZ(rule, &match); To fetch the matching side XYZ into the match container structure, to retrieve the key and the mask with one single call. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06net: xdp: allow generic and driver XDP on one interfaceJakub Kicinski
Since commit a25717d2b604 ("xdp: support simultaneous driver and hw XDP attachment") users can load an XDP program for offload and in native driver mode simultaneously. Allow a similar mix of offload and SKB mode/generic XDP. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>