summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2019-09-01netlabel: remove redundant assignment to pointer iterColin Ian King
Pointer iter is being initialized with a value that is never read and is being re-assigned a little later on. The assignment is redundant and hence can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-31net/ncsi: add response handlers for PLDM over NC-SIBen Wei
This patch adds handlers for PLDM over NC-SI command response. This enables NC-SI driver recognizes the packet type so the responses don't get dropped as unknown packet type. PLDM over NC-SI are not handled in kernel driver for now, but can be passed back to user space via Netlink for further handling. Signed-off-by: Ben Wei <benwei@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-31devlink: Use switch-case instead of if-elseParav Pandit
Make core more readable with switch-case for various port flavours. Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-31devlink: Make port index data type as unsigned intParav Pandit
Devlink port index attribute is returned to users as u32 through netlink response. Change index data type from 'unsigned' to 'unsigned int' to avoid below checkpatch.pl warning. WARNING: Prefer 'unsigned int' to bare use of 'unsigned' 81: FILE: include/net/devlink.h:81: + unsigned index; Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-31net: tls: export protocol version, cipher, tx_conf/rx_conf to socket diagDavide Caratti
When an application configures kernel TLS on top of a TCP socket, it's now possible for inet_diag_handler() to collect information regarding the protocol version, the cipher type and TX / RX configuration, in case INET_DIAG_INFO is requested. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-31tcp: ulp: add functions to dump ulp-specific informationDavide Caratti
currently, only getsockopt(TCP_ULP) can be invoked to know if a ULP is on top of a TCP socket. Extend idiag_get_aux() and idiag_get_aux_size(), introduced by commit b37e88407c1d ("inet_diag: allow protocols to provide additional data"), to report the ULP name and other information that can be made available by the ULP through optional functions. Users having CAP_NET_ADMIN privileges will then be able to retrieve this information through inet_diag_handler, if they specify INET_DIAG_INFO in the request. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-31net/tls: use RCU protection on icsk->icsk_ulp_dataJakub Kicinski
We need to make sure context does not get freed while diag code is interrogating it. Free struct tls_context with kfree_rcu(). We add the __rcu annotation directly in icsk, and cast it away in the datapath accessor. Presumably all ULPs will do a similar thing. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-31net: dsa: tag_8021q: Restore bridge VLANs when enabling vlan_filteringVladimir Oltean
The bridge core assumes that enabling/disabling vlan_filtering will translate into the simple toggling of a flag for switchdev drivers. That is clearly not the case for sja1105, which alters the VLAN table and the pvids in order to obtain port separation in standalone mode. There are 2 parts to the issue. First, tag_8021q changes the pvid to a unique per-port rx_vid for frame identification. But we need to disable tag_8021q when vlan_filtering kicks in, and at that point, the VLAN configured as pvid will have to be removed from the filtering table of the ports. With an invalid pvid, the ports will drop all traffic. Since the bridge will not call any vlan operation through switchdev after enabling vlan_filtering, we need to ensure we're in a functional state ourselves. Hence read the pvid that the bridge is aware of, and program that into our ports. Secondly, tag_8021q uses the 1024-3071 range privately in vlan_filtering=0 mode. Had the user installed one of these VLANs during a previous vlan_filtering=1 session, then upon the next tag_8021q cleanup for vlan_filtering to kick in again, VLANs in that range will get deleted unconditionally, hence breaking user expectation. So when deleting the VLANs, check if the bridge had knowledge about them, and if it did, re-apply the settings. Wrap this logic inside a dsa_8021q_vid_apply helper function to reduce code duplication. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-31net: bridge: Populate the pvid flag in br_vlan_get_infoVladimir Oltean
Currently this simplified code snippet fails: br_vlan_get_pvid(netdev, &pvid); br_vlan_get_info(netdev, pvid, &vinfo); ASSERT(!(vinfo.flags & BRIDGE_VLAN_INFO_PVID)); It is intuitive that the pvid of a netdevice should have the BRIDGE_VLAN_INFO_PVID flag set. However I can't seem to pinpoint a commit where this behavior was introduced. It seems like it's been like that since forever. At a first glance it would make more sense to just handle the BRIDGE_VLAN_INFO_PVID flag in __vlan_add_flags. However, as Nikolay explains: There are a few reasons why we don't do it, most importantly because we need to have only one visible pvid at any single time, even if it's stale - it must be just one. Right now that rule will not be violated by this change, but people will try using this flag and could see two pvids simultaneously. You can see that the pvid code is even using memory barriers to propagate the new value faster and everywhere the pvid is read only once. That is the reason the flag is set dynamically when dumping entries, too. A second (weaker) argument against would be given the above we don't want another way to do the same thing, specifically if it can provide us with two pvids (e.g. if walking the vlan list) or if it can provide us with a pvid different from the one set in the vg. [Obviously, I'm talking about RCU pvid/vlan use cases similar to the dumps. The locked cases are fine. I would like to avoid explaining why this shouldn't be relied upon without locking] So instead of introducing the above change and making sure of the pvid uniqueness under RCU, simply dynamically populate the pvid flag in br_vlan_get_info(). Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-30net: sched: cls_matchall: cleanup flow_action before deallocatingVlad Buslov
Recent rtnl lock removal patch changed flow_action infra to require proper cleanup besides simple memory deallocation. However, matchall classifier was not updated to call tc_cleanup_flow_action(). Add proper cleanup to mall_replace_hw_filter() and mall_reoffload(). Fixes: 5a6ff4b13d59 ("net: sched: take reference to action dev before calling offloads") Reported-by: Ido Schimmel <idosch@mellanox.com> Tested-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-30tcp_bbr: clarify that bbr_bdp() rounds up in commentsLuke Hsiao
This explicitly clarifies that bbr_bdp() returns the rounded-up value of the bandwidth-delay product and why in the comments. Signed-off-by: Luke Hsiao <lukehsiao@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-30sched: act_vlan: implement stats_update callbackJiri Pirko
Implement this callback in order to get the offloaded stats added to the kernel stats. Reported-by: Pengfei Liu <pengfeil@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27sctp: allow users to set ep ecn flag by sockoptXin Long
SCTP_ECN_SUPPORTED sockopt will be added to allow users to change ep ecn flag, and it's similar with other feature flags. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27sctp: allow users to set netns ecn flag with sysctlXin Long
sysctl net.sctp.ecn_enable is added in this patch. It will allow users to change the default sctp ecn flag, net.sctp.ecn_enable. This feature was also required on this thread: http://lkml.iu.edu/hypermail/linux/kernel/0812.1/01858.html Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27sctp: make ecn flag per netns and endpointXin Long
This patch is to add ecn flag for both netns_sctp and sctp_endpoint, net->sctp.ecn_enable is set 1 by default, and ep->ecn_enable will be initialized with net->sctp.ecn_enable. asoc->peer.ecn_capable will be set during negotiation only when ep->ecn_enable is set on both sides. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27net: dsa: Advertise the VLAN offload netdev ability only if switch supports itVladimir Oltean
When adding a VLAN sub-interface on a DSA slave port, the 8021q core checks NETIF_F_HW_VLAN_CTAG_FILTER and, if the netdev is capable of filtering, calls .ndo_vlan_rx_add_vid or .ndo_vlan_rx_kill_vid to configure the VLAN offloading. DSA sets this up counter-intuitively: it always advertises this netdev feature, but the underlying driver may not actually support VLAN table manipulation. In that case, the DSA core is forced to ignore the error, because not being able to offload the VLAN is still fine - and should result in the creation of a non-accelerated VLAN sub-interface. Change this so that the netdev feature is only advertised for switch drivers that support VLAN manipulation, instead of checking for -EOPNOTSUPP at runtime. Suggested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27net: dsa: clear VLAN PVID flag for CPU portVivien Didelot
When the bridge offloads a VLAN on a slave port, we also need to program its dedicated CPU port as a member of the VLAN. Drivers may handle the CPU port's membership as they want. For example, Marvell as a special "Unmodified" mode to pass frames as is through such ports. Even though DSA expects the drivers to handle the CPU port membership, it does not make sense to program user VLANs as PVID on the CPU port. This patch clears this flag before programming the CPU port. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27net: dsa: program VLAN on CPU port from slaveVivien Didelot
DSA currently programs a VLAN on the CPU port implicitly after the related notifier is received by a switch. While we still need to do this transparent programmation of the DSA links in the fabric, programming the CPU port this way may cause problems in some corners such as the tag_8021q driver. Because the dedicated CPU port is specific to a slave, make their programmation explicit a few layers up, in the slave code. Note that technically, DSA links have a dedicated CPU port as well, but since they are only used as conduit between interconnected switches of a fabric, programming them transparently this way is what we want. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27net: dsa: check bridge VLAN in slave operationsVivien Didelot
The bridge VLANs are not offloaded by dsa_port_vlan_* if the port is not bridged or if its bridge is not VLAN aware. This is a good thing but other corners of DSA, such as the tag_8021q driver, may need to program VLANs regardless the bridge state. And also because bridge_dev is specific to user ports anyway, move these checks were it belongs, one layer up in the slave code. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27net: dsa: add slave VLAN helpersVivien Didelot
Add dsa_slave_vlan_add and dsa_slave_vlan_del helpers to handle SWITCHDEV_OBJ_ID_PORT_VLAN switchdev objects. Also copy the switchdev_obj_port_vlan structure on add since we will modify it in future patches. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27net: dsa: do not skip -EOPNOTSUPP in dsa_port_vid_addVivien Didelot
Currently dsa_port_vid_add returns 0 if the switch returns -EOPNOTSUPP. This function is used in the tag_8021q.c code to offload the PVID of ports, which would simply not work if .port_vlan_add is not supported by the underlying switch. Do not skip -EOPNOTSUPP in dsa_port_vid_add but only when necessary, that is to say in dsa_slave_vlan_rx_add_vid. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27net: dsa: remove bitmap operationsVivien Didelot
The bitmap operations were introduced to simplify the switch drivers in the future, since most of them could implement the common VLAN and MDB operations (add, del, dump) with simple functions taking all target ports at once, and thus limiting the number of hardware accesses. Programming an MDB or VLAN this way in a single operation would clearly simplify the drivers a lot but would require a new get-set interface in DSA. The usage of such bitmap from the stack also raised concerned in the past, leading to the dynamic allocation of a new ds->_bitmap member in the dsa_switch structure. So let's get rid of them for now. This commit nicely wraps the ds->ops->port_{mdb,vlan}_{prepare,add} switch operations into new dsa_switch_{mdb,vlan}_{prepare,add} variants not using any bitmap argument anymore. New dsa_switch_{mdb,vlan}_match helpers have been introduced to make clear which local port of a switch must be programmed with the target object. While the targeted user port is an obvious candidate, the DSA links must also be programmed, as well as the CPU port for VLANs. While at it, also remove local variables that are only used once. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Minor conflict in r8169, bug fix had two versions in net and net-next, take the net-next hunks. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27Merge tag 'nfs-for-5.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: "Highlights include: Stable fixes: - Fix a page lock leak in nfs_pageio_resend() - Ensure O_DIRECT reports an error if the bytes read/written is 0 - Don't handle errors if the bind/connect succeeded - Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was invalidat ed" Bugfixes: - Don't refresh attributes with mounted-on-file information - Fix return values for nfs4_file_open() and nfs_finish_open() - Fix pnfs layoutstats reporting of I/O errors - Don't use soft RPC calls for pNFS/flexfiles I/O, and don't abort for soft I/O errors when the user specifies a hard mount. - Various fixes to the error handling in sunrpc - Don't report writepage()/writepages() errors twice" * tag 'nfs-for-5.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: remove set but not used variable 'mapping' NFSv2: Fix write regression NFSv2: Fix eof handling NFS: Fix writepage(s) error handling to not report errors twice NFS: Fix spurious EIO read errors pNFS/flexfiles: Don't time out requests on hard mounts SUNRPC: Handle connection breakages correctly in call_status() Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was invalidated" SUNRPC: Handle EADDRINUSE and ENOBUFS correctly pNFS/flexfiles: Turn off soft RPC calls SUNRPC: Don't handle errors if the bind/connect succeeded NFS: On fatal writeback errors, we need to call nfs_inode_remove_request() NFS: Fix initialisation of I/O result struct in nfs_pgio_rpcsetup NFS: Ensure O_DIRECT reports an error if the bytes read/written is 0 NFSv4/pnfs: Fix a page lock leak in nfs_pageio_resend() NFSv4: Fix return value in nfs_finish_open() NFSv4: Fix return values for nfs4_file_open() NFS: Don't refresh attributes with mounted-on-file information
2019-08-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Use 32-bit index for tails calls in s390 bpf JIT, from Ilya Leoshkevich. 2) Fix missed EPOLLOUT events in TCP, from Eric Dumazet. Same fix for SMC from Jason Baron. 3) ipv6_mc_may_pull() should return 0 for malformed packets, not -EINVAL. From Stefano Brivio. 4) Don't forget to unpin umem xdp pages in error path of xdp_umem_reg(). From Ivan Khoronzhuk. 5) Fix sta object leak in mac80211, from Johannes Berg. 6) Fix regression by not configuring PHYLINK on CPU port of bcm_sf2 switches. From Florian Fainelli. 7) Revert DMA sync removal from r8169 which was causing regressions on some MIPS Loongson platforms. From Heiner Kallweit. 8) Use after free in flow dissector, from Jakub Sitnicki. 9) Fix NULL derefs of net devices during ICMP processing across collect_md tunnels, from Hangbin Liu. 10) proto_register() memory leaks, from Zhang Lin. 11) Set NLM_F_MULTI flag in multipart netlink messages consistently, from John Fastabend. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (66 commits) r8152: Set memory to all 0xFFs on failed reg reads openvswitch: Fix conntrack cache with timeout ipv4: mpls: fix mpls_xmit for iptunnel nexthop: Fix nexthop_num_path for blackhole nexthops net: rds: add service level support in rds-info net: route dump netlink NLM_F_MULTI flag missing s390/qeth: reject oversized SNMP requests sock: fix potential memory leak in proto_register() MAINTAINERS: Add phylink keyword to SFF/SFP/SFP+ MODULE SUPPORT xfrm/xfrm_policy: fix dst dev null pointer dereference in collect_md mode ipv4/icmp: fix rt dst dev null pointer dereference openvswitch: Fix log message in ovs conntrack bpf: allow narrow loads of some sk_reuseport_md fields with offset > 0 bpf: fix use after free in prog symbol exposure bpf: fix precision tracking in presence of bpf2bpf calls flow_dissector: Fix potential use-after-free on BPF_PROG_DETACH Revert "r8169: remove not needed call to dma_sync_single_for_device" ipv6: propagate ipv6_add_dev's error returns out of ipv6_find_idev net/ncsi: Fix the payload copying for the request coming from Netlink qed: Add cleanup in qed_slowpath_start() ...
2019-08-26net: sched: flower: don't take rtnl lock for cls hw offloads APIVlad Buslov
Don't manually take rtnl lock in flower classifier before calling cls hardware offloads API. Instead, pass rtnl lock status via 'rtnl_held' parameter. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26net: sched: copy tunnel info when setting flow_action entry->tunnelVlad Buslov
In order to remove dependency on rtnl lock, modify tc_setup_flow_action() to copy tunnel info, instead of just saving pointer to tunnel_key action tunnel info. This is necessary to prevent concurrent action overwrite from releasing tunnel info while it is being used by rtnl-unlocked driver. Implement helper tcf_tunnel_info_copy() that is used to copy tunnel info with all its options to dynamically allocated memory block. Modify tc_cleanup_flow_action() to free dynamically allocated tunnel info. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26net: sched: take reference to action dev before calling offloadsVlad Buslov
In order to remove dependency on rtnl lock when calling hardware offload API, take reference to action mirred dev when initializing flow_action structure in tc_setup_flow_action(). Implement function tc_cleanup_flow_action(), use it to release the device after hardware offload API is done using it. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26net: sched: take rtnl lock in tc_setup_flow_action()Vlad Buslov
In order to allow using new flow_action infrastructure from unlocked classifiers, modify tc_setup_flow_action() to accept new 'rtnl_held' argument. Take rtnl lock before accessing tc_action data. This is necessary to protect from concurrent action replace. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26net: sched: conditionally obtain rtnl lock in cls hw offloads APIVlad Buslov
In order to remove dependency on rtnl lock from offloads code of classifiers, take rtnl lock conditionally before executing driver callbacks. Only obtain rtnl lock if block is bound to devices that require it. Block bind/unbind code is rtnl-locked and obtains block->cb_lock while holding rtnl lock. Obtain locks in same order in tc_setup_cb_*() functions to prevent deadlock. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26net: sched: add API for registering unlocked offload block callbacksVlad Buslov
Extend struct flow_block_offload with "unlocked_driver_cb" flag to allow registering and unregistering block hardware offload callbacks that do not require caller to hold rtnl lock. Extend tcf_block with additional lockeddevcnt counter that is incremented for each non-unlocked driver callback attached to device. This counter is necessary to conditionally obtain rtnl lock before calling hardware callbacks in following patches. Register mlx5 tc block offload callbacks as "unlocked". Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26net: sched: notify classifier on successful offload add/deleteVlad Buslov
To remove dependency on rtnl lock, extend classifier ops with new ops->hw_add() and ops->hw_del() callbacks. Call them from cls API while holding cb_lock every time filter if successfully added to or deleted from hardware. Implement the new API in flower classifier. Use it to manage hw_filters list under cb_lock protection, instead of relying on rtnl lock to synchronize with concurrent fl_reoffload() call. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26net: sched: refactor block offloads counter usageVlad Buslov
Without rtnl lock protection filters can no longer safely manage block offloads counter themselves. Refactor cls API to protect block offloadcnt with tcf_block->cb_lock that is already used to protect driver callback list and nooffloaddevcnt counter. The counter can be modified by concurrent tasks by new functions that execute block callbacks (which is safe with previous patch that changed its type to atomic_t), however, block bind/unbind code that checks the counter value takes cb_lock in write mode to exclude any concurrent modifications. This approach prevents race conditions between bind/unbind and callback execution code but allows for concurrency for tc rule update path. Move block offload counter, filter in hardware counter and filter flags management from classifiers into cls hardware offloads API. Make functions tcf_block_offload_{inc|dec}() and tc_cls_offload_cnt_update() to be cls API private. Implement following new cls API to be used instead: tc_setup_cb_add() - non-destructive filter add. If filter that wasn't already in hardware is successfully offloaded, increment block offloads counter, set filter in hardware counter and flag. On failure, previously offloaded filter is considered to be intact and offloads counter is not decremented. tc_setup_cb_replace() - destructive filter replace. Release existing filter block offload counter and reset its in hardware counter and flag. Set new filter in hardware counter and flag. On failure, previously offloaded filter is considered to be destroyed and offload counter is decremented. tc_setup_cb_destroy() - filter destroy. Unconditionally decrement block offloads counter. tc_setup_cb_reoffload() - reoffload filter to single cb. Execute cb() and call tc_cls_offload_cnt_update() if cb() didn't return an error. Refactor all offload-capable classifiers to atomically offload filters to hardware, change block offload counter, and set filter in hardware counter and flag by means of the new cls API functions. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26net: sched: change tcf block offload counter type to atomic_tVlad Buslov
As a preparation for running proto ops functions without rtnl lock, change offload counter type to atomic. This is necessary to allow updating the counter by multiple concurrent users when offloading filters to hardware from unlocked classifiers. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26net: sched: protect block offload-related fields with rw_semaphoreVlad Buslov
In order to remove dependency on rtnl lock, extend tcf_block with 'cb_lock' rwsem and use it to protect flow_block->cb_list and related counters from concurrent modification. The lock is taken in read mode for read-only traversal of cb_list in tc_setup_cb_call() and write mode in all other cases. This approach ensures that: - cb_list is not changed concurrently while filters is being offloaded on block. - block->nooffloaddevcnt is checked while holding the lock in read mode, but is only changed by bind/unbind code when holding the cb_lock in write mode to prevent concurrent modification. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26SUNRPC: Handle connection breakages correctly in call_status()Trond Myklebust
If the connection breaks while we're waiting for a reply from the server, then we want to immediately try to reconnect. Fixes: ec6017d90359 ("SUNRPC fix regression in umount of a secure mount") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-26Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was invalidated"Trond Myklebust
This reverts commit a79f194aa4879e9baad118c3f8bb2ca24dbef765. The mechanism for aborting I/O is racy, since we are not guaranteed that the request is asleep while we're changing both task->tk_status and task->tk_action. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v5.1
2019-08-26SUNRPC: Handle EADDRINUSE and ENOBUFS correctlyTrond Myklebust
If a connect or bind attempt returns EADDRINUSE, that means we want to retry with a different port. It is not a fatal connection error. Similarly, ENOBUFS is not fatal, but just indicates a memory allocation issue. Retry after a short delay. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-26SUNRPC: Don't handle errors if the bind/connect succeededTrond Myklebust
Don't handle errors in call_bind_status()/call_connect_status() if it turns out that a previous call caused it to succeed. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v5.1+
2019-08-25openvswitch: Fix conntrack cache with timeoutYi-Hung Wei
This patch addresses a conntrack cache issue with timeout policy. Currently, we do not check if the timeout extension is set properly in the cached conntrack entry. Thus, after packet recirculate from conntrack action, the timeout policy is not applied properly. This patch fixes the aforementioned issue. Fixes: 06bd2bdf19d2 ("openvswitch: Add timeout support to ct action") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-25ipv4: mpls: fix mpls_xmit for iptunnelAlexey Kodanev
When using mpls over gre/gre6 setup, rt->rt_gw4 address is not set, the same for rt->rt_gw_family. Therefore, when rt->rt_gw_family is checked in mpls_xmit(), neigh_xmit() call is skipped. As a result, such setup doesn't work anymore. This issue was found with LTP mpls03 tests. Fixes: 1550c171935d ("ipv4: Prepare rtable for IPv6 gateway") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-24net: rds: add service level support in rds-infoZhu Yanjun
>From IB specific 7.6.5 SERVICE LEVEL, Service Level (SL) is used to identify different flows within an IBA subnet. It is carried in the local route header of the packet. Before this commit, run "rds-info -I". The outputs are as below: " RDS IB Connections: LocalAddr RemoteAddr Tos SL LocalDev RemoteDev 192.2.95.3 192.2.95.1 2 0 fe80::21:28:1a:39 fe80::21:28:10:b9 192.2.95.3 192.2.95.1 1 0 fe80::21:28:1a:39 fe80::21:28:10:b9 192.2.95.3 192.2.95.1 0 0 fe80::21:28:1a:39 fe80::21:28:10:b9 " After this commit, the output is as below: " RDS IB Connections: LocalAddr RemoteAddr Tos SL LocalDev RemoteDev 192.2.95.3 192.2.95.1 2 2 fe80::21:28:1a:39 fe80::21:28:10:b9 192.2.95.3 192.2.95.1 1 1 fe80::21:28:1a:39 fe80::21:28:10:b9 192.2.95.3 192.2.95.1 0 0 fe80::21:28:1a:39 fe80::21:28:10:b9 " The commit fe3475af3bdf ("net: rds: add per rds connection cache statistics") adds cache_allocs in struct rds_info_rdma_connection as below: struct rds_info_rdma_connection { ... __u32 rdma_mr_max; __u32 rdma_mr_size; __u8 tos; __u32 cache_allocs; }; The peer struct in rds-tools of struct rds_info_rdma_connection is as below: struct rds_info_rdma_connection { ... uint32_t rdma_mr_max; uint32_t rdma_mr_size; uint8_t tos; uint8_t sl; uint32_t cache_allocs; }; The difference between userspace and kernel is the member variable sl. In the kernel struct, the member variable sl is missing. This will introduce risks. So it is necessary to use this commit to avoid this risk. Fixes: fe3475af3bdf ("net: rds: add per rds connection cache statistics") CC: Joe Jin <joe.jin@oracle.com> CC: JUNXIAO_BI <junxiao.bi@oracle.com> Suggested-by: Gerd Rausch <gerd.rausch@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-24net: route dump netlink NLM_F_MULTI flag missingJohn Fastabend
An excerpt from netlink(7) man page, In multipart messages (multiple nlmsghdr headers with associated payload in one byte stream) the first and all following headers have the NLM_F_MULTI flag set, except for the last header which has the type NLMSG_DONE. but, after (ee28906) there is a missing NLM_F_MULTI flag in the middle of a FIB dump. The result is user space applications following above man page excerpt may get confused and may stop parsing msg believing something went wrong. In the golang netlink lib [0] the library logic stops parsing believing the message is not a multipart message. Found this running Cilium[1] against net-next while adding a feature to auto-detect routes. I noticed with multiple route tables we no longer could detect the default routes on net tree kernels because the library logic was not returning them. Fix this by handling the fib_dump_info_fnhe() case the same way the fib_dump_info() handles it by passing the flags argument through the call chain and adding a flags argument to rt_fill_info(). Tested with Cilium stack and auto-detection of routes works again. Also annotated libs to dump netlink msgs and inspected NLM_F_MULTI and NLMSG_DONE flags look correct after this. Note: In inet_rtm_getroute() pass rt_fill_info() '0' for flags the same as is done for fib_dump_info() so this looks correct to me. [0] https://github.com/vishvananda/netlink/ [1] https://github.com/cilium/ Fixes: ee28906fd7a14 ("ipv4: Dump route exceptions if requested") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-24sock: fix potential memory leak in proto_register()zhanglin
If protocols registered exceeded PROTO_INUSE_NR, prot will be added to proto_list, but no available bit left for prot in proto_inuse_idx. Changes since v2: * Propagate the error code properly Signed-off-by: zhanglin <zhang.lin16@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-24net/core/skmsg: Delete an unnecessary check before the function call ↵Markus Elfring
“consume_skb” The consume_skb() function performs also input parameter validation. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-24xfrm/xfrm_policy: fix dst dev null pointer dereference in collect_md modeHangbin Liu
In decode_session{4,6} there is a possibility that the skb dst dev is NULL, e,g, with tunnel collect_md mode, which will cause kernel crash. Here is what the code path looks like, for GRE: - ip6gre_tunnel_xmit - ip6gre_xmit_ipv6 - __gre6_xmit - ip6_tnl_xmit - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE - icmpv6_send - icmpv6_route_lookup - xfrm_decode_session_reverse - decode_session4 - oif = skb_dst(skb)->dev->ifindex; <-- here - decode_session6 - oif = skb_dst(skb)->dev->ifindex; <-- here The reason is __metadata_dst_init() init dst->dev to NULL by default. We could not fix it in __metadata_dst_init() as there is no dev supplied. On the other hand, the skb_dst(skb)->dev is actually not needed as we called decode_session{4,6} via xfrm_decode_session_reverse(), so oif is not used by: fl4->flowi4_oif = reverse ? skb->skb_iif : oif; So make a dst dev check here should be clean and safe. v4: No changes. v3: No changes. v2: fix the issue in decode_session{4,6} instead of updating shared dst dev in {ip_md, ip6}_tunnel_xmit. Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Tested-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-24ipv4/icmp: fix rt dst dev null pointer dereferenceHangbin Liu
In __icmp_send() there is a possibility that the rt->dst.dev is NULL, e,g, with tunnel collect_md mode, which will cause kernel crash. Here is what the code path looks like, for GRE: - ip6gre_tunnel_xmit - ip6gre_xmit_ipv4 - __gre6_xmit - ip6_tnl_xmit - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE - icmp_send - net = dev_net(rt->dst.dev); <-- here The reason is __metadata_dst_init() init dst->dev to NULL by default. We could not fix it in __metadata_dst_init() as there is no dev supplied. On the other hand, the reason we need rt->dst.dev is to get the net. So we can just try get it from skb->dev when rt->dst.dev is NULL. v4: Julian Anastasov remind skb->dev also could be NULL. We'd better still use dst.dev and do a check to avoid crash. v3: No changes. v2: fix the issue in __icmp_send() instead of updating shared dst dev in {ip_md, ip6}_tunnel_xmit. Fixes: c8b34e680a09 ("ip_tunnel: Add tnl_update_pmtu in ip_md_tunnel_xmit") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Julian Anastasov <ja@ssi.bg> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-24openvswitch: Fix log message in ovs conntrackYi-Hung Wei
Fixes: 06bd2bdf19d2 ("openvswitch: Add timeout support to ct action") Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-24Merge branch 'ieee802154-for-davem-2019-08-24' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan Stefan Schmidt says: ==================== pull-request: ieee802154 for net 2019-08-24 An update from ieee802154 for your *net* tree. Yue Haibing fixed two bugs discovered by KASAN in the hwsim driver for ieee802154 and Colin Ian King cleaned up a redundant variable assignment. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2019-08-24 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix verifier precision tracking with BPF-to-BPF calls, from Alexei. 2) Fix a use-after-free in prog symbol exposure, from Daniel. 3) Several s390x JIT fixes plus BE related fixes in BPF kselftests, from Ilya. 4) Fix memory leak by unpinning XDP umem pages in error path, from Ivan. 5) Fix a potential use-after-free on flow dissector detach, from Jakub. 6) Fix bpftool to close prog fd after showing metadata, from Quentin. 7) BPF kselftest config and TEST_PROGS_EXTENDED fixes, from Anders. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>