summaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)Author
2018-03-31inet: frags: add a pointer to struct netns_fragsEric Dumazet
In order to simplify the API, add a pointer to struct inet_frags. This will allow us to make things less complex. These functions no longer have a struct inet_frags parameter : inet_frag_destroy(struct inet_frag_queue *q /*, struct inet_frags *f */) inet_frag_put(struct inet_frag_queue *q /*, struct inet_frags *f */) inet_frag_kill(struct inet_frag_queue *q /*, struct inet_frags *f */) inet_frags_exit_net(struct netns_frags *nf /*, struct inet_frags *f */) ip6_expire_frag_queue(struct net *net, struct frag_queue *fq) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31inet: frags: change inet_frags_init_net() return valueEric Dumazet
We will soon initialize one rhashtable per struct netns_frags in inet_frags_init_net(). This patch changes the return value to eventually propagate an error. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31ipv6: frag: remove unused fieldEric Dumazet
csum field in struct frag_queue is not used, remove it. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31bpf: Hooks for sys_connectAndrey Ignatov
== The problem == See description of the problem in the initial patch of this patch set. == The solution == The patch provides much more reliable in-kernel solution for the 2nd part of the problem: making outgoing connecttion from desired IP. It adds new attach types `BPF_CGROUP_INET4_CONNECT` and `BPF_CGROUP_INET6_CONNECT` for program type `BPF_PROG_TYPE_CGROUP_SOCK_ADDR` that can be used to override both source and destination of a connection at connect(2) time. Local end of connection can be bound to desired IP using newly introduced BPF-helper `bpf_bind()`. It allows to bind to only IP though, and doesn't support binding to port, i.e. leverages `IP_BIND_ADDRESS_NO_PORT` socket option. There are two reasons for this: * looking for a free port is expensive and can affect performance significantly; * there is no use-case for port. As for remote end (`struct sockaddr *` passed by user), both parts of it can be overridden, remote IP and remote port. It's useful if an application inside cgroup wants to connect to another application inside same cgroup or to itself, but knows nothing about IP assigned to the cgroup. Support is added for IPv4 and IPv6, for TCP and UDP. IPv4 and IPv6 have separate attach types for same reason as sys_bind hooks, i.e. to prevent reading from / writing to e.g. user_ip6 fields when user passes sockaddr_in since it'd be out-of-bound. == Implementation notes == The patch introduces new field in `struct proto`: `pre_connect` that is a pointer to a function with same signature as `connect` but is called before it. The reason is in some cases BPF hooks should be called way before control is passed to `sk->sk_prot->connect`. Specifically `inet_dgram_connect` autobinds socket before calling `sk->sk_prot->connect` and there is no way to call `bpf_bind()` from hooks from e.g. `ip4_datagram_connect` or `ip6_datagram_connect` since it'd cause double-bind. On the other hand `proto.pre_connect` provides a flexible way to add BPF hooks for connect only for necessary `proto` and call them at desired time before `connect`. Since `bpf_bind()` is allowed to bind only to IP and autobind in `inet_dgram_connect` binds only port there is no chance of double-bind. bpf_bind() sets `force_bind_address_no_port` to bind to only IP despite of value of `bind_address_no_port` socket field. bpf_bind() sets `with_lock` to `false` when calling to __inet_bind() and __inet6_bind() since all call-sites, where bpf_bind() is called, already hold socket lock. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-31net: Introduce __inet_bind() and __inet6_bindAndrey Ignatov
Refactor `bind()` code to make it ready to be called from BPF helper function `bpf_bind()` (will be added soon). Implementation of `inet_bind()` and `inet6_bind()` is separated into `__inet_bind()` and `__inet6_bind()` correspondingly. These function can be used from both `sk_prot->bind` and `bpf_bind()` contexts. New functions have two additional arguments. `force_bind_address_no_port` forces binding to IP only w/o checking `inet_sock.bind_address_no_port` field. It'll allow to bind local end of a connection to desired IP in `bpf_bind()` w/o changing `bind_address_no_port` field of a socket. It's useful since `bpf_bind()` can return an error and we'd need to restore original value of `bind_address_no_port` in that case if we changed this before calling to the helper. `with_lock` specifies whether to lock socket when working with `struct sk` or not. The argument is set to `true` for `sk_prot->bind`, i.e. old behavior is preserved. But it will be set to `false` for `bpf_bind()` use-case. The reason is all call-sites, where `bpf_bind()` will be called, already hold that socket lock. Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for your net-next tree. This batch comes with more input sanitization for xtables to address bug reports from fuzzers, preparation works to the flowtable infrastructure and assorted updates. In no particular order, they are: 1) Make sure userspace provides a valid standard target verdict, from Florian Westphal. 2) Sanitize error target size, also from Florian. 3) Validate that last rule in basechain matches underflow/policy since userspace assumes this when decoding the ruleset blob that comes from the kernel, from Florian. 4) Consolidate hook entry checks through xt_check_table_hooks(), patch from Florian. 5) Cap ruleset allocations at 512 mbytes, 134217728 rules and reject very large compat offset arrays, so we have a reasonable upper limit and fuzzers don't exercise the oom-killer. Patches from Florian. 6) Several WARN_ON checks on xtables mutex helper, from Florian. 7) xt_rateest now has a hashtable per net, from Cong Wang. 8) Consolidate counter allocation in xt_counters_alloc(), from Florian. 9) Earlier xt_table_unlock() call in {ip,ip6,arp,eb}tables, patch from Xin Long. 10) Set FLOW_OFFLOAD_DIR_* to IP_CT_DIR_* definitions, patch from Felix Fietkau. 11) Consolidate code through flow_offload_fill_dir(), also from Felix. 12) Inline ip6_dst_mtu_forward() just like ip_dst_mtu_maybe_forward() to remove a dependency with flowtable and ipv6.ko, from Felix. 13) Cache mtu size in flow_offload_tuple object, this is safe for forwarding as f87c10a8aa1e describes, from Felix. 14) Rename nf_flow_table.c to nf_flow_table_core.o, to simplify too modular infrastructure, from Felix. 15) Add rt0, rt2 and rt4 IPv6 routing extension support, patch from Ahmed Abdelsalam. 16) Remove unused parameter in nf_conncount_count(), from Yi-Hung Wei. 17) Support for counting only to nf_conncount infrastructure, patch from Yi-Hung Wei. 18) Add strict NFT_CT_{SRC_IP,DST_IP,SRC_IP6,DST_IP6} key datatypes to nft_ct. 19) Use boolean as return value from ipt_ah and from IPVS too, patch from Gustavo A. R. Silva. 20) Remove useless parameters in nfnl_acct_overquota() and nf_conntrack_broadcast_help(), from Taehee Yoo. 21) Use ipv6_addr_is_multicast() from xt_cluster, also from Taehee Yoo. 22) Statify nf_tables_obj_lookup_byhandle, patch from Fengguang Wu. 23) Fix typo in xt_limit, from Geert Uytterhoeven. 24) Do no use VLAs in Netfilter code, again from Gustavo. 25) Use ADD_COUNTER from ebtables, from Taehee Yoo. 26) Bitshift support for CONNMARK and MARK targets, from Jack Ma. 27) Use pr_*() and add pr_fmt(), from Arushi Singhal. 28) Add synproxy support to ctnetlink. 29) ICMP type and IGMP matching support for ebtables, patches from Matthias Schiffer. 30) Support for the revision infrastructure to ebtables, from Bernie Harris. 31) String match support for ebtables, also from Bernie. 32) Documentation for the new flowtable infrastructure. 33) Use generic comparison functions in ebt_stp, from Joe Perches. 34) Demodularize filter chains in nftables. 35) Register conntrack hooks in case nftables NAT chain is added. 36) Merge assignments with return in a couple of spots in the Netfilter codebase, also from Arushi. 37) Document that xtables percpu counters are stored in the same memory area, from Ben Hutchings. 38) Revert mark_source_chains() sanity checks that break existing rulesets, from Florian Westphal. 39) Use is_zero_ether_addr() in the ipset codebase, from Joe Perches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30xfrm: Register xfrm_dev_notifier in appropriate placeKirill Tkhai
Currently, driver registers it from pernet_operations::init method, and this breaks modularity, because initialization of net namespace and netdevice notifiers are orthogonal actions. We don't have per-namespace netdevice notifiers; all of them are global for all devices in all namespaces. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-30netfilter: nf_tables: rename to nft_set_lookup_global()Pablo Neira Ayuso
To prepare shorter introduction of shorter function prefix. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-30netfilter: nf_tables: enable conntrack if NAT chain is registeredPablo Neira Ayuso
Register conntrack hooks if the user adds NAT chains. Users get confused with the existing behaviour since they will see no packets hitting this chain until they add the first rule that refers to conntrack. This patch adds new ->init() and ->free() indirections to chain types that can be used by NAT chains to invoke the conntrack dependency. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-30netfilter: nf_tables: build-in filter chain typePablo Neira Ayuso
One module per supported filter chain family type takes too much memory for very little code - too much modularization - place all chain filter definitions in one single file. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-30netfilter: nf_tables: nft_register_chain_type() returns voidPablo Neira Ayuso
Use WARN_ON() instead since it should not happen that neither family goes over NFPROTO_NUMPROTO nor there is already a chain of this type already registered. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-30netfilter: nf_tables: rename struct nf_chain_typePablo Neira Ayuso
Use nft_ prefix. By when I added chain types, I forgot to use the nftables prefix. Rename enum nft_chain_type to enum nft_chain_types too, otherwise there is an overlap. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-30bpf: sockmap redirect ingress supportJohn Fastabend
Add support for the BPF_F_INGRESS flag in sk_msg redirect helper. To do this add a scatterlist ring for receiving socks to check before calling into regular recvmsg call path. Additionally, because the poll wakeup logic only checked the skb recv queue we need to add a hook in TCP stack (similar to write side) so that we have a way to wake up polling socks when a scatterlist is redirected to that sock. After this all that is needed is for the redirect helper to push the scatterlist into the psock receive queue. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-29Merge tag 'mac80211-next-for-davem-2018-03-29' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== We have a fair number of patches, but many of them are from the first bullet here: * EAPoL-over-nl80211 from Denis - this will let us fix some long-standing issues with bridging, races with encryption and more * DFS offload support from the qtnfmac folks * regulatory database changes for the new ETSI adaptivity requirements * various other fixes and small enhancements ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29net: Introduce net_rwsem to protect net_namespace_listKirill Tkhai
rtnl_lock() is used everywhere, and contention is very high. When someone wants to iterate over alive net namespaces, he/she has no a possibility to do that without exclusive lock. But the exclusive rtnl_lock() in such places is overkill, and it just increases the contention. Yes, there is already for_each_net_rcu() in kernel, but it requires rcu_read_lock(), and this can't be sleepable. Also, sometimes it may be need really prevent net_namespace_list growth, so for_each_net_rcu() is not fit there. This patch introduces new rw_semaphore, which will be used instead of rtnl_mutex to protect net_namespace_list. It is sleepable and allows not-exclusive iterations over net namespaces list. It allows to stop using rtnl_lock() in several places (what is made in next patches) and makes less the time, we keep rtnl_mutex. Here we just add new lock, while the explanation of we can remove rtnl_lock() there are in next patches. Fine grained locks generally are better, then one big lock, so let's do that with net_namespace_list, while the situation allows that. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29nl80211: Add control_port_over_nl80211 to mesh_setupDenis Kenzior
Signed-off-by: Denis Kenzior <denkenz@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-03-29nl80211: Add control_port_over_nl80211 for ibssDenis Kenzior
Signed-off-by: Denis Kenzior <denkenz@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-03-29nl80211: Add CONTROL_PORT_OVER_NL80211 attributeDenis Kenzior
Signed-off-by: Denis Kenzior <denkenz@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-03-29nl80211: Implement TX of control port framesDenis Kenzior
This commit implements the TX side of NL80211_CMD_CONTROL_PORT_FRAME. Userspace provides the raw EAPoL frame using NL80211_ATTR_FRAME. Userspace should also provide the destination address and the protocol type to use when sending the frame. This is used to implement TX of Pre-authentication frames. If CONTROL_PORT_ETHERTYPE_NO_ENCRYPT is specified, then the driver will be asked not to encrypt the outgoing frame. A new EXT_FEATURE flag is introduced so that nl80211 code can check whether a given wiphy has capability to pass EAPoL frames over nl80211. Signed-off-by: Denis Kenzior <denkenz@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-03-29nl80211: Add CMD_CONTROL_PORT_FRAME APIDenis Kenzior
This commit also adds cfg80211_rx_control_port function. This is used to generate a CMD_CONTROL_PORT_FRAME event out to userspace. The conn_owner_nlportid is used as the unicast destination. This means that userspace must specify NL80211_ATTR_SOCKET_OWNER flag if control port over nl80211 routing is requested in NL80211_CMD_CONNECT, NL80211_CMD_ASSOCIATE, NL80211_CMD_START_AP or IBSS/mesh join. Signed-off-by: Denis Kenzior <denkenz@gmail.com> [johannes: fix return value of cfg80211_rx_control_port()] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-03-29cfg80211: Add API to allow querying regdb for wmm_ruleHaim Dreyfuss
In general regulatory self managed devices maintain their own regulatory profiles thus it doesn't have to query the regulatory database on country change. ETSI has recently introduced a new channel access mechanism for 5GHz that all wlan devices need to comply with. These values are stored in the regulatory database. There are self managed devices which can't maintain these values on their own. Add API to allow self managed regulatory devices to query the regulatory database for high band wmm rule. Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> [johannes: fix documentation] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-03-29cfg80211: read wmm rules from regulatory databaseHaim Dreyfuss
ETSI EN 301 893 v2.1.1 (2017-05) standard defines a new channel access mechanism that all devices (WLAN and LAA) need to comply with. The regulatory database can now be loaded into the kernel and also has the option to load optional data. In order to be able to comply with ETSI standard, we add wmm_rule into regulatory rule and add the option to read its value from the regulatory database. Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> [johannes: fix memory leak in error path] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-03-29cfg80211: fix data type of sta_opmode_info parametertamizhr@codeaurora.org
Currently bw and smps_mode are u8 type value in sta_opmode_info structure. This values filled in mac80211 from ieee80211_sta_rx_bandwidth and ieee80211_smps_mode. These enum values are specific to mac80211 and userspace/cfg80211 doesn't know about that. This will lead to incorrect result/assumption by the user space application. Change bw and smps_mode parameters to their respective enums in nl80211. Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-03-27rxrpc, afs: Use debug_ids rather than pointers in tracesDavid Howells
In rxrpc and afs, use the debug_ids that are monotonically allocated to various objects as they're allocated rather than pointers as kernel pointers are now hashed making them less useful. Further, the debug ids aren't reused anywhere nearly as quickly. In addition, allow kernel services that use rxrpc, such as afs, to take numbers from the rxrpc counter, assign them to their own call struct and pass them in to rxrpc for both client and service calls so that the trace lines for each will have the same ID tag. Signed-off-by: David Howells <dhowells@redhat.com>
2018-03-27net: Add more commentsKirill Tkhai
This adds comments to different places to improve readability. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27net: Rename net_sem to pernet_ops_rwsemKirill Tkhai
net_sem is some undefined area name, so it will be better to make the area more defined. Rename it to pernet_ops_rwsem for better readability and better intelligibility. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27net: Drop pernet_operations::asyncKirill Tkhai
Synchronous pernet_operations are not allowed anymore. All are asynchronous. So, drop the structure member. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27llc: properly handle dev_queue_xmit() return valueCong Wang
llc_conn_send_pdu() pushes the skb into write queue and calls llc_conn_send_pdus() to flush them out. However, the status of dev_queue_xmit() is not returned to caller, in this case, llc_conn_state_process(). llc_conn_state_process() needs hold the skb no matter success or failure, because it still uses it after that, therefore we should hold skb before dev_queue_xmit() when that skb is the one being processed by llc_conn_state_process(). For other callers, they can just pass NULL and ignore the return value as they are. Reported-by: Noam Rathaus <noamr@beyondsecurity.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27sctp: remove unnecessary asoc in sctp_has_associationXin Long
After Commit dae399d7fdee ("sctp: hold transport instead of assoc when lookup assoc in rx path"), it put transport instead of asoc in sctp_has_association. Variable 'asoc' is not used any more. So this patch is to remove it, while at it, it also changes the return type of sctp_has_association to bool, and does the same for it's caller sctp_endpoint_is_peeled_off. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27net: Spelling s/stucture/structure/Geert Uytterhoeven
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2018-03-26ip6mr: Support fib notificationsYuval Mintz
In similar fashion to ipmr, support fib notifications for ip6mr mfc and vif related events. This would later allow drivers to react to said notifications and offload the IPv6 mroutes. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-26net: sched, fix OOO packets with pfifo_fastJohn Fastabend
After the qdisc lock was dropped in pfifo_fast we allow multiple enqueue threads and dequeue threads to run in parallel. On the enqueue side the skb bit ooo_okay is used to ensure all related skbs are enqueued in-order. On the dequeue side though there is no similar logic. What we observe is with fewer queues than CPUs it is possible to re-order packets when two instances of __qdisc_run() are running in parallel. Each thread will dequeue a skb and then whichever thread calls the ndo op first will be sent on the wire. This doesn't typically happen because qdisc_run() is usually triggered by the same core that did the enqueue. However, drivers will trigger __netif_schedule() when queues are transitioning from stopped to awake using the netif_tx_wake_* APIs. When this happens netif_schedule() calls qdisc_run() on the same CPU that did the netif_tx_wake_* which is usually done in the interrupt completion context. This CPU is selected with the irq affinity which is unrelated to the enqueue operations. To resolve this we add a RUNNING bit to the qdisc to ensure only a single dequeue per qdisc is running. Enqueue and dequeue operations can still run in parallel and also on multi queue NICs we can still have a dequeue in-flight per qdisc, which is typically per CPU. Fixes: c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array") Reported-by: Jakob Unterwurzacher <jakob.unterwurzacher@theobroma-systems.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-25Merge tag 'wireless-drivers-next-for-davem-2018-03-24' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for 4.17 The biggest changes are the bluetooth related patches to the rsi driver. It adds a new bluetooth driver which communicates directly with the wireless driver and the interface is defined in include/net/rsi_91x.h. Major changes: wl1251 * read the MAC address from the NVS file rtlwifi * enable mac80211 fast-tx support mt76 * add capability to select tx/rx antennas mt7601 * let mac80211 validate rx CCMP Packet Number (PN) rsi * bluetooth: add new btrsi driver * btcoex support with the new btrsi driver ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree, they are: 1) Don't pick fixed hash implementation for NFT_SET_EVAL sets, otherwise userspace hits EOPNOTSUPP with valid rules using the meter statement, from Florian Westphal. 2) If you send a batch that flushes the existing ruleset (that contains a NAT chain) and the new ruleset definition comes with a new NAT chain, don't bogusly hit EBUSY. Also from Florian. 3) Missing netlink policy attribute validation, from Florian. 4) Detach conntrack template from skbuff if IP_NODEFRAG is set on, from Paolo Abeni. 5) Cache device names in flowtable object, otherwise we may end up walking over devices going aways given no rtnl_lock is held. 6) Fix incorrect net_device ingress with ingress hooks. 7) Fix crash when trying to read more data than available in UDP packets from the nf_socket infrastructure, from Subash. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23net/sched: remove tcf_idr_cleanup()Davide Caratti
tcf_idr_cleanup() is no more used, so remove it. Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23tls: RX path for ktlsDave Watson
Add rx path for tls software implementation. recvmsg, splice_read, and poll implemented. An additional sockopt TLS_RX is added, with the same interface as TLS_TX. Either TLX_RX or TLX_TX may be provided separately, or together (with two different setsockopt calls with appropriate keys). Control messages are passed via CMSG in a similar way to transmit. If no cmsg buffer is passed, then only application data records will be passed to userspace, and EIO is returned for other types of alerts. EBADMSG is passed for decryption errors, and EMSGSIZE is passed for framing too big, and EBADMSG for framing too small (matching openssl semantics). EINVAL is returned for TLS versions that do not match the original setsockopt call. All are unrecoverable. strparser is used to parse TLS framing. Decryption is done directly in to userspace buffers if they are large enough to support it, otherwise sk_cow_data is called (similar to ipsec), and buffers are decrypted in place and copied. splice_read always decrypts in place, since no buffers are provided to decrypt in to. sk_poll is overridden, and only returns POLLIN if a full TLS message is received. Otherwise we wait for strparser to finish reading a full frame. Actual decryption is only done during recvmsg or splice_read calls. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23tls: Refactor variable namesDave Watson
Several config variables are prefixed with tx, drop the prefix since these will be used for both tx and rx. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23tls: Pass error code explicitly to tls_err_abortDave Watson
Pass EBADMSG explicitly to tls_err_abort. Receive path will pass additional codes - EMSGSIZE if framing is larger than max TLS record size, EINVAL if TLS version mismatch. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23tls: Move cipher info to a separate structDave Watson
Separate tx crypto parameters to a separate cipher_context struct. The same parameters will be used for rx using the same struct. tls_advance_record_sn is modified to only take the cipher info. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23devlink: Remove top_hierarchy arg for DEVLINK disabled pathDavid Ahern
Earlier change missed the path where CONFIG_NET_DEVLINK is disabled. Thanks to Jiri for spotting. Fixes: 145307460ba9 ("devlink: Remove top_hierarchy arg to devlink_resource_register") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Fun set of conflict resolutions here... For the mac80211 stuff, these were fortunately just parallel adds. Trivially resolved. In drivers/net/phy/phy.c we had a bug fix in 'net' that moved the function phy_disable_interrupts() earlier in the file, whilst in 'net-next' the phy_error() call from this function was removed. In net/ipv4/xfrm4_policy.c, David Ahern's changes to remove the 'rt_table_id' member of rtable collided with a bug fix in 'net' that added a new struct member "rt_mtu_locked" which needs to be copied over here. The mlxsw driver conflict consisted of net-next separating the span code and definitions into separate files, whilst a 'net' bug fix made some changes to that moved code. The mlx5 infiniband conflict resolution was quite non-trivial, the RDMA tree's merge commit was used as a guide here, and here are their notes: ==================== Due to bug fixes found by the syzkaller bot and taken into the for-rc branch after development for the 4.17 merge window had already started being taken into the for-next branch, there were fairly non-trivial merge issues that would need to be resolved between the for-rc branch and the for-next branch. This merge resolves those conflicts and provides a unified base upon which ongoing development for 4.17 can be based. Conflicts: drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f9524 (IB/mlx5: Fix cleanup order on unload) added to for-rc and commit b5ca15ad7e61 (IB/mlx5: Add proper representors support) add as part of the devel cycle both needed to modify the init/de-init functions used by mlx5. To support the new representors, the new functions added by the cleanup patch needed to be made non-static, and the init/de-init list added by the representors patch needed to be modified to match the init/de-init list changes made by the cleanup patch. Updates: drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function prototypes added by representors patch to reflect new function names as changed by cleanup patch drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init stage list to match new order from cleanup patch ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23mac80211: notify driver for change in multicast ratesPradeep Kumar Chitrapu
With drivers implementing rate control in driver or firmware rate_control_send_low() may not get called, and thus the driver needs to know about changes in the multicast rate. Add and use a new BSS change flag for this. Signed-off-by: Pradeep Kumar Chitrapu <pradeepc@codeaurora.org> [rewrite commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-03-22net: Replace ip_ra_lock with per-net mutexKirill Tkhai
Since ra_chain is per-net, we may use per-net mutexes to protect them in ip_ra_control(). This improves scalability. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22net: Make ip_ra_chain per struct netKirill Tkhai
This is optimization, which makes ip_call_ra_chain() iterate less sockets to find the sockets it's looking for. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22devlink: Remove top_hierarchy arg to devlink_resource_registerDavid Ahern
top_hierarchy arg can be determined by comparing parent_resource_id to DEVLINK_RESOURCE_ID_PARENT_TOP so it does not need to be a separate argument. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22net: add uevent socket memberChristian Brauner
This commit adds struct uevent_sock to struct net. Since struct uevent_sock records the position of the uevent socket in the uevent socket list we can trivially remove it from the uevent socket list during cleanup. This speeds up the old removal codepath. Note, list_del() will hit __list_del_entry_valid() in its call chain which will validate that the element is a member of the list. If it isn't it will take care that the list is not modified. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22netfilter: nf_tables: cache device name in flowtable objectPablo Neira Ayuso
Devices going away have to grab the nfnl_lock from the netdev event path to avoid races with control plane updates. However, netlink dumps in netfilter do not hold nfnl_lock mutex. Cache the device name into the objects to avoid an use-after-free situation for a device that is going away. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-03-21 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add a BPF hook for sendmsg and sendfile by reusing the ULP infrastructure and sockmap. Three helpers are added along with this, bpf_msg_apply_bytes(), bpf_msg_cork_bytes(), and bpf_msg_pull_data(). The first is used to tell for how many bytes the verdict should be applied to, the second to tell that x bytes need to be queued first to retrigger the BPF program for a verdict, and the third helper is mainly for the sendfile case to pull in data for making it private for reading and/or writing, from John. 2) Improve address to symbol resolution of user stack traces in BPF stackmap. Currently, the latter stores the address for each entry in the call trace, however to map these addresses to user space files, it is necessary to maintain the mapping from these virtual addresses to symbols in the binary which is not practical for system-wide profiling. Instead, this option for the stackmap rather stores the ELF build id and offset for the call trace entries, from Song. 3) Add support that allows BPF programs attached to perf events to read the address values recorded with the perf events. They are requested through PERF_SAMPLE_ADDR via perf_event_open(). Main motivation behind it is to support building memory or lock access profiling and tracing tools with the help of BPF, from Teng. 4) Several improvements to the tools/bpf/ Makefiles. The 'make bpf' in the tools directory does not provide the standard quiet output except for bpftool and it also does not respect specifying a build output directory. 'make bpf_install' command neither respects specified destination nor prefix, all from Jiri. In addition, Jakub fixes several other minor issues in the Makefiles on top of that, e.g. fixing dependency paths, phony targets and more. 5) Various doc updates e.g. add a comment for BPF fs about reserved names to make the dentry lookup from there a bit more obvious, and a comment to the bpf_devel_QA file in order to explain the diff between native and bpf target clang usage with regards to pointer size, from Quentin and Daniel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-21mac80211: add ieee80211_hw flag for QoS NDP supportBen Caradoc-Davies
Commit 7b6ddeaf27ec ("mac80211: use QoS NDP for AP probing") added an argument qos_ok to ieee80211_nullfunc_get to support QoS NDP. Despite the claim in the commit log "Change all the drivers to *not* allow QoS NDP for now, even though it looks like most of them should be OK with that", this commit enables QoS NDP in response to beacons (see change to mlme.c:ieee80211_send_nullfunc), causing ath9k_htc to lose IP connectivity. See: https://patchwork.kernel.org/patch/10241109/ https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=891060 Introduce a hardware flag to allow such buggy drivers to override the correct default behaviour of mac80211 of sending QoS NDP packets. Signed-off-by: Ben Caradoc-Davies <ben@transient.nz> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-03-20netfilter: Refactor nf_conncountYi-Hung Wei
Remove parameter 'family' in nf_conncount_count() and count_tree(). It is because the parameter is not useful after commit 625c556118f3 ("netfilter: connlimit: split xt_connlimit into front and backend"). Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>