summaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)Author
2018-09-24neighbour: send netlink notification if NTF_ROUTER changesRoopa Prabhu
send netlink notification if neigh_update results in NTF_ROUTER change and if NEIGH_UPDATE_F_ISROUTER is on. Also move the NTF_ROUTER change function into a helper. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-24net/sched: Add hardware specific counters to TC actionsEelco Chaudron
Add additional counters that will store the bytes/packets processed by hardware. These will be exported through the netlink interface for displaying by the iproute2 tc tool Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-24net/core: Add new basic hardware counterEelco Chaudron
Add a new hardware specific basic counter, TCA_STATS_BASIC_HW. This can be used to count packets/bytes processed by hardware offload. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21tcp: provide earliest departure time in skb->tstampEric Dumazet
Switch internal TCP skb->skb_mstamp to skb->skb_mstamp_ns, from usec units to nsec units. Do not clear skb->tstamp before entering IP stacks in TX, so that qdisc or devices can implement pacing based on the earliest departure time instead of socket sk->sk_pacing_rate Packets are fed with tcp_wstamp_ns, and following patch will update tcp_wstamp_ns when both TCP and sch_fq switch to the earliest departure time mechanism. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21tcp: add tcp_wstamp_ns socket fieldEric Dumazet
TCP will soon provide earliest departure time on TX skbs. It needs to track this in a new variable. tcp_mstamp_refresh() needs to update this variable, and became too big to stay an inline. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21tcp: introduce tcp_skb_timestamp_us() helperEric Dumazet
There are few places where TCP reads skb->skb_mstamp expecting a value in usec unit. skb->tstamp (aka skb->skb_mstamp) will soon store CLOCK_TAI nsec value. Add tcp_skb_timestamp_us() to provide proper conversion when needed. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21tcp: switch tcp_clock_ns() to CLOCK_TAI baseEric Dumazet
TCP pacing is either implemented in sch_fq or internally. We have the goal of being able to offload pacing on the NICS. TCP will soon provide per skb skb->tstamp as early departure time. Like ETF in commit 25db26a91364 ("net/sched: Introduce the ETF Qdisc") we chose CLOCK_T as the clock base, so that TCP and pacers can share a common clock, to get better RTT samples (without pacing artificially inflating these samples). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21net/tls: Add support for async encryption of records for performanceVakul Garg
In current implementation, tls records are encrypted & transmitted serially. Till the time the previously submitted user data is encrypted, the implementation waits and on finish starts transmitting the record. This approach of encrypt-one record at a time is inefficient when asynchronous crypto accelerators are used. For each record, there are overheads of interrupts, driver softIRQ scheduling etc. Also the crypto accelerator sits idle most of time while an encrypted record's pages are handed over to tcp stack for transmission. This patch enables encryption of multiple records in parallel when an async capable crypto accelerator is present in system. This is achieved by allowing the user space application to send more data using sendmsg() even while previously issued data is being processed by crypto accelerator. This requires returning the control back to user space application after submitting encryption request to accelerator. This also means that zero-copy mode of encryption cannot be used with async accelerator as we must be done with user space application buffer before returning from sendmsg(). There can be multiple records in flight to/from the accelerator. Each of the record is represented by 'struct tls_rec'. This is used to store the memory pages for the record. After the records are encrypted, they are added in a linked list called tx_ready_list which contains encrypted tls records sorted as per tls sequence number. The records from tx_ready_list are transmitted using a newly introduced function called tls_tx_records(). The tx_ready_list is polled for any record ready to be transmitted in sendmsg(), sendpage() after initiating encryption of new tls records. This achieves parallel encryption and transmission of records when async accelerator is present. There could be situation when crypto accelerator completes encryption later than polling of tx_ready_list by sendmsg()/sendpage(). Therefore we need a deferred work context to be able to transmit records from tx_ready_list. The deferred work context gets scheduled if applications are not sending much data through the socket. If the applications issue sendmsg()/sendpage() in quick succession, then the scheduling of tx_work_handler gets cancelled as the tx_ready_list would be polled from application's context itself. This saves scheduling overhead of deferred work. The patch also brings some side benefit. We are able to get rid of the concept of CLOSED record. This is because the records once closed are either encrypted and then placed into tx_ready_list or if encryption fails, the socket error is set. This simplifies the kernel tls sendpath. However since tls_device.c is still using macros, accessory functions for CLOSED records have been retained. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-20net/ipv4: Move device validation to helperDavid Ahern
Move the device matching check in __fib_validate_source to a helper and export it for use by netfilter modules. Code move only; no functional change intended. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-20netfilter: conntrack: clamp l4proto array size at largers supported protocolFlorian Westphal
All higher l4proto numbers are handled by the generic tracker; the l4proto lookup function already returns generic one in case the l4proto number exceeds max size. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-20netfilter: conntrack: remove l3->l4 mapping informationFlorian Westphal
l4 protocols are demuxed by l3num, l4num pair. However, almost all l4 trackers are l3 agnostic. Only exceptions are: - gre, icmp (ipv4 only) - icmpv6 (ipv6 only) This commit gets rid of the l3 mapping, l4 trackers can now be looked up by their IPPROTO_XXX value alone, which gets rid of the additional l3 indirection. For icmp, ipcmp6 and gre, add a check on state->pf and return -NF_ACCEPT in case we're asked to track e.g. icmpv6-in-ipv4, this seems more fitting than using the generic tracker. Additionally we can kill the 2nd l4proto definitions that were needed for v4/v6 split -- they are now the same so we can use single l4proto struct for each protocol, rather than two. The EXPORT_SYMBOLs can be removed as all these object files are part of nf_conntrack with no external references. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-20netfilter: conntrack: remove unused proto arg from netns init functionsFlorian Westphal
Its unused, next patch will remove l4proto->l3proto number to simplify l4 protocol demuxer lookup. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-20netfilter: conntrack: remove error callback and handle icmp from coreFlorian Westphal
icmp(v6) are the only two layer four protocols that need the error() callback (to handle icmp errors that are related to an established connections, e.g. packet too big, port unreachable and the like). Remove the error callback and handle these two special cases from the core. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-20netfilter: conntrack: deconstify packet callback skb pointerFlorian Westphal
Only two protocols need the ->error() function: icmp and icmpv6. This is because icmp error mssages might be RELATED to an existing connection (e.g. PMTUD, port unreachable and the like), and their ->error() handlers do this. The error callback is already optional, so remove it for udp and call them from ->packet() instead. As the error() callback can call checksum functions that write to skb->csum*, the const qualifier has to be removed as well. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-20netfilter: conntrack: remove the l4proto->new() functionFlorian Westphal
->new() gets invoked after ->error() and before ->packet() if a conntrack lookup has found no result for the tuple. We can fold it into ->packet() -- the packet() implementations can check if the conntrack is confirmed (new) or not (already in hash). If its unconfirmed, the conntrack isn't in the hash yet so current skb created a new conntrack entry. Only relevant side effect -- if packet() doesn't return NF_ACCEPT but -NF_ACCEPT (or drop), while the conntrack was just created, then the newly allocated conntrack is freed right away, rather than not created in the first place. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-20netfilter: conntrack: pass nf_hook_state to packet and error handlersFlorian Westphal
nf_hook_state contains all the hook meta-information: netns, protocol family, hook location, and so on. Instead of only passing selected information, pass a pointer to entire structure. This will allow to merge the error and the packet handlers and remove the ->new() function in followup patches. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-18NFC: Fix the number of pipesSuren Baghdasaryan
According to ETSI TS 102 622 specification chapter 4.4 pipe identifier is 7 bits long which allows for 128 unique pipe IDs. Because NFC_HCI_MAX_PIPES is used as the number of pipes supported and not as the max pipe ID, its value should be 128 instead of 127. nfc_hci_recv_from_llc extracts pipe ID from packet header using NFC_HCI_FRAGMENT(0x7F) mask which allows for pipe ID value of 127. Same happens when NCI_HCP_MSG_GET_PIPE() is being used. With pipes array having only 127 elements and pipe ID of 127 the OOB memory access will result. Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Allen Pais <allen.pais@oracle.com> Cc: "David S. Miller" <davem@davemloft.net> Suggested-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18netlink: add ethernet address policy typesJohannes Berg
Commonly, ethernet addresses are just using a policy of { .len = ETH_ALEN } which leaves userspace free to send more data than it should, which may hide bugs. Introduce NLA_EXACT_LEN which checks for exact size, rejecting the attribute if it's not exactly that length. Also add NLA_EXACT_LEN_WARN which requires the minimum length and will warn on longer attributes, for backward compatibility. Use these to define NLA_POLICY_ETH_ADDR (new strict policy) and NLA_POLICY_ETH_ADDR_COMPAT (compatible policy with warning); these are used like this: static const struct nla_policy <name>[...] = { [NL_ATTR_NAME] = NLA_POLICY_ETH_ADDR, ... }; Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18netlink: add NLA_REJECT policy typeJohannes Berg
In some situations some netlink attributes may be used for output only (kernel->userspace) or may be reserved for future use. It's then helpful to be able to prevent userspace from using them in messages sent to the kernel, since they'd otherwise be ignored and any future will become impossible if this happens. Add NLA_REJECT to the policy which does nothing but reject (with EINVAL) validation of any messages containing this attribute. Allow for returning a specific extended ACK error message in the validation_data pointer. While at it clear up the documentation a bit - the NLA_BITFIELD32 documentation was added to the list of len field descriptions. Also, use NL_SET_BAD_ATTR() in one place where it's open-coded. The specific case I have in mind now is a shared nested attribute containing request/response data, and it would be pointless and potentially confusing to have userspace include response data in the messages that actually contain a request. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Two new tls tests added in parallel in both net and net-next. Used Stephen Rothwell's linux-next resolution. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17tls: async support causes out-of-bounds access in crypto APIsJohn Fastabend
When async support was added it needed to access the sk from the async callback to report errors up the stack. The patch tried to use space after the aead request struct by directly setting the reqsize field in aead_request. This is an internal field that should not be used outside the crypto APIs. It is used by the crypto code to define extra space for private structures used in the crypto context. Users of the API then use crypto_aead_reqsize() and add the returned amount of bytes to the end of the request memory allocation before posting the request to encrypt/decrypt APIs. So this breaks (with general protection fault and KASAN error, if enabled) because the request sent to decrypt is shorter than required causing the crypto API out-of-bounds errors. Also it seems unlikely the sk is even valid by the time it gets to the callback because of memset in crypto layer. Anyways, fix this by holding the sk in the skb->sk field when the callback is set up and because the skb is already passed through to the callback handler via void* we can access it in the handler. Then in the handler we need to be careful to NULL the pointer again before kfree_skb. I added comments on both the setup (in tls_do_decryption) and when we clear it from the crypto callback handler tls_decrypt_done(). After this selftests pass again and fixes KASAN errors/warnings. Fixes: 94524d8fc965 ("net/tls: Add support for async decryption of tls records") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Reviewed-by: Vakul Garg <Vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17netfilter: nf_tables: asynchronous releaseFlorian Westphal
Release the committed transaction log from a work queue, moving expensive synchronize_rcu out of the locked section and providing opportunity to batch this. On my test machine this cuts runtime of nft-test.py in half. Based on earlier patch from Pablo Neira Ayuso. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-17netfilter: nf_tables: split set destruction in deactivate and destroy phaseFlorian Westphal
Splits unbind_set into destroy_set and unbinding operation. Unbinding removes set from lists (so new transaction would not find it anymore) but keeps memory allocated (so packet path continues to work). Rebind function is added to allow unrolling in case transaction that wants to remove set is aborted. Destroy function is added to free the memory, but this could occur outside of transaction in the future. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-14flow_dissector: implements flow dissector BPF hookPetar Penkov
Adds a hook for programs of type BPF_PROG_TYPE_FLOW_DISSECTOR and attach type BPF_FLOW_DISSECTOR that is executed in the flow dissector path. The BPF program is per-network namespace. Signed-off-by: Petar Penkov <ppenkov@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-09-13vxlan: Remove duplicated include from vxlan.hYueHaibing
Remove duplicated include. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13tls: zero the crypto information from tls_context before freeingSabrina Dubroca
This contains key material in crypto_send_aes_gcm_128 and crypto_recv_aes_gcm_128. Introduce union tls_crypto_context, and replace the two identical unions directly embedded in struct tls_context with it. We can then use this union to clean up the memory in the new tls_ctx_free() function. Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13net: sock: introduce SOCK_XDPJason Wang
This patch introduces a new sock flag - SOCK_XDP. This will be used for notifying the upper layer that XDP program is attached on the lower socket, and requires for extra headroom. TUN will be the first user. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13llc: avoid blocking in llc_sap_close()Cong Wang
llc_sap_close() is called by llc_sap_put() which could be called in BH context in llc_rcv(). We can't block in BH. There is no reason to block it here, kfree_rcu() should be sufficient. Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13net: dsa: Add Lantiq / Intel GSWIP tag supportHauke Mehrtens
This handles the tag added by the PMAC on the VRX200 SoC line. The GSWIP uses internally a GSWIP special tag which is located after the Ethernet header. The PMAC which connects the GSWIP to the CPU converts this special tag used by the GSWIP into the PMAC special tag which is added in front of the Ethernet header. This was tested with GSWIP 2.1 found in the VRX200 SoCs, other GSWIP versions use slightly different PMAC special tags. Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2018-09-12scsi: libcxgbi: fib6_ino reference in rt6_info is rcu protectedDavid Ahern
The fib6_info reference in rt6_info is rcu protected. Add a helper to extract prefsrc from and update cxgbi_check_route6 to use it. Fixes: 0153167aebd0 ("net/ipv6: Remove rt6i_prefsrc") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for you net tree: 1) Remove duplicated include at the end of UDP conntrack, from Yue Haibing. 2) Restore conntrack dependency on xt_cluster, from Martin Willi. 3) Fix splat with GSO skbs from the checksum target, from Florian Westphal. 4) Rework ct timeout support, the template strategy to attach custom timeouts is not correct since it will not work in conjunction with conntrack zones and we have a possible free after use when removing the rule due to missing refcounting. To fix these problems, do not use conntrack template at all and set custom timeout on the already valid conntrack object. This fix comes with a preparation patch to simplify timeout adjustment by initializating the first position of the timeout array for all of the existing trackers. Patchset from Florian Westphal. 5) Fix missing dependency on from IPv4 chain NAT type, from Florian. 6) Release chain reference counter from the flush path, from Taehee Yoo. 7) After flushing an iptables ruleset, conntrack hooks are unregistered and entries are left stale to be cleaned up by the timeout garbage collector. No TCP tracking is done on established flows by this time. If ruleset is reloaded, then hooks are registered again and TCP tracking is restored, which considers packets to be invalid. Clear window tracking to exercise TCP flow pickup from the middle given that history is lost for us. Again from Florian. 8) Fix crash from netlink interface with CONFIG_NF_CONNTRACK_TIMEOUT=y and CONFIG_NF_CT_NETLINK_TIMEOUT=n. 9) Broken CT target due to returning incorrect type from ctnl_timeout_find_get(). 10) Solve conntrack clash on NF_REPEAT verdicts too, from Michal Vaner. 11) Missing conversion of hashlimit sysctl interface to new API, from Cong Wang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-10net: sched: cls_flower: dump offload count valueVlad Buslov
Change flower in_hw_count type to fixed-size u32 and dump it as TCA_FLOWER_IN_HW_COUNT. This change is necessary to properly test shared blocks and re-offload functionality. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-10sch_netem: Move private queue handler to generic location.David S. Miller
By hand copies of SKB list handlers do not belong in individual packet schedulers. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-10sch_htb: Remove local SKB queue handling code.David S. Miller
Instead, adjust __qdisc_enqueue_tail() such that HTB can use it instead. The only other caller of __qdisc_enqueue_tail() is qdisc_enqueue_tail() so we can move the backlog and return value handling (which HTB doesn't need/want) to the latter. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-10net/ipv6: Remove rt6i_prefsrcDavid Ahern
After the conversion to fib6_info, rt6i_prefsrc has a single user that reads the value and otherwise it is only set. The one reader can be converted to use rt->from so rt6i_prefsrc can be removed, reducing rt6_info by another 20 bytes. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-089p: Add refcount to p9_req_tTomas Bortoli
To avoid use-after-free(s), use a refcount to keep track of the usable references to any instantiated struct p9_req_t. This commit adds p9_req_put(), p9_req_get() and p9_req_try_get() as wrappers to kref_put(), kref_get() and kref_get_unless_zero(). These are used by the client and the transports to keep track of valid requests' references. p9_free_req() is added back and used as callback by kref_put(). Add SLAB_TYPESAFE_BY_RCU as it ensures that the memory freed by kmem_cache_free() will not be reused for another type until the rcu synchronisation period is over, so an address gotten under rcu read lock is safe to inc_ref() without corrupting random memory while the lock is held. Link: http://lkml.kernel.org/r/1535626341-20693-1-git-send-email-asmadeus@codewreck.org Co-developed-by: Dominique Martinet <dominique.martinet@cea.fr> Signed-off-by: Tomas Bortoli <tomasbortoli@gmail.com> Reported-by: syzbot+467050c1ce275af2a5b8@syzkaller.appspotmail.com Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
2018-09-089p: add a per-client fcall kmem_cacheDominique Martinet
Having a specific cache for the fcall allocations helps speed up end-to-end latency. The caches will automatically be merged if there are multiple caches of items with the same size so we do not need to try to share a cache between different clients of the same size. Since the msize is negotiated with the server, only allocate the cache after that negotiation has happened - previous allocations or allocations of different sizes (e.g. zero-copy fcall) are made with kmalloc directly. Some figures on two beefy VMs with Connect-IB (sriov) / trans=rdma, with ior running 32 processes in parallel doing small 32 bytes IOs: - no alloc (4.18-rc7 request cache): 65.4k req/s - non-power of two alloc, no patch: 61.6k req/s - power of two alloc, no patch: 62.2k req/s - non-power of two alloc, with patch: 64.7k req/s - power of two alloc, with patch: 65.1k req/s Link: http://lkml.kernel.org/r/1532943263-24378-2-git-send-email-asmadeus@codewreck.org Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> Acked-by: Jun Piao <piaojun@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Greg Kurz <groug@kaod.org>
2018-09-089p: embed fcall in req to round down buffer allocsDominique Martinet
'msize' is often a power of two, or at least page-aligned, so avoiding an overhead of two dozen bytes for each allocation will help the allocator do its work and reduce memory fragmentation. Link: http://lkml.kernel.org/r/1533825236-22896-1-git-send-email-asmadeus@codewreck.org Suggested-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> Reviewed-by: Greg Kurz <groug@kaod.org> Acked-by: Jun Piao <piaojun@huawei.com> Cc: Matthew Wilcox <willy@infradead.org>
2018-09-05rtnetlink: add rtnl_get_net_ns_capable()Christian Brauner
get_target_net() will be used in follow-up patches in ipv{4,6} codepaths to retrieve network namespaces based on network namespace identifiers. So remove the static declaration and export in the rtnetlink header. Also, rename it to rtnl_get_net_ns_capable() to make it obvious what this function is doing. Export rtnl_get_net_ns_capable() so it can be used when ipv6 is built as a module. Signed-off-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05mac80211: add an option for drivers to check if packets can be aggregatedSara Sharon
Some hardwares have limitations on the packets' type in AMSDU. Add an optional driver callback to determine if two skbs can be used in the same AMSDU or not. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05mac80211: allow AMSDU size limitation per-TIDSara Sharon
Some drivers may have AMSDU size limitation per TID, due to HW constrains. Add an option to set this limit. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05mac80211: add an option for station management TXQSara Sharon
We have a TXQ abstraction for non-data packets that need powersave buffering. Since the AP cannot sleep, in case of station we can use this TXQ for all management frames, regardless if they are bufferable. Add HW flag to allow that. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05mac80211: support reporting 0-length PSDU in radiotapShaul Triebitz
For certain sounding frames, it may be useful to report them to userspace even though they don't have a PSDU in order to determine the PHY parameters (e.g. VHT rate/stream config.) Add support for this to mac80211. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05mac80211: Fix PTK rekey freezes and clear text leakAlexander Wetzel
Rekeying PTK keys without "Extended Key ID for Individually Addressed Frames" did use a procedure not suitable to replace in-use keys and could caused the following issues: 1) Freeze caused by incoming frames: If the local STA installed the key prior to the remote STA we still had the old key active in the hardware when mac80211 switched over to the new key. Therefore there was a window where the card could hand over frames decoded with the old key to mac80211 and bump the new PN (IV) value to an incorrect high number. When it happened the local replay detection silently started to drop all frames sent with the new key. 2) Freeze caused by outgoing frames: If mac80211 was providing the PN (IV) and handed over a clear text frame for encryption to the hardware prior to a key change the driver/card could have processed the queued frame after switching to the new key. This bumped the PN value on the remote STA to an incorrect high number, tricking the remote STA to discard all frames we sent later. 3) Freeze caused by RX aggregation reorder buffer: An aggregation session started with the old key and ending after the switch to the new key also bumped the PN to an incorrect high number, freezing the connection quite similar to 1). 4) Freeze caused by repeating lost frames in an aggregation session: A driver could repeat a lost frame and encrypt it with the new key while in a TX aggregation session without updating the PN for the new key. This also could freeze connections similar to 2). 5) Clear text leak: Removing encryption offload from the card cleared the encryption offload flag only after the card had deleted the key and we did not stop TX during the rekey. The driver/card could therefore get unencrypted frames from mac80211 while no longer be instructed to encrypt them. To prevent those issues the key install logic has been changed: - Mac80211 divers known to be able to rekey PTK0 keys have to set @NL80211_EXT_FEATURE_CAN_REPLACE_PTK0, - mac80211 stops queuing frames depending on the key during the replace - the key is first replaced in the hardware and after that in mac80211 - and mac80211 stops/blocks new aggregation sessions during the rekey. For drivers not setting @NL80211_EXT_FEATURE_CAN_REPLACE_PTK0 the user space must avoid PTK rekeys if "Extended Key ID for Individually Addressed Frames" is not being used. Rekeys for mac80211 drivers without this flag will generate a warning and use an extra call to ieee80211_flush_queues() to both highlight and try to prevent the issues with not updated drivers. The core of the fix changes the key install procedure from: - atomic switch over to the new key in mac80211 - remove the old key in the hardware (stops encryption offloading, fall back to software encryption with a potential clear text packet leak in between) - delete the inactive old key in mac80211 - enable hardware encryption offloading for the new key to: - if it's a PTK mark the old key as tainted to drop TX frames with the outgoing key - replace the key in hardware with the new one - atomic switch over to the new (not marked as tainted) key in mac80211 (which also resumes TX) - delete the inactive old key in mac80211 With the new sequence the hardware will be unable to decrypt frames encrypted with the old key prior to switching to the new key in mac80211 and thus prevent PNs from packets decrypted with the old key to be accounted against the new key. For that to work the drivers have to provide a clear boundary. Mac80211 drivers setting @NL80211_EXT_FEATURE_CAN_REPLACE_PTK0 confirm to provide it and mac80211 will then be able to correctly rekey in-use PTK keys with those drivers. The mac80211 requirements for drivers to set the flag have been added to the "Hardware crypto acceleration" documentation section. It drills down to: The drivers must not hand over frames decrypted with the old key to mac80211 once the call to set_key() with %DISABLE_KEY has been completed. It's allowed to either drop or continue to use the old key for any outgoing frames which are already in the queues, but it must not send out any of them unencrypted or encrypted with the new key. Even with the new boundary in place aggregation sessions with the reorder buffer are problematic: RX aggregation session started prior and completed after the rekey could still dump frames received with the old key at mac80211 after it switched over to the new key. This is side stepped by stopping all (RX and TX) aggregation sessions when replacing a PTK key and hardware key offloading. Stopping TX aggregation sessions avoids the need to get the PNs (IVs) updated in frames prepared for the old key and (re)transmitted after the switch to the new key. As a bonus it improves the compatibility when the remote STA is not handling rekeys as it should. When using software crypto aggregation sessions are not stopped. Mac80211 won't be able to decode the dangerous frames and discard them without special handling. Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> [trim overly long rekey warning] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05mac80211: support radiotap L-SIG dataShaul Triebitz
As before with HE, the data needs to be provided by the driver in the skb head, since there's not enough space in the skb CB. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05mac80211: Store sk_pacing_shift in ieee80211_hwWen Gong
Make it possibly for drivers to adjust the default skb_pacing_shift by storing it in the hardware struct. Signed-off-by: Wen Gong <wgong@codeaurora.org> [adjust commit log, move & adjust comment] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05mac80211: introduce capability flags for VHT EXT NSS supportJohannes Berg
Depending on whether or not rate control supports selecting rates depending on the bandwidth, we can use VHT extended NSS support. In essence, this is dot11VHTExtendedNSSBWCapable from the spec, since depending on that we'll need to parse the bandwidth. If needed, also set/clear the VHT Capability Element bit for this capability so that we don't advertise it erroneously or don't advertise it when we actually use it. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05cfg80211: add he_capabilities (ext) IE to AP settingsShaul Triebitz
Same as for HT and VHT. This helps the lower level to know whether the AP supports HE. Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-09-05mac80211: add an optional TXQ for other PS-buffered framesJohannes Berg
Some drivers may want to also use the TXQ abstraction with non-data packets that need powersave buffering, so add a hardware flag to allow this. Signed-off-by: Johannes Berg <johannes.berg@intel.com>