summaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)Author
2013-10-21tcp_memcontrol: Remove setting cgroup settings via sysctlEric W. Biederman
The code is broken and does not constrain sysctl_tcp_mem as tcp_update_limit does. With the result that it allows the cgroup tcp memory limits to be bypassed. The semantics are broken as the settings are not per netns and are in a per netns table, and instead looks at current. Since the code is broken in both design and implementation and does not implement the functionality for which it was written remove it. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21tcp_memcontrol: Remove tcp_max_memoryEric W. Biederman
This function is never called. Remove it. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21nf_tables*.h: Remove extern from function prototypesJoe Perches
There are a mix of function prototypes with and without extern in the kernel sources. Standardize on not using extern for function prototypes. Function prototypes don't need to be written with extern. extern is assumed by the compiler. Its use is as unnecessary as using auto to declare automatic/local variables in a block. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19tcp: switch tcp_fastopen key generation to net_get_random_onceHannes Frederic Sowa
Changed key initialization of tcp_fastopen cookies to net_get_random_once. If the user sets a custom key net_get_random_once must be called at least once to ensure we don't overwrite the user provided key when the first cookie is generated later on. Cc: Yuchung Cheng <ycheng@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19inet: convert inet_ehash_secret and ipv6_hash_secret to net_get_random_onceHannes Frederic Sowa
Initialize the ehash and ipv6_hash_secrets with net_get_random_once. Each compilation unit gets its own secret now: ipv4/inet_hashtables.o ipv4/udp.o ipv6/inet6_hashtables.o ipv6/udp.o rds/connection.o The functions still get inlined into the hashing functions. In the fast path we have at most two (needed in ipv6) if (unlikely(...)). Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19inet: split syncookie keys for ipv4 and ipv6 and initialize with ↵Hannes Frederic Sowa
net_get_random_once This patch splits the secret key for syncookies for ipv4 and ipv6 and initializes them with net_get_random_once. This change was the reason I did this series. I think the initialization of the syncookie_secret is way to early. Cc: Florian Westphal <fw@strlen.de> Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19ipv6: split inet6_ehashfn to hash functions per compilation unitHannes Frederic Sowa
This patch splits the inet6_ehashfn into separate ones in ipv6/inet6_hashtables.o and ipv6/udp.o to ease the introduction of seperate secrets keys later. Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19ipv4: split inet_ehashfn to hash functions per compilation unitHannes Frederic Sowa
This duplicates a bit of code but let's us easily introduce separate secret keys later. The separate compilation units are ipv4/inet_hashtabbles.o, ipv4/udp.o and rds/connection.o. Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19ipv4: generalize gre_handle_offloadsEric Dumazet
This patch makes gre_handle_offloads() more generic and rename it to iptunnel_handle_offloads() This will be used to add GSO/TSO support to IPIP tunnels. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== 1) Don't use a wildcard SA if a more precise one is in acquire state, from Fan Du. 2) Simplify the SA lookup when using wildcard source. We need to check only the destination in this case, from Fan Du. 3) Add a receive path hook for IPsec virtual tunnel interfaces to xfrm6_mode_tunnel. 4) Add support for IPsec virtual tunnel interfaces to ipv6. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-18tcp: rename tcp_tso_segment()Eric Dumazet
Rename tcp_tso_segment() to tcp_gso_segment(), to better reflect what is going on, and ease grep games. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17Merge branch 'for-davem' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== This is a batch of updates intended for the 3.13 stream... The biggest item of interest in here is wcn36xx, the new mac80211 driver for Qualcomm WCN3660/WCN3680 hardware. Regarding the mac80211 bits, Johannes says: "We have an assortment of cleanups and new features, of which the biggest one is probably the channel-switch support in IBSS. Nothing else really stands out much." On top of that, the ath9k and rt2x00 get a lot of update action from Felix Fietkau and Gabor Juhos, respectively. There are a handful of updates to other drivers here and there as well. Please let me know if there are problems! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17ipv4: shrink rt_cache_statEric Dumazet
Half of the rt_cache_stat fields are no longer used after IP route cache removal, lets shrink this per cpu area. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17netdev: inet_timewait_sock.h missing semi-colon when KMEMCHECK is enabledRandy Dunlap
Fix (a few hundred) build errors due to missing semi-colon when KMEMCHECK is enabled: include/net/inet_timewait_sock.h:139:2: error: expected ',', ';' or '}' before 'int' include/net/inet_timewait_sock.h:148:28: error: 'const struct inet_timewait_sock' has no member named 'tw_death_node' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17Merge branch 'net-next' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables Pablo Neira Ayuso says: ==================== netfilter updates: nf_tables pull request The following patchset contains the current original nf_tables tree condensed in 17 patches. I have organized them by chronogical order since the original nf_tables code was released in 2009 and by dependencies between the different patches. The patches are: 1) Adapt all existing hooks in the tree to pass hook ops to the hook callback function, required by nf_tables, from Patrick McHardy. 2) Move alloc_null_binding to nf_nat_core, as it is now also needed by nf_tables and ip_tables, original patch from Patrick McHardy but required major changes to adapt it to the current tree that I made. 3) Add nf_tables core, including the netlink API, the packet filtering engine, expressions and built-in tables, from Patrick McHardy. This patch includes accumulated fixes since 2009 and minor enhancements. The patch description contains a list of references to the original patches for the record. For those that are not familiar to the original work, see [1], [2] and [3]. 4) Add netlink set API, this replaces the original set infrastructure to introduce a netlink API to add/delete sets and to add/delete set elements. This includes two set types: the hash and the rb-tree sets (used for interval based matching). The main difference with ipset is that this infrastructure is data type agnostic. Patch from Patrick McHardy. 5) Allow expression operation overload, this API change allows us to provide define expression subtypes depending on the configuration that is received from user-space via Netlink. It is used by follow up patches to provide optimized versions of the payload and cmp expressions and the x_tables compatibility layer, from Patrick McHardy. 6) Add optimized data comparison operation, it requires the previous patch, from Patrick McHardy. 7) Add optimized payload implementation, it requires patch 5, from Patrick McHardy. 8) Convert built-in tables to chain types. Each chain type have special semantics (filter, route and nat) that are used by userspace to configure the chain behaviour. The main chain regarding iptables is that tables become containers of chain, with no specific semantics. However, you may still configure your tables and chains to retain iptables like semantics, patch from me. 9) Add compatibility layer for x_tables. This patch adds support to use all existing x_tables extensions from nf_tables, this is used to provide a userspace utility that accepts iptables syntax but used internally the nf_tables kernel core. This patch includes missing features in the nf_tables core such as the per-chain stats, default chain policy and number of chain references, which are required by the iptables compatibility userspace tool. Patch from me. 10) Fix transport protocol matching, this fix is a side effect of the x_tables compatibility layer, which now provides a pointer to the transport header, from me. 11) Add support for dormant tables, this feature allows you to disable all chains and rules that are contained in one table, from me. 12) Add IPv6 NAT support. At the time nf_tables was made, there was no NAT IPv6 support yet, from Tomasz Bursztyka. 13) Complete net namespace support. This patch register the protocol family per net namespace, so tables (thus, other objects contained in tables such as sets, chains and rules) are only visible from the corresponding net namespace, from me. 14) Add the insert operation to the nf_tables netlink API, this requires adding a new position attribute that allow us to locate where in the ruleset a rule needs to be inserted, from Eric Leblond. 15) Add rule batching support, including atomic rule-set updates by using rule-set generations. This patch includes a change to nfnetlink to include two new control messages to indicate the beginning and the end of a batch. The end message is interpreted as the commit message, if it's missing, then the rule-set updates contained in the batch are aborted, from me. 16) Add trace support to the nf_tables packet filtering core, from me. 17) Add ARP filtering support, original patch from Patrick McHardy, but adapted to fit into the chain type infrastructure. This was recovered to be used by nft userspace tool and our compatibility arptables userspace tool. There is still work to do to fully replace x_tables [4] [5] but that can be done incrementally by extending our netlink API. Moreover, looking at netfilter-devel and the amount of contributions to nf_tables we've been getting, I think it would be good to have it mainstream to avoid accumulating large patchsets skip continuous rebases. I tried to provide a reasonable patchset, we have more than 100 accumulated patches in the original nf_tables tree, so I collapsed many of the small fixes to the main patch we had since 2009 and provide a small batch for review to netdev, while trying to retain part of the history. For those who didn't give a try to nf_tables yet, there's a quick howto available from Eric Leblond that describes how to get things working [6]. Comments/reviews welcome. Thanks! [1] http://lwn.net/Articles/324251/ [2] http://workshop.netfilter.org/2013/wiki/images/e/ee/Nftables-osd-2013-developer.pdf [3] http://lwn.net/Articles/564095/ [4] http://people.netfilter.org/pablo/map-pending-work.txt [4] http://people.netfilter.org/pablo/nftables-todo.txt [5] https://home.regit.org/netfilter-en/nftables-quick-howto/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17irda: update comment mentioning IRQF_DISABLEDMichael Opdenacker
This patch removes a comment mentioning IRQF_DISABLED, which is deprecated. Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
2013-10-14netfilter: nf_tables: add ARP filtering supportPablo Neira Ayuso
This patch registers the ARP family and he filter chain type for this family. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14netfilter: nf_tables: add trace supportPablo Neira Ayuso
This patch adds support for tracing the packet travel through the ruleset, in a similar fashion to x_tables. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14netfilter: nfnetlink: add batch support and use it from nf_tablesPablo Neira Ayuso
This patch adds a batch support to nfnetlink. Basically, it adds two new control messages: * NFNL_MSG_BATCH_BEGIN, that indicates the beginning of a batch, the nfgenmsg->res_id indicates the nfnetlink subsystem ID. * NFNL_MSG_BATCH_END, that results in the invocation of the ss->commit callback function. If not specified or an error ocurred in the batch, the ss->abort function is invoked instead. The end message represents the commit operation in nftables, the lack of end message results in an abort. This patch also adds the .call_batch function that is only called from the batch receival path. This patch adds atomic rule updates and dumps based on bitmask generations. This allows to atomically commit a set of rule-set updates incrementally without altering the internal state of existing nf_tables expressions/matches/targets. The idea consists of using a generation cursor of 1 bit and a bitmask of 2 bits per rule. Assuming the gencursor is 0, then the genmask (expressed as a bitmask) can be interpreted as: 00 active in the present, will be active in the next generation. 01 inactive in the present, will be active in the next generation. 10 active in the present, will be deleted in the next generation. ^ gencursor Once you invoke the transition to the next generation, the global gencursor is updated: 00 active in the present, will be active in the next generation. 01 active in the present, needs to zero its future, it becomes 00. 10 inactive in the present, delete now. ^ gencursor If a dump is in progress and nf_tables enters a new generation, the dump will stop and return -EBUSY to let userspace know that it has to retry again. In order to invalidate dumps, a global genctr counter is increased everytime nf_tables enters a new generation. This new operation can be used from the user-space utility that controls the firewall, eg. nft -f restore The rule updates contained in `file' will be applied atomically. cat file ----- add filter INPUT ip saddr 1.1.1.1 counter accept #1 del filter INPUT ip daddr 2.2.2.2 counter drop #2 -EOF- Note that the rule 1 will be inactive until the transition to the next generation, the rule 2 will be evicted in the next generation. There is a penalty during the rule update due to the branch misprediction in the packet matching framework. But that should be quickly resolved once the iteration over the commit list that contain rules that require updates is finished. Event notification happens once the rule-set update has been committed. So we skip notifications is case the rule-set update is aborted, which can happen in case that the rule-set is tested to apply correctly. This patch squashed the following patches from Pablo: * nf_tables: atomic rule updates and dumps * nf_tables: get rid of per rule list_head for commits * nf_tables: use per netns commit list * nfnetlink: add batch support and use it from nf_tables * nf_tables: all rule updates are transactional * nf_tables: attach replacement rule after stale one * nf_tables: do not allow deletion/replacement of stale rules * nf_tables: remove unused NFTA_RULE_FLAGS Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14netfilter: nf_tables: complete net namespace supportPablo Neira Ayuso
Register family per netnamespace to ensure that sets are only visible in its approapriate namespace. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14netfilter: nf_tables: add compatibility layer for x_tablesPablo Neira Ayuso
This patch adds the x_tables compatibility layer. This allows you to use existing x_tables matches and targets from nf_tables. This compatibility later allows us to use existing matches/targets for features that are still missing in nf_tables. We can progressively replace them with native nf_tables extensions. It also provides the userspace compatibility software that allows you to express the rule-set using the iptables syntax but using the nf_tables kernel components. In order to get this compatibility layer working, I've done the following things: * add NFNL_SUBSYS_NFT_COMPAT: this new nfnetlink subsystem is used to query the x_tables match/target revision, so we don't need to use the native x_table getsockopt interface. * emulate xt structures: this required extending the struct nft_pktinfo to include the fragment offset, which is already obtained from ip[6]_tables and that is used by some matches/targets. * add support for default policy to base chains, required to emulate x_tables. * add NFTA_CHAIN_USE attribute to obtain the number of references to chains, required by x_tables emulation. * add chain packet/byte counters using per-cpu. * support 32-64 bits compat. For historical reasons, this patch includes the following patches that were posted in the netfilter-devel mailing list. From Pablo Neira Ayuso: * nf_tables: add default policy to base chains * netfilter: nf_tables: add NFTA_CHAIN_USE attribute * nf_tables: nft_compat: private data of target and matches in contiguous area * nf_tables: validate hooks for compat match/target * nf_tables: nft_compat: release cached matches/targets * nf_tables: x_tables support as a compile time option * nf_tables: fix alias for xtables over nftables module * nf_tables: add packet and byte counters per chain * nf_tables: fix per-chain counter stats if no counters are passed * nf_tables: don't bump chain stats * nf_tables: add protocol and flags for xtables over nf_tables * nf_tables: add ip[6]t_entry emulation * nf_tables: move specific layer 3 compat code to nf_tables_ipv[4|6] * nf_tables: support 32bits-64bits x_tables compat * nf_tables: fix compilation if CONFIG_COMPAT is disabled From Patrick McHardy: * nf_tables: move policy to struct nft_base_chain * nf_tables: send notifications for base chain policy changes From Alexander Primak: * nf_tables: remove the duplicate NF_INET_LOCAL_OUT From Nicolas Dichtel: * nf_tables: fix compilation when nf-netlink is a module Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14netfilter: nf_tables: convert built-in tables/chains to chain typesPablo Neira Ayuso
This patch converts built-in tables/chains to chain types that allows you to deploy customized table and chain configurations from userspace. After this patch, you have to specify the chain type when creating a new chain: add chain ip filter output { type filter hook input priority 0; } ^^^^ ------ The existing chain types after this patch are: filter, route and nat. Note that tables are just containers of chains with no specific semantics, which is a significant change with regards to iptables. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14netfilter: nft_payload: add optimized payload implementation for small loadsPatrick McHardy
Add an optimized payload expression implementation for small (up to 4 bytes) aligned data loads from the linear packet area. This patch also includes original Patrick McHardy's entitled (nf_tables: inline nft_payload_fast_eval() into main evaluation loop). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14netfilter: nf_tables: add optimized data comparison for small valuesPatrick McHardy
Add an optimized version of nft_data_cmp() that only handles values of to 4 bytes length. This patch includes original Patrick McHardy's patch entitled (nf_tables: inline nft_cmp_fast_eval() into main evaluation loop). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14netfilter: nf_tables: expression ops overloadingPatrick McHardy
Split the expression ops into two parts and support overloading of the runtime expression ops based on the requested function through a ->select_ops() callback. This can be used to provide optimized implementations, for instance for loading small aligned amounts of data from the packet or inlining frequently used operations into the main evaluation loop. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14netfilter: nf_tables: add netlink set APIPatrick McHardy
This patch adds the new netlink API for maintaining nf_tables sets independently of the ruleset. The API supports the following operations: - creation of sets - deletion of sets - querying of specific sets - dumping of all sets - addition of set elements - removal of set elements - dumping of all set elements Sets are identified by name, each table defines an individual namespace. The name of a set may be allocated automatically, this is mostly useful in combination with the NFT_SET_ANONYMOUS flag, which destroys a set automatically once the last reference has been released. Sets can be marked constant, meaning they're not allowed to change while linked to a rule. This allows to perform lockless operation for set types that would otherwise require locking. Additionally, if the implementation supports it, sets can (as before) be used as maps, associating a data value with each key (or range), by specifying the NFT_SET_MAP flag and can be used for interval queries by specifying the NFT_SET_INTERVAL flag. Set elements are added and removed incrementally. All element operations support batching, reducing netlink message and set lookup overhead. The old "set" and "hash" expressions are replaced by a generic "lookup" expression, which binds to the specified set. Userspace is not aware of the actual set implementation used by the kernel anymore, all configuration options are generic. Currently the implementation selection logic is largely missing and the kernel will simply use the first registered implementation supporting the requested operation. Eventually, the plan is to have userspace supply a description of the data characteristics and select the implementation based on expected performance and memory use. This patch includes the new 'lookup' expression to look up for element matching in the set. This patch includes kernel-doc descriptions for this set API and it also includes the following fixes. From Patrick McHardy: * netfilter: nf_tables: fix set element data type in dumps * netfilter: nf_tables: fix indentation of struct nft_set_elem comments * netfilter: nf_tables: fix oops in nft_validate_data_load() * netfilter: nf_tables: fix oops while listing sets of built-in tables * netfilter: nf_tables: destroy anonymous sets immediately if binding fails * netfilter: nf_tables: propagate context to set iter callback * netfilter: nf_tables: add loop detection From Pablo Neira Ayuso: * netfilter: nf_tables: allow to dump all existing sets * netfilter: nf_tables: fix wrong type for flags variable in newelem Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14netfilter: add nftablesPatrick McHardy
This patch adds nftables which is the intended successor of iptables. This packet filtering framework reuses the existing netfilter hooks, the connection tracking system, the NAT subsystem, the transparent proxying engine, the logging infrastructure and the userspace packet queueing facilities. In a nutshell, nftables provides a pseudo-state machine with 4 general purpose registers of 128 bits and 1 specific purpose register to store verdicts. This pseudo-machine comes with an extensible instruction set, a.k.a. "expressions" in the nftables jargon. The expressions included in this patch provide the basic functionality, they are: * bitwise: to perform bitwise operations. * byteorder: to change from host/network endianess. * cmp: to compare data with the content of the registers. * counter: to enable counters on rules. * ct: to store conntrack keys into register. * exthdr: to match IPv6 extension headers. * immediate: to load data into registers. * limit: to limit matching based on packet rate. * log: to log packets. * meta: to match metainformation that usually comes with the skbuff. * nat: to perform Network Address Translation. * payload: to fetch data from the packet payload and store it into registers. * reject (IPv4 only): to explicitly close connection, eg. TCP RST. Using this instruction-set, the userspace utility 'nft' can transform the rules expressed in human-readable text representation (using a new syntax, inspired by tcpdump) to nftables bytecode. nftables also inherits the table, chain and rule objects from iptables, but in a more configurable way, and it also includes the original datatype-agnostic set infrastructure with mapping support. This set infrastructure is enhanced in the follow up patch (netfilter: nf_tables: add netlink set API). This patch includes the following components: * the netlink API: net/netfilter/nf_tables_api.c and include/uapi/netfilter/nf_tables.h * the packet filter core: net/netfilter/nf_tables_core.c * the expressions (described above): net/netfilter/nft_*.c * the filter tables: arp, IPv4, IPv6 and bridge: net/ipv4/netfilter/nf_tables_ipv4.c net/ipv6/netfilter/nf_tables_ipv6.c net/ipv4/netfilter/nf_tables_arp.c net/bridge/netfilter/nf_tables_bridge.c * the NAT table (IPv4 only): net/ipv4/netfilter/nf_table_nat_ipv4.c * the route table (similar to mangle): net/ipv4/netfilter/nf_table_route_ipv4.c net/ipv6/netfilter/nf_table_route_ipv6.c * internal definitions under: include/net/netfilter/nf_tables.h include/net/netfilter/nf_tables_core.h * It also includes an skeleton expression: net/netfilter/nft_expr_template.c and the preliminary implementation of the meta target net/netfilter/nft_meta_target.c It also includes a change in struct nf_hook_ops to add a new pointer to store private data to the hook, that is used to store the rule list per chain. This patch is based on the patch from Patrick McHardy, plus merged accumulated cleanups, fixes and small enhancements to the nftables code that has been done since 2009, which are: From Patrick McHardy: * nf_tables: adjust netlink handler function signatures * nf_tables: only retry table lookup after successful table module load * nf_tables: fix event notification echo and avoid unnecessary messages * nft_ct: add l3proto support * nf_tables: pass expression context to nft_validate_data_load() * nf_tables: remove redundant definition * nft_ct: fix maxattr initialization * nf_tables: fix invalid event type in nf_tables_getrule() * nf_tables: simplify nft_data_init() usage * nf_tables: build in more core modules * nf_tables: fix double lookup expression unregistation * nf_tables: move expression initialization to nf_tables_core.c * nf_tables: build in payload module * nf_tables: use NFPROTO constants * nf_tables: rename pid variables to portid * nf_tables: save 48 bits per rule * nf_tables: introduce chain rename * nf_tables: check for duplicate names on chain rename * nf_tables: remove ability to specify handles for new rules * nf_tables: return error for rule change request * nf_tables: return error for NLM_F_REPLACE without rule handle * nf_tables: include NLM_F_APPEND/NLM_F_REPLACE flags in rule notification * nf_tables: fix NLM_F_MULTI usage in netlink notifications * nf_tables: include NLM_F_APPEND in rule dumps From Pablo Neira Ayuso: * nf_tables: fix stack overflow in nf_tables_newrule * nf_tables: nft_ct: fix compilation warning * nf_tables: nft_ct: fix crash with invalid packets * nft_log: group and qthreshold are 2^16 * nf_tables: nft_meta: fix socket uid,gid handling * nft_counter: allow to restore counters * nf_tables: fix module autoload * nf_tables: allow to remove all rules placed in one chain * nf_tables: use 64-bits rule handle instead of 16-bits * nf_tables: fix chain after rule deletion * nf_tables: improve deletion performance * nf_tables: add missing code in route chain type * nf_tables: rise maximum number of expressions from 12 to 128 * nf_tables: don't delete table if in use * nf_tables: fix basechain release From Tomasz Bursztyka: * nf_tables: Add support for changing users chain's name * nf_tables: Change chain's name to be fixed sized * nf_tables: Add support for replacing a rule by another one * nf_tables: Update uapi nftables netlink header documentation From Florian Westphal: * nft_log: group is u16, snaplen u32 From Phil Oester: * nf_tables: operational limit match Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-14netfilter: nf_nat: move alloc_null_binding to nf_nat_core.cPablo Neira Ayuso
Similar to nat_decode_session, alloc_null_binding is needed for both ip_tables and nf_tables, so move it to nf_nat_core.c. This change is required by nf_tables. This is an adapted version of the original patch from Patrick McHardy. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-10inet: rename ir_loc_port to ir_numEric Dumazet
In commit 634fb979e8f ("inet: includes a sock_common in request_sock") I forgot that the two ports in sock_common do not have same byte order : skc_dport is __be16 (network order), but skc_num is __u16 (host order) So sparse complains because ir_loc_port (mapped into skc_num) is considered as __u16 while it should be __be16 Let rename ir_loc_port to ireq->ir_num (analogy with inet->inet_num), and perform appropriate htons/ntohs conversions. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-10Merge branch 'for-john' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
2013-10-10inet: includes a sock_common in request_sockEric Dumazet
TCP listener refactoring, part 5 : We want to be able to insert request sockets (SYN_RECV) into main ehash table instead of the per listener hash table to allow RCU lookups and remove listener lock contention. This patch includes the needed struct sock_common in front of struct request_sock This means there is no more inet6_request_sock IPv6 specific structure. Following inet_request_sock fields were renamed as they became macros to reference fields from struct sock_common. Prefix ir_ was chosen to avoid name collisions. loc_port -> ir_loc_port loc_addr -> ir_loc_addr rmt_addr -> ir_rmt_addr rmt_port -> ir_rmt_port iif -> ir_iif Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-09net: fix build errors if ipv6 is disabledEric Dumazet
CONFIG_IPV6=n is still a valid choice ;) It appears we can remove dead code. Reported-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-09ipv6: Add a receive path hook for vti6 in xfrm6_mode_tunnel.Steffen Klassert
Add a receive path hook for the IPsec vritual tunnel interface. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-10-09ipv6: make lookups simpler and fasterEric Dumazet
TCP listener refactoring, part 4 : To speed up inet lookups, we moved IPv4 addresses from inet to struct sock_common Now is time to do the same for IPv6, because it permits us to have fast lookups for all kind of sockets, including upcoming SYN_RECV. Getting IPv6 addresses in TCP lookups currently requires two extra cache lines, plus a dereference (and memory stall). inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6 This patch is way bigger than its IPv4 counter part, because for IPv4, we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6, it's not doable easily. inet6_sk(sk)->daddr becomes sk->sk_v6_daddr inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr And timewait socket also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr at the same offset. We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic macro. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08tcp/dccp: remove twchainEric Dumazet
TCP listener refactoring, part 3 : Our goal is to hash SYN_RECV sockets into main ehash for fast lookup, and parallel SYN processing. Current inet_ehash_bucket contains two chains, one for ESTABLISH (and friend states) sockets, another for TIME_WAIT sockets only. As the hash table is sized to get at most one socket per bucket, it makes little sense to have separate twchain, as it makes the lookup slightly more complicated, and doubles hash table memory usage. If we make sure all socket types have the lookup keys at the same offsets, we can use a generic and faster lookup. It turns out TIME_WAIT and ESTABLISHED sockets already have common lookup fields for IPv4. [ INET_TW_MATCH() is no longer needed ] I'll provide a follow-up to factorize IPv6 lookup as well, to remove INET6_TW_MATCH() This way, SYN_RECV pseudo sockets will be supported the same. A new sock_gen_put() helper is added, doing either a sock_put() or inet_twsk_put() [ and will support SYN_RECV later ]. Note this helper should only be called in real slow path, when rcu lookup found a socket that was moved to another identity (freed/reused immediately), but could eventually be used in other contexts, like sock_edemux() Before patch : dmesg | grep "TCP established" TCP established hash table entries: 524288 (order: 11, 8388608 bytes) After patch : TCP established hash table entries: 524288 (order: 10, 4194304 bytes) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: include/linux/netdevice.h net/core/sock.c Trivial merge issues. Removal of "extern" for functions declaration in netdevice.h at the same time "const" was added to an argument. Two parallel line additions in net/core/sock.c Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08net: ipv4 only populate IP_PKTINFO when neededShawn Bohrer
The since the removal of the routing cache computing fib_compute_spec_dst() does a fib_table lookup for each UDP multicast packet received. This has introduced a performance regression for some UDP workloads. This change skips populating the packet info for sockets that do not have IP_PKTINFO set. Benchmark results from a netperf UDP_RR test: Before 89789.68 transactions/s After 90587.62 transactions/s Benchmark results from a fio 1 byte UDP multicast pingpong test (Multicast one way unicast response): Before 12.63us RTT After 12.48us RTT Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08udp: ipv4: Add udp early demuxShawn Bohrer
The removal of the routing cache introduced a performance regression for some UDP workloads since a dst lookup must be done for each packet. This change caches the dst per socket in a similar manner to what we do for TCP by implementing early_demux. For UDP multicast we can only cache the dst if there is only one receiving socket on the host. Since caching only works when there is one receiving socket we do the multicast socket lookup using RCU. For UDP unicast we only demux sockets with an exact match in order to not break forwarding setups. Additionally since the hash chains may be long we only check the first socket to see if it is a match and not waste extra time searching the whole chain when we might not find an exact match. Benchmark results from a netperf UDP_RR test: Before 87961.22 transactions/s After 89789.68 transactions/s Benchmark results from a fio 1 byte UDP multicast pingpong test (Multicast one way unicast response): Before 12.97us RTT After 12.63us RTT Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-nextDavid S. Miller
Conflicts: drivers/net/wireless/brcm80211/brcmfmac/dhd_bus.h drivers/net/wireless/rtlwifi/rtl8188ee/phy.h drivers/net/wireless/rtlwifi/rtl8192ce/phy.h drivers/net/wireless/rtlwifi/rtl8192de/phy.h drivers/net/wireless/rtlwifi/rtl8723ae/phy.h Just some minor conflicts between the wireless-next changes and Joe Perches's "extern" removal from function prototypes in header files. John W. Linville says: ==================== Regarding the Bluetooth bits, Gustavo says: "The big work here is from Marcel and Johan. They did a lot of work in the L2CAP, HCI and MGMT layers. The most important ones are the addition of a new MGMT command to enable/disable LE advertisement and the introduction of the HCI user channel to allow applications to get directly and exclusive access to Bluetooth devices." As to the ath10k bits, Kalle says: "Bartosz dropped support for qca98xx hw1.0 hardware from ath10k, it's just too much to support it. Michal added support for the new firmware interface. Marek fixed WEP in AP and IBSS mode. Rest of the changes are minor fixes or cleanups." And also: "Major changes are: * throughput improvements including aligning the RX frames correctly and optimising HTT layer (Michal) * remove qca98xx hw1.0 support (Bartosz) * add support for firmware version 999.999.0.636 (Michal) * firmware htt statistics support (Kalle) * fix WEP in AP and IBSS mode (Marek) * fix a mutex unlock balance in debugfs file (Shafi) And of course there's a lot of smaller fixes and cleanup." For the wl12xx bits, Luca says: "Here are some patches intended for 3.13. Eliad is upstreaming a bunch of patches that have been pending in the internal tree. Mostly bugfixes and other small improvements." Along with that... Arend and friends bring us a batch of brcmfmac updates, Larry Finger offers some rtlwifi refactoring, and Sujith sends the usual batch of ath9k updates. As usual, there are a number of other small updates from a variety of players as well. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-07net: fix unsafe set_memory_rw from softirqAlexei Starovoitov
on x86 system with net.core.bpf_jit_enable = 1 sudo tcpdump -i eth1 'tcp port 22' causes the warning: [ 56.766097] Possible unsafe locking scenario: [ 56.766097] [ 56.780146] CPU0 [ 56.786807] ---- [ 56.793188] lock(&(&vb->lock)->rlock); [ 56.799593] <Interrupt> [ 56.805889] lock(&(&vb->lock)->rlock); [ 56.812266] [ 56.812266] *** DEADLOCK *** [ 56.812266] [ 56.830670] 1 lock held by ksoftirqd/1/13: [ 56.836838] #0: (rcu_read_lock){.+.+..}, at: [<ffffffff8118f44c>] vm_unmap_aliases+0x8c/0x380 [ 56.849757] [ 56.849757] stack backtrace: [ 56.862194] CPU: 1 PID: 13 Comm: ksoftirqd/1 Not tainted 3.12.0-rc3+ #45 [ 56.868721] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012 [ 56.882004] ffffffff821944c0 ffff88080bbdb8c8 ffffffff8175a145 0000000000000007 [ 56.895630] ffff88080bbd5f40 ffff88080bbdb928 ffffffff81755b14 0000000000000001 [ 56.909313] ffff880800000001 ffff880800000000 ffffffff8101178f 0000000000000001 [ 56.923006] Call Trace: [ 56.929532] [<ffffffff8175a145>] dump_stack+0x55/0x76 [ 56.936067] [<ffffffff81755b14>] print_usage_bug+0x1f7/0x208 [ 56.942445] [<ffffffff8101178f>] ? save_stack_trace+0x2f/0x50 [ 56.948932] [<ffffffff810cc0a0>] ? check_usage_backwards+0x150/0x150 [ 56.955470] [<ffffffff810ccb52>] mark_lock+0x282/0x2c0 [ 56.961945] [<ffffffff810ccfed>] __lock_acquire+0x45d/0x1d50 [ 56.968474] [<ffffffff810cce6e>] ? __lock_acquire+0x2de/0x1d50 [ 56.975140] [<ffffffff81393bf5>] ? cpumask_next_and+0x55/0x90 [ 56.981942] [<ffffffff810cef72>] lock_acquire+0x92/0x1d0 [ 56.988745] [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380 [ 56.995619] [<ffffffff817628f1>] _raw_spin_lock+0x41/0x50 [ 57.002493] [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380 [ 57.009447] [<ffffffff8118f52a>] vm_unmap_aliases+0x16a/0x380 [ 57.016477] [<ffffffff8118f44c>] ? vm_unmap_aliases+0x8c/0x380 [ 57.023607] [<ffffffff810436b0>] change_page_attr_set_clr+0xc0/0x460 [ 57.030818] [<ffffffff810cfb8d>] ? trace_hardirqs_on+0xd/0x10 [ 57.037896] [<ffffffff811a8330>] ? kmem_cache_free+0xb0/0x2b0 [ 57.044789] [<ffffffff811b59c3>] ? free_object_rcu+0x93/0xa0 [ 57.051720] [<ffffffff81043d9f>] set_memory_rw+0x2f/0x40 [ 57.058727] [<ffffffff8104e17c>] bpf_jit_free+0x2c/0x40 [ 57.065577] [<ffffffff81642cba>] sk_filter_release_rcu+0x1a/0x30 [ 57.072338] [<ffffffff811108e2>] rcu_process_callbacks+0x202/0x7c0 [ 57.078962] [<ffffffff81057f17>] __do_softirq+0xf7/0x3f0 [ 57.085373] [<ffffffff81058245>] run_ksoftirqd+0x35/0x70 cannot reuse jited filter memory, since it's readonly, so use original bpf insns memory to hold work_struct defer kfree of sk_filter until jit completed freeing tested on x86_64 and i386 Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-03tcp: shrink tcp6_timewait_sock by one cache lineEric Dumazet
While working on tcp listener refactoring, I found that it would really make things easier if sock_common could include the IPv6 addresses needed in the lookups, instead of doing very complex games to get their values (depending on sock being SYN_RECV, ESTABLISHED, TIME_WAIT) For this to happen, I need to be sure that tcp6_timewait_sock and tcp_timewait_sock consume same number of cache lines. This is possible if we only use 32bits for tw_ttd, as we remove one 32bit hole in inet_timewait_sock inet_tw_time_stamp() is defined and used, even if its current implementation looks like tcp_time_stamp : We might need finer resolution for tcp_time_stamp in the future. Before patch : sizeof(struct tcp6_timewait_sock) = 0xc8 After patch : sizeof(struct tcp6_timewait_sock) = 0xc0 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-03Merge branch 'for-upstream' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
2013-10-03flow_dissector: factor out the ports extraction in skb_flow_get_portsNikolay Aleksandrov
Factor out the code that extracts the ports from skb_flow_dissect and add a new function skb_flow_get_ports which can be re-used. Suggested-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-03inet: consolidate INET_TW_MATCHEric Dumazet
TCP listener refactoring, part 2 : We can use a generic lookup, sockets being in whatever state, if we are sure all relevant fields are at the same place in all socket types (ESTABLISH, TIME_WAIT, SYN_RECV) This patch removes these macros : inet_addrpair, inet_addrpair, tw_addrpair, tw_portpair And adds : sk_portpair, sk_addrpair, sk_daddr, sk_rcv_saddr Then, INET_TW_MATCH() is really the same than INET_MATCH() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-02Bluetooth: Add the definition for Slave Page Response TimeoutDoHyun Pyun
The Slave Page Response Timeout event indicates to the Host that a slave page response timeout has occurred in the BR/EDR Controller. The Core Spec Addendum 4 adds this command in part B Connectionless Slave Broadcast. Bluetooth Core Specification Addendum 4 - Page 110 "7.7.72 Slave Page Response Timeout Event [New Section] ... Note: this event will be generated if the slave BR/EDR Controller responds to a page but does not receive the master FHS packet (see Baseband, Section 8.3.3) within pagerespTO. Event Parameters: NONE" Signed-off-by: Dohyun Pyun <dh79.pyun@samsung.com> Signed-off-by: C S Bhargava <cs.bhargava@samsung.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-02Bluetooth: Add the definition and stcuture for Sync Train CompleteDoHyun Pyun
The Synchronization Train Complete event indicates that the Start Synchronization Train command has completed. The Core Spec Addendum 4 adds this command in part B Connectionless Slave Broadcast. Bluetooth Core Specification Addendum 4 - Page 103 "7.7.67 Synchronization Train Complete Event [New Section] ... Event Parameters: Status 0x00 Start Synchronization Train command completed successfully. 0x01-0xFF Start Synchronization Train command failed. See Part D, Error Codes, for error codes and descriptions." Signed-off-by: Dohyun Pyun <dh79.pyun@samsung.com> Signed-off-by: C S Bhargava <cs.bhargava@samsung.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-02Bluetooth: Add the definition for Start Synchronization TrainDoHyun Pyun
The Start_Synchronization_Train command controls the Synchronization Train functionality in the BR/EDR Controller. The Core Spec Addendum 4 adds this command in part B Connectionless Slave Broadcast. Bluetooth Core Specification Addendum 4 - Page 86 "7.1.51 Start Synchronization Train Command [New Section] ... If connectionless slave broadcast mode is not enabled, the Command Disallowed (0x0C) error code shall be returned. After receiving this command and returning a Command Status event, the Baseband starts attempting to send synchronization train packets containing information related to the enabled Connectionless Slave Broadcast packet timing. Note: The AFH_Channel_Map used in the synchronization train packets is configured by the Set_AFH_Channel_Classification command and the local channel classification in the BR/EDR Controller. The synchronization train packets will be sent using the parameters specified by the latest Write_Synchronization_Train_Parameters command. The Synchronization Train will continue until synchronization_trainTO slots (as specified in the last Write_Synchronization_Train command) have passed or until the Host disables the Connectionless Slave Broadcast logical transport." Signed-off-by: Dohyun Pyun <dh79.pyun@samsung.com> Signed-off-by: C S Bhargava <cs.bhargava@samsung.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-02Bluetooth: Add the definition and structure for Set CSBDoHyun Pyun
he Set_Connectionless_Slave_Broadcast command controls the Connectionless Slave Broadcast functionality in the BR/EDR Controller. The Core Spec Addendum 4 adds this command in part B Connectionless Slave Broadcast. Bluetooth Core Specification Addendum 4 - Page 78 "7.1.49 Set Connectionless Slave Broadcast Command [New Section] ... The LT_ADDR indicated in the Set_Connectionless_Slave_Broadcast shall be pre-allocated using the HCI_Set_Reserved_LT_ADDR command. If the LT_ADDR has not been reserved, the Unknown Connection Identifier (0x02) error code shall be returned. If the controller is unable to reserve sufficient bandwidth for the requested activity, the Connection Rejected Due to Limited Resources (0x0D) error code shall be returned. The LPO_Allowed parameter informs the BR/EDR Controller whether it is allowed to sleep. The Packet_Type parameter specifies which packet types are allowed. The Host shall either enable BR packet types only, or shall enable EDR and DM1 packet types only. The Interval_Min and Interval_Max parameters specify the range from which the BR/EDR Controller must select the Connectionless Slave Broadcast Interval. The selected Interval is returned." Signed-off-by: Dohyun Pyun <dh79.pyun@samsung.com> Signed-off-by: C S Bhargava <cs.bhargava@samsung.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-02Bluetooth: Add the structure for Write Sync Train ParametersDoHyun Pyun
The Write_Synchronization_Train_Parameters command configures the Synchronization Train functionality in the BR/EDR Controller. The Core Spec Addendum 4 adds this command in part B Connectionless Slave Broadcast. Bluetooth Core Specification Addendum 4 - Page 97 "7.3.90 Write Synchronization Train Parameters Command [New Section] ... Note: The AFH_Channel_Map used in the Synchronization Train packets is configured by the Set_AFH_Channel_Classification command and the local channel classification in the BR/EDR Controller. Interval_Min and Interval_Max specify the allowed range of Sync_Train_Interval. Refer to [Vol. 2], Part B, section 2.7.2 for a detailed description of Sync_Train_Interval. The BR/EDR Controller shall select an interval from this range and return it in Sync_Train_Interval. If the Controller is unable to select a value from this range, it shall return the Invalid HCI Command Parameters (0x12) error code. Once started (via the Start_Synchronization_Train Command) the Synchronization Train will continue until synchronization_trainTO slots have passed or Connectionless Slave Broadcast has been disabled." Signed-off-by: Dohyun Pyun <dh79.pyun@samsung.com> Signed-off-by: C S Bhargava <cs.bhargava@samsung.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>