summaryrefslogtreecommitdiff
path: root/net/bridge/br_private.h
AgeCommit message (Collapse)Author
2015-09-29bridge: vlan: add per-vlan struct and move to rhashtablesNikolay Aleksandrov
This patch changes the bridge vlan implementation to use rhashtables instead of bitmaps. The main motivation behind this change is that we need extensible per-vlan structures (both per-port and global) so more advanced features can be introduced and the vlan support can be extended. I've tried to break this up but the moment net_port_vlans is changed and the whole API goes away, thus this is a larger patch. A few short goals of this patch are: - Extensible per-vlan structs stored in rhashtables and a sorted list - Keep user-visible behaviour (compressed vlans etc) - Keep fastpath ingress/egress logic the same (optimizations to come later) Here's a brief list of some of the new features we'd like to introduce: - per-vlan counters - vlan ingress/egress mapping - per-vlan igmp configuration - vlan priorities - avoid fdb entries replication (e.g. local fdb scaling issues) The structure is kept single for both global and per-port entries so to avoid code duplication where possible and also because we'll soon introduce "port0 / aka bridge as port" which should simplify things further (thanks to Vlad for the suggestion!). Now we have per-vlan global rhashtable (bridge-wide) and per-vlan port rhashtable, if an entry is added to a port it'll get a pointer to its global context so it can be quickly accessed later. There's also a sorted vlan list which is used for stable walks and some user-visible behaviour such as the vlan ranges, also for error paths. VLANs are stored in a "vlan group" which currently contains the rhashtable, sorted vlan list and the number of "real" vlan entries. A good side-effect of this change is that it resembles how hw keeps per-vlan data. One important note after this change is that if a VLAN is being looked up in the bridge's rhashtable for filtering purposes (or to check if it's an existing usable entry, not just a global context) then the new helper br_vlan_should_use() needs to be used if the vlan is found. In case the lookup is done only with a port's vlan group, then this check can be skipped. Things tested so far: - basic vlan ingress/egress - pvids - untagged vlans - undef CONFIG_BRIDGE_VLAN_FILTERING - adding/deleting vlans in different scenarios (with/without global ctx, while transmitting traffic, in ranges etc) - loading/removing the module while having/adding/deleting vlans - extracting bridge vlan information (user ABI), compressed requests - adding/deleting fdbs on vlans - bridge mac change, promisc mode - default pvid change - kmemleak ON during the whole time Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17netfilter: Pass net into okfnEric W. Biederman
This is immediately motivated by the bridge code that chains functions that call into netfilter. Without passing net into the okfns the bridge code would need to guess about the best expression for the network namespace to process packets in. As net is frequently one of the first things computed in continuation functions after netfilter has done it's job passing in the desired network namespace is in many cases a code simplification. To support this change the function dst_output_okfn is introduced to simplify passing dst_output as an okfn. For the moment dst_output_okfn just silently drops the struct net. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27bridge: fdb: rearrange net_bridge_fdb_entryNikolay Aleksandrov
While looking into fixing the local entries scalability issue I noticed that the structure is badly arranged because vlan_id would fall in a second cache line while keeping rcu which is used only when deleting in the first, so re-arrange the structure and push rcu to the end so we can get 16 bytes which can be used for other fields (by pushing rcu fully in the second 64 byte chunk). With this change all the core necessary information when doing fdb lookups will be available in a single cache line. pahole before (note vlan_id): struct net_bridge_fdb_entry { struct hlist_node hlist; /* 0 16 */ struct net_bridge_port * dst; /* 16 8 */ struct callback_head rcu; /* 24 16 */ long unsigned int updated; /* 40 8 */ long unsigned int used; /* 48 8 */ mac_addr addr; /* 56 6 */ unsigned char is_local:1; /* 62: 7 1 */ unsigned char is_static:1; /* 62: 6 1 */ unsigned char added_by_user:1; /* 62: 5 1 */ unsigned char added_by_external_learn:1; /* 62: 4 1 */ /* XXX 4 bits hole, try to pack */ /* XXX 1 byte hole, try to pack */ /* --- cacheline 1 boundary (64 bytes) --- */ __u16 vlan_id; /* 64 2 */ /* size: 72, cachelines: 2, members: 11 */ /* sum members: 65, holes: 1, sum holes: 1 */ /* bit holes: 1, sum bit holes: 4 bits */ /* padding: 6 */ /* last cacheline: 8 bytes */ } pahole after (note vlan_id): struct net_bridge_fdb_entry { struct hlist_node hlist; /* 0 16 */ struct net_bridge_port * dst; /* 16 8 */ long unsigned int updated; /* 24 8 */ long unsigned int used; /* 32 8 */ mac_addr addr; /* 40 6 */ __u16 vlan_id; /* 46 2 */ unsigned char is_local:1; /* 48: 7 1 */ unsigned char is_static:1; /* 48: 6 1 */ unsigned char added_by_user:1; /* 48: 5 1 */ unsigned char added_by_external_learn:1; /* 48: 4 1 */ /* XXX 4 bits hole, try to pack */ /* XXX 7 bytes hole, try to pack */ struct callback_head rcu; /* 56 16 */ /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */ /* size: 72, cachelines: 2, members: 11 */ /* sum members: 65, holes: 1, sum holes: 7 */ /* bit holes: 1, sum bit holes: 4 bits */ /* last cacheline: 8 bytes */ } Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27bridge: Add netlink support for vlan_protocol attributeToshiaki Makita
This enables bridge vlan_protocol to be configured through netlink. When CONFIG_BRIDGE_VLAN_FILTERING is disabled, kernel behaves the same way as this feature is not implemented. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-10bridge: netlink: add support for vlan_filtering attributeNikolay Aleksandrov
This patch adds the ability to toggle the vlan filtering support via netlink. Since we're already running with rtnl in .changelink() we don't need to take any additional locks. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-26bridge: mdb: notify on router port add and delSatish Ashok
Send notifications on router port add and del/expire, re-use the already existing MDBA_ROUTER and send NEWMDB/DELMDB netlink notifications respectively. Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20bridge: mcast: fix br_multicast_dev_del warn when igmp snooping is not definedNikolay Aleksandrov
Fix: net/bridge/br_if.c: In function 'br_dev_delete': >> net/bridge/br_if.c:284:2: error: implicit declaration of function >> 'br_multicast_dev_del' [-Werror=implicit-function-declaration] br_multicast_dev_del(br); ^ cc1: some warnings being treated as errors when igmp snooping is not defined. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20bridge: multicast: fix handling of temp and perm entriesSatish Ashok
When the bridge (or port) is brought down/up flush only temp entries and leave the perm ones. Flush perm entries only when deleting the bridge device or the associated port. Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09bridge: mdb: fill state in br_mdb_notifyNikolay Aleksandrov
Fill also the port group state when sending notifications. Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-24bridge: vlan: flush the dynamically learned entries on port vlan deleteNikolay Aleksandrov
Add a new argument to br_fdb_delete_by_port which allows to specify a vid to match when flushing entries and use it in nbp_vlan_delete() to flush the dynamically learned entries of the vlan/port pair when removing a vlan from a port. Before this patch only the local mac was being removed and the dynamically learned ones were left to expire. Note that the do_all argument is still respected and if specified, the vid will be ignored. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-12netfilter: bridge: forward IPv6 fragmented packetsBernhard Thaler
IPv6 fragmented packets are not forwarded on an ethernet bridge with netfilter ip6_tables loaded. e.g. steps to reproduce 1) create a simple bridge like this modprobe br_netfilter brctl addbr br0 brctl addif br0 eth0 brctl addif br0 eth2 ifconfig eth0 up ifconfig eth2 up ifconfig br0 up 2) place a host with an IPv6 address on each side of the bridge set IPv6 address on host A: ip -6 addr add fd01:2345:6789:1::1/64 dev eth0 set IPv6 address on host B: ip -6 addr add fd01:2345:6789:1::2/64 dev eth0 3) run a simple ping command on host A with packets > MTU ping6 -s 4000 fd01:2345:6789:1::2 4) wait some time and run e.g. "ip6tables -t nat -nvL" on the bridge IPv6 fragmented packets traverse the bridge cleanly until somebody runs. "ip6tables -t nat -nvL". As soon as it is run (and netfilter modules are loaded) IPv6 fragmented packets do not traverse the bridge any more (you see no more responses in ping's output). After applying this patch IPv6 fragmented packets traverse the bridge cleanly in above scenario. Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at> [pablo@netfilter.org: small changes to br_nf_dev_queue_xmit] Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-06-12netfilter: bridge: refactor frag_max_sizeBernhard Thaler
Currently frag_max_size is member of br_input_skb_cb and copied back and forth using IPCB(skb) and BR_INPUT_SKB_CB(skb) each time it is changed or used. Attach frag_max_size to nf_bridge_info and set value in pre_routing and forward functions. Use its value in forward and xmit functions. Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-05-05bridge: change BR_GROUPFWD_RESTRICTED to allow forwarding of LLDP framesBernhard Thaler
BR_GROUPFWD_RESTRICTED bitmask restricts users from setting values to /sys/class/net/brX/bridge/group_fwd_mask that allow forwarding of some IEEE 802.1D Table 7-10 Reserved addresses: (MAC Control) 802.3 01-80-C2-00-00-01 (Link Aggregation) 802.3 01-80-C2-00-00-02 802.1AB LLDP 01-80-C2-00-00-0E Change BR_GROUPFWD_RESTRICTED to allow to forward LLDP frames and document group_fwd_mask. e.g. echo 16384 > /sys/class/net/brX/bridge/group_fwd_mask allows to forward LLDP frames. This may be needed for bridge setups used for network troubleshooting or any other scenario where forwarding of LLDP frames is desired (e.g. bridge connecting a virtual machine to real switch transmitting LLDP frames that virtual machine needs to receive). Tested on a simple bridge setup with two interfaces and host transmitting LLDP frames on one side of this bridge (used lldpd). Setting group_fwd_mask as described above lets LLDP frames traverse bridge. Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-29bridge/nl: remove wrong use of NLM_F_MULTINicolas Dichtel
NLM_F_MULTI must be used only when a NLMSG_DONE message is sent. In fact, it is sent only at the end of a dump. Libraries like libnl will wait forever for NLMSG_DONE. Fixes: e5a55a898720 ("net: create generic bridge ops") Fixes: 815cccbf10b2 ("ixgbe: add setlink, getlink support to ixgbe and ixgbevf") CC: John Fastabend <john.r.fastabend@intel.com> CC: Sathya Perla <sathya.perla@emulex.com> CC: Subbu Seetharaman <subbu.seetharaman@emulex.com> CC: Ajit Khaparde <ajit.khaparde@emulex.com> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: intel-wired-lan@lists.osuosl.org CC: Jiri Pirko <jiri@resnulli.us> CC: Scott Feldman <sfeldma@gmail.com> CC: Stephen Hemminger <stephen@networkplumber.org> CC: bridge@lists.linux-foundation.org Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-07netfilter: Pass socket pointer down through okfn().David Miller
On the output paths in particular, we have to sometimes deal with two socket contexts. First, and usually skb->sk, is the local socket that generated the frame. And second, is potentially the socket used to control a tunneling socket, such as one the encapsulates using UDP. We do not want to disassociate skb->sk when encapsulating in order to fix this, because that would break socket memory accounting. The most extreme case where this can cause huge problems is an AF_PACKET socket transmitting over a vxlan device. We hit code paths doing checks that assume they are dealing with an ipv4 socket, but are actually operating upon the AF_PACKET one. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-10netfilter: bridge: use rcu hook to resolve br_netfilter dependencyPablo Neira Ayuso
e5de75b ("netfilter: bridge: move DNAT helper to br_netfilter") results in the following link problem: net/bridge/br_device.c:29: undefined reference to `br_nf_prerouting_finish_bridge` Moreover it creates a hard dependency between br_netfilter and the bridge core, which is what we've been trying to avoid so far. Resolve this problem by using a hook structure so we reduce #ifdef pollution and keep bridge netfilter specific code under br_netfilter.c which was the original intention. Reported-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree. Basically, improvements for the packet rejection infrastructure, deprecation of CLUSTERIP, cleanups for nf_tables and some untangling for br_netfilter. More specifically they are: 1) Send packet to reset flow if checksum is valid, from Florian Westphal. 2) Fix nf_tables reject bridge from the input chain, also from Florian. 3) Deprecate the CLUSTERIP target, the cluster match supersedes it in functionality and it's known to have problems. 4) A couple of cleanups for nf_tables rule tracing infrastructure, from Patrick McHardy. 5) Another cleanup to place transaction declarations at the bottom of nf_tables.h, also from Patrick. 6) Consolidate Kconfig dependencies wrt. NF_TABLES. 7) Limit table names to 32 bytes in nf_tables. 8) mac header copying in bridge netfilter is already required when calling ip_fragment(), from Florian Westphal. 9) move nf_bridge_update_protocol() to br_netfilter.c, also from Florian. 10) Small refactor in br_netfilter in the transmission path, again from Florian. 11) Move br_nf_pre_routing_finish_bridge_slow() to br_netfilter. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-09netfilter: bridge: move DNAT helper to br_netfilterPablo Neira Ayuso
Only one caller, there is no need to keep this in a header. Move it to br_netfilter.c where this belongs to. Based on patch from Florian Westphal. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-05bridge: Extend Proxy ARP design to allow optional rules for Wi-FiJouni Malinen
This extends the design in commit 958501163ddd ("bridge: Add support for IEEE 802.11 Proxy ARP") with optional set of rules that are needed to meet the IEEE 802.11 and Hotspot 2.0 requirements for ProxyARP. The previously added BR_PROXYARP behavior is left as-is and a new BR_PROXYARP_WIFI alternative is added so that this behavior can be configured from user space when required. In addition, this enables proxyarp functionality for unicast ARP requests for both BR_PROXYARP and BR_PROXYARP_WIFI since it is possible to use unicast as well as broadcast for these frames. The key differences in functionality: BR_PROXYARP: - uses the flag on the bridge port on which the request frame was received to determine whether to reply - block bridge port flooding completely on ports that enable proxy ARP BR_PROXYARP_WIFI: - uses the flag on the bridge port to which the target device of the request belongs - block bridge port flooding selectively based on whether the proxyarp functionality replied Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-01bridge: add flags argument to ndo_bridge_setlink and ndo_bridge_dellinkRoopa Prabhu
bridge flags are needed inside ndo_bridge_setlink/dellink handlers to avoid another call to parse IFLA_AF_SPEC inside these handlers This is used later in this series Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-18net: replace br_fdb_external_learn_* calls with switchdev notifier eventsJiri Pirko
This patch benefits from newly introduced switchdev notifier and uses it to propagate fdb learn events from rocker driver to bridge. That avoids direct function calls and possible use by other listeners (ovs). Suggested-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13net: rename vlan_tx_* helpers since "tx" is misleading thereJiri Pirko
The same macros are used for rx as well. So rename it. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02bridge: move private brport flags to if_bridge.h so port drivers can use flagsScott Feldman
Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02bridge: add API to notify bridge driver of learned FBD on offloaded deviceScott Feldman
When the swdev device learns a new mac/vlan on a port, it sends some async notification to the driver and the driver installs an FDB in the device. To give a holistic system view, the learned mac/vlan should be reflected in the bridge's FBD table, so the user, using normal iproute2 cmds, can view what is currently learned by the device. This API on the bridge driver gives a way for the swdev driver to install an FBD entry in the bridge FBD table. (And remove one). This is equivalent to the device running these cmds: bridge fdb [add|del] <mac> dev <dev> vid <vlan id> master This patch needs some extra eyeballs for review, in paricular around the locking and contexts. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02net: make vid as a parameter for ndo_fdb_add/ndo_fdb_delJiri Pirko
Do the work of parsing NDA_VLAN directly in rtnetlink code, pass simple u16 vid to drivers from there. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02bridge: convert flags in fbd entry into bitfieldsJiri Pirko
Suggested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-27bridge: Add support for IEEE 802.11 Proxy ARPKyeyoon Park
This feature is defined in IEEE Std 802.11-2012, 10.23.13. It allows the AP devices to keep track of the hardware-address-to-IP-address mapping of the mobile devices within the WLAN network. The AP will learn this mapping via observing DHCP, ARP, and NS/NA frames. When a request for such information is made (i.e. ARP request, Neighbor Solicitation), the AP will respond on behalf of the associated mobile device. In the process of doing so, the AP will drop the multicast request frame that was intended to go out to the wireless medium. It was recommended at the LKS workshop to do this implementation in the bridge layer. vxlan.c is already doing something very similar. The DHCP snooping code will be added to the userspace application (hostapd) per the recommendation. This RFC commit is only for IPv4. A similar approach in the bridge layer will be taken for IPv6 as well. Signed-off-by: Kyeyoon Park <kyeyoonp@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2014-10-07bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTINGHerbert Xu
As we may defragment the packet in IPv4 PRE_ROUTING and refragment it after POST_ROUTING we should save the value of frag_max_size. This is still very wrong as the bridge is supposed to leave the packets intact, meaning that the right thing to do is to use the original frag_list for fragmentation. Unfortunately we don't currently guarantee that the frag_list is left untouched throughout netfilter so until this changes this is the best we can do. There is also a spot in FORWARD where it appears that we can forward a packet without going through fragmentation, mark it so that we can fix it later. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-05bridge: Add filtering support for default_pvidVlad Yasevich
Currently when vlan filtering is turned on on the bridge, the bridge will drop all traffic untill the user configures the filter. This isn't very nice for ports that don't care about vlans and just want untagged traffic. A concept of a default_pvid was recently introduced. This patch adds filtering support for default_pvid. Now, ports that don't care about vlans and don't define there own filter will belong to the VLAN of the default_pvid and continue to receive untagged traffic. This filtering can be disabled by setting default_pvid to 0. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-05bridge: Simplify pvid checks.Vlad Yasevich
Currently, if the pvid is not set, we return an illegal vlan value even though the pvid value is set to 0. Since pvid of 0 is currently invalid, just return 0 instead. This makes the current and future checks simpler. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-05bridge: Add a default_pvid sysfs attributeVlad Yasevich
This patch allows the user to set and retrieve default_pvid value. A new value can only be stored when vlan filtering is disabled. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-01net: bridge: add a br_set_state helper functionFlorian Fainelli
In preparation for being able to propagate port states to e.g: notifiers or other kernel parts, do not manipulate the port state directly, but instead use a helper function which will allow us to do a bit more than just setting the state. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== pull request: netfilter/ipvs updates for net-next The following patchset contains Netfilter/IPVS updates for net-next, most relevantly they are: 1) Four patches to make the new nf_tables masquerading support independent of the x_tables infrastructure. This also resolves a compilation breakage if the masquerade target is disabled but the nf_tables masq expression is enabled. 2) ipset updates via Jozsef Kadlecsik. This includes the addition of the skbinfo extension that allows you to store packet metainformation in the elements. This can be used to fetch and restore this to the packets through the iptables SET target, patches from Anton Danilov. 3) Add the hash:mac set type to ipset, from Jozsef Kadlecsick. 4) Add simple weighted fail-over scheduler via Simon Horman. This provides a fail-over IPVS scheduler (unlike existing load balancing schedulers). Connections are directed to the appropriate server based solely on highest weight value and server availability, patch from Kenny Mathis. 5) Support IPv6 real servers in IPv4 virtual-services and vice versa. Simon Horman informs that the motivation for this is to allow more flexibility in the choice of IP version offered by both virtual-servers and real-servers as they no longer need to match: An IPv4 connection from an end-user may be forwarded to a real-server using IPv6 and vice versa. No ip_vs_sync support yet though. Patches from Alex Gartrell and Julian Anastasov. 6) Add global generation ID to the nf_tables ruleset. When dumping from several different object lists, we need a way to identify that an update has ocurred so userspace knows that it needs to refresh its lists. This also includes a new command to obtain the 32-bits generation ID. The less significant 16-bits of this ID is also exposed through res_id field in the nfnetlink header to quickly detect the interference and retry when there is no risk of ID wraparound. 7) Move br_netfilter out of the bridge core. The br_netfilter code is built in the bridge core by default. This causes problems of different kind to people that don't want this: Jesper reported performance drop due to the inconditional hook registration and I remember to have read complains on netdev from people regarding the unexpected behaviour of our bridging stack when br_netfilter is enabled (fragmentation handling, layer 3 and upper inspection). People that still need this should easily undo the damage by modprobing the new br_netfilter module. 8) Dump the set policy nf_tables that allows set parameterization. So userspace can keep user-defined preferences when saving the ruleset. From Arturo Borrero. 9) Use __seq_open_private() helper function to reduce boiler plate code in x_tables, From Rob Jones. 10) Safer default behaviour in case that you forget to load the protocol tracker. Daniel Borkmann and Florian Westphal detected that if your ruleset is stateful, you allow traffic to at least one single SCTP port and the SCTP protocol tracker is not loaded, then any SCTP traffic may be pass through unfiltered. After this patch, the connection tracking classifies SCTP/DCCP/UDPlite/GRE packets as invalid if your kernel has been compiled with support for these modules. ==================== Trivially resolved conflict in include/linux/skbuff.h, Eric moved some netfilter skbuff members around, and the netfilter tree adjusted the ifdef guards for the bridging info pointer. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26netfilter: bridge: move br_netfilter out of the corePablo Neira Ayuso
Jesper reported that br_netfilter always registers the hooks since this is part of the bridge core. This harms performance for people that don't need this. This patch modularizes br_netfilter so it can be rmmod'ed, thus, the hooks can be unregistered. I think the bridge netfilter should have been a separated module since the beginning, Patrick agreed on that. Note that this is breaking compatibility for users that expect that bridge netfilter is going to be available after explicitly 'modprobe bridge' or via automatic load through brctl. However, the damage can be easily undone by modprobing br_netfilter. The bridge core also spots a message to provide a clue to people that didn't notice that this has been deprecated. On top of that, the plan is that nftables will not rely on this software layer, but integrate the connection tracking into the bridge layer to enable stateful filtering and NAT, which is was bridge netfilter users seem to require. This patch still keeps the fake_dst_ops in the bridge core, since this is required by when the bridge port is initialized. So we can safely modprobe/rmmod br_netfilter anytime. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Florian Westphal <fw@strlen.de>
2014-09-13bridge: Check if vlan filtering is enabled only once.Vlad Yasevich
The bridge code checks if vlan filtering is enabled on both ingress and egress. When the state flip happens, it is possible for the bridge to currently be forwarding packets and forwarding behavior becomes non-deterministic. Bridge may drop packets on some interfaces, but not others. This patch solves this by caching the filtered state of the packet into skb_cb on ingress. The skb_cb is guaranteed to not be over-written between the time packet entres bridge forwarding path and the time it leaves it. On egress, we can then check the cached state to see if we need to apply filtering information. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-10bridge: fdb dumping takes a filter deviceJamal Hadi Salim
Dumping a bridge fdb dumps every fdb entry held. With this change we are going to filter on selected bridge port. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11bridge: Support 802.1ad vlan filteringToshiaki Makita
This enables us to change the vlan protocol for vlan filtering. We come to be able to filter frames on the basis of 802.1ad vlan tags through a bridge. This also changes br->group_addr if it has not been set by user. This is needed for an 802.1ad bridge. (See IEEE 802.1Q-2011 8.13.5.) Furthermore, this sets br->group_fwd_mask_required so that an 802.1ad bridge can forward the Nearest Customer Bridge group addresses except for br->group_addr, which should be passed to higher layer. To change the vlan protocol, write a protocol in sysfs: # echo 0x88a8 > /sys/class/net/br0/bridge/vlan_protocol Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11bridge: Prepare for forwarding another bridge group addressesToshiaki Makita
If a bridge is an 802.1ad bridge, it must forward another bridge group addresses (the Nearest Customer Bridge group addresses). (For details, see IEEE 802.1Q-2011 8.6.3.) As user might not want group_fwd_mask to be modified by enabling 802.1ad, introduce a new mask, group_fwd_mask_required, which indicates addresses the bridge wants to forward. This will be set by enabling 802.1ad. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11bridge: Prepare for 802.1ad vlan filtering supportToshiaki Makita
This enables a bridge to have vlan protocol informantion and allows vlan tag manipulation (retrieve, insert and remove tags) according to the vlan protocol. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-10bridge: memorize and export selected IGMP/MLD querier portLinus Lüssing
Adding bridge support to the batman-adv multicast optimization requires batman-adv knowing about the existence of bridged-in IGMP/MLD queriers to be able to reliably serve any multicast listener behind this same bridge. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-10bridge: add export of multicast database adjacent to net_devLinus Lüssing
With this new, exported function br_multicast_list_adjacent(net_dev) a list of IPv4/6 addresses is returned. This list contains all multicast addresses sensed by the bridge multicast snooping feature on all bridge ports of the bridge interface of net_dev, excluding addresses from the specified net_device itself. Adding bridge support to the batman-adv multicast optimization requires batman-adv knowing about the existence of bridged-in multicast listeners to be able to reliably serve them with multicast packets. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-10bridge: adhere to querier election mechanism specified by RFCsLinus Lüssing
MLDv1 (RFC2710 section 6), MLDv2 (RFC3810 section 7.6.2), IGMPv2 (RFC2236 section 3) and IGMPv3 (RFC3376 section 6.6.2) specify that the querier with lowest source address shall become the selected querier. So far the bridge stopped its querier as soon as it heard another querier regardless of its source address. This results in the "wrong" querier potentially becoming the active querier or a potential, unnecessary querying delay. With this patch the bridge memorizes the source address of the currently selected querier and ignores queries from queriers with a higher source address than the currently selected one. This slight optimization is supposed to make it more RFC compliant (but is rather uncritical and therefore probably not necessary to be queued for stable kernels). Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-10bridge: rename struct bridge_mcast_query/querierLinus Lüssing
The current naming of these two structs is very random, in that reversing their naming would not make any semantical difference. This patch tries to make the naming less confusing by giving them a more specific, distinguishable naming. This is also useful for the upcoming patches reintroducing the "struct bridge_mcast_querier" but for storing information about the selected querier (no matter if our own or a foreign querier). Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: include/net/inetpeer.h net/ipv6/output_core.c Changes in net were fixing bugs in code removed in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02bridge: Prevent insertion of FDB entry with disallowed vlanToshiaki Makita
br_handle_local_finish() is allowing us to insert an FDB entry with disallowed vlan. For example, when port 1 and 2 are communicating in vlan 10, and even if vlan 10 is disallowed on port 3, port 3 can interfere with their communication by spoofed src mac address with vlan id 10. Note: Even if it is judged that a frame should not be learned, it should not be dropped because it is destined for not forwarding layer but higher layer. See IEEE 802.1Q-2011 8.13.10. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-22bridge: make br_device_notifier staticCong Wang
Merge net/bridge/br_notify.c into net/bridge/br.c, since it has only br_device_event() and br.c is small. Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-18net: bridge: fix buildAlexei Starovoitov
fix build when BRIDGE_VLAN_FILTERING is not set Fixes: 2796d0c648c94 ("bridge: Automatically manage port promiscuous mode") Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16bridge: Automatically manage port promiscuous mode.Vlad Yasevich
There exist configurations where the administrator or another management entity has the foreknowledge of all the mac addresses of end systems that are being bridged together. In these environments, the administrator can statically configure known addresses in the bridge FDB and disable flooding and learning on ports. This makes it possible to turn off promiscuous mode on the interfaces connected to the bridge. Here is why disabling flooding and learning allows us to control promiscuity: Consider port X. All traffic coming into this port from outside the bridge (ingress) will be either forwarded through other ports of the bridge (egress) or dropped. Forwarding (egress) is defined by FDB entries and by flooding in the event that no FDB entry exists. In the event that flooding is disabled, only FDB entries define the egress. Once learning is disabled, only static FDB entries provided by a management entity define the egress. If we provide information from these static FDBs to the ingress port X, then we'll be able to accept all traffic that can be successfully forwarded and drop all the other traffic sooner without spending CPU cycles to process it. Another way to define the above is as following equations: ingress = egress + drop expanding egress ingress = static FDB + learned FDB + flooding + drop disabling flooding and learning we a left with ingress = static FDB + drop By adding addresses from the static FDB entries to the MAC address filter of an ingress port X, we fully define what the bridge can process without dropping and can thus turn off promiscuous mode, thus dropping packets sooner. There have been suggestions that we may want to allow learning and update the filters with learned addresses as well. This would require mac-level authentication similar to 802.1x to prevent attacks against the hw filters as they are limited resource. Additionally, if the user places the bridge device in promiscuous mode, all ports are placed in promiscuous mode regardless of the changes to flooding and learning. Since the above functionality depends on full static configuration, we have also require that vlan filtering be enabled to take advantage of this. The reason is that the bridge has to be able to receive and process VLAN-tagged frames and the there are only 2 ways to accomplish this right now: promiscuous mode or vlan filtering. Suggested-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16bridge: Introduce BR_PROMISC flagVlad Yasevich
Introduce a BR_PROMISC per-port flag that will help us track if the current port is supposed to be in promiscuous mode or not. For now, always start in promiscuous mode. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>