summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2010-06-11econet: fix lockingEric Dumazet
econet lacks proper locking. It holds econet_lock only when inserting or deleting an entry in econet_sklist, not during lookups. - convert econet_lock from rwlock to spinlock - use econet_lock in ec_listening_socket() lookup - use appropriate sock_hold() / sock_put() to avoid corruptions. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-11pkt_sched: gen_kill_estimator() rcu fixesEric Dumazet
gen_kill_estimator() API is incomplete or not well documented, since caller should make sure an RCU grace period is respected before freeing stats_lock. This was partially addressed in commit 5d944c640b4 (gen_estimator: deadlock fix), but same problem exist for all gen_kill_estimator() users, if lock they use is not already RCU protected. A code review shows xt_RATEEST.c, act_api.c, act_police.c have this problem. Other are ok because they use qdisc lock, already RCU protected. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-11Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
2010-06-11Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 Conflicts: drivers/net/wireless/wl12xx/wl1271.h drivers/net/wireless/wl12xx/wl1271_cmd.h
2010-06-10net-next: remove useless union keywordChangli Gao
remove useless union keyword in rtable, rt6_info and dn_route. Since there is only one member in a union, the union keyword isn't useful. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-10pktgen: Fix accuracy of inter-packet delay.Daniel Turull
This patch correct a bug in the delay of pktgen. It makes sure the inter-packet interval is accurate. Signed-off-by: Daniel Turull <daniel.turull@gmail.com> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-10pkt_sched: gen_estimator: add a new lockEric Dumazet
gen_kill_estimator() / gen_new_estimator() is not always called with RTNL held. net/netfilter/xt_RATEEST.c is one user of these API that do not hold RTNL, so random corruptions can occur between "tc" and "iptables". Add a new fine grained lock instead of trying to use RTNL in netfilter. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-10ip: ip_ra_control() rcu fixEric Dumazet
commit 66018506e15b (ip: Router Alert RCU conversion) introduced RCU lookups to ip_call_ra_chain(). It missed proper deinit phase : When ip_ra_control() deletes an ip_ra_chain, it should make sure ip_call_ra_chain() users can not start to use socket during the rcu grace period. It should also delay the sock_put() after the grace period, or we risk a premature socket freeing and corruptions, as raw sockets are not rcu protected yet. This delay avoids using expensive atomic_inc_not_zero() in ip_call_ra_chain(). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-10net: deliver skbs on inactive slaves to exact matchesJohn Fastabend
Currently, the accelerated receive path for VLAN's will drop packets if the real device is an inactive slave and is not one of the special pkts tested for in skb_bond_should_drop(). This behavior is different then the non-accelerated path and for pkts over a bonded vlan. For example, vlanx -> bond0 -> ethx will be dropped in the vlan path and not delivered to any packet handlers at all. However, bond0 -> vlanx -> ethx and bond0 -> ethx will be delivered to handlers that match the exact dev, because the VLAN path checks the real_dev which is not a slave and netif_recv_skb() doesn't drop frames but only delivers them to exact matches. This patch adds a sk_buff flag which is used for tagging skbs that would previously been dropped and allows the skb to continue to skb_netif_recv(). Here we add logic to check for the deliver_no_wcard flag and if it is set only deliver to handlers that match exactly. This makes both paths above consistent and gives pkt handlers a way to identify skbs that come from inactive slaves. Without this patch in some configurations skbs will be delivered to handlers with exact matches and in others be dropped out right in the vlan path. I have tested the following 4 configurations in failover modes and load balancing modes. # bond0 -> ethx # vlanx -> bond0 -> ethx # bond0 -> vlanx -> ethx # bond0 -> ethx | vlanx -> -- Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-09ipv6: fix ICMP6_MIB_OUTERRORSEric Dumazet
In commit 1f8438a85366 (icmp: Account for ICMP out errors), I did a typo on IPV6 side, using ICMP6_MIB_OUTMSGS instead of ICMP6_MIB_OUTERRORS Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-09ipv6: mcast: RCU conversionsEric Dumazet
- ipv6_sock_mc_join() : doesnt touch dev refcount - ipv6_sock_mc_drop() : doesnt touch dev/idev refcounts - ip6_mc_find_dev() becomes ip6_mc_find_dev_rcu() (called from rcu), and doesnt touch dev/idev refcounts - ipv6_sock_mc_close() : doesnt touch dev/idev refcounts - ip6_mc_source() uses ip6_mc_find_dev_rcu() - ip6_mc_msfilter() uses ip6_mc_find_dev_rcu() - ip6_mc_msfget() uses ip6_mc_find_dev_rcu() - ipv6_dev_mc_dec(), ipv6_chk_mcast_addr(), igmp6_event_query(), igmp6_event_report(), mld_sendpack(), igmp6_send() dont touch idev refcount Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-09icmp: RCU conversion in icmp_address_reply()Eric Dumazet
- rcu_read_lock() already held by caller - use __in_dev_get_rcu() instead of in_dev_get() / in_dev_put() - remove goto out; Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-09Merge branch 'num_rx_queues' of git://kernel.ubuntu.com/rtg/net-2.6David S. Miller
2010-06-09caif: fix a couple range checksDan Carpenter
The extra ! character means that these conditions are always false. Signed-off-by: Dan Carpenter <error27@gmail.com> Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-09phonet: use call_rcu for phonet device freeJiri Pirko
Use call_rcu rather than synchronize_rcu. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-09net: Print num_rx_queues imbalance warning only when there are allocated queuesTim Gardner
BugLink: http://bugs.launchpad.net/bugs/591416 There are a number of network drivers (bridge, bonding, etc) that are not yet receive multi-queue enabled and use alloc_netdev(), so don't print a num_rx_queues imbalance warning in that case. Also, only print the warning once for those drivers that _are_ multi-queue enabled. Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
2010-06-09Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
2010-06-09netfilter: nfnetlink_log: RCU conversionEric Dumazet
- instances_lock becomes a spinlock - lockless lookups While nfnetlink_log probably not performance critical, using less rwlocks in our code is always welcomed... Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-06-09netfilter: nfnetlink_queue: some optimizationsEric Dumazet
- Use an atomic_t for id_sequence to avoid a spin_lock/spin_unlock pair - Group highly modified struct nfqnl_instance fields together Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-06-09netfilter: ip6_queue: rwlock to spinlock conversionEric Dumazet
Converts queue_lock rwlock to a spinlock. (readlocked part can be changed by reads of integer values) One atomic operation instead of four per ipq_enqueue_packet() call. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-06-09ipvs: Add missing locking during connection table hashing and unhashingSven Wegener
The code that hashes and unhashes connections from the connection table is missing locking of the connection being modified, which opens up a race condition and results in memory corruption when this race condition is hit. Here is what happens in pretty verbose form: CPU 0 CPU 1 ------------ ------------ An active connection is terminated and we schedule ip_vs_conn_expire() on this CPU to expire this connection. IRQ assignment is changed to this CPU, but the expire timer stays scheduled on the other CPU. New connection from same ip:port comes in right before the timer expires, we find the inactive connection in our connection table and get a reference to it. We proper lock the connection in tcp_state_transition() and read the connection flags in set_tcp_state(). ip_vs_conn_expire() gets called, we unhash the connection from our connection table and remove the hashed flag in ip_vs_conn_unhash(), without proper locking! While still holding proper locks we write the connection flags in set_tcp_state() and this sets the hashed flag again. ip_vs_conn_expire() fails to expire the connection, because the other CPU has incremented the reference count. We try to re-insert the connection into our connection table, but this fails in ip_vs_conn_hash(), because the hashed flag has been set by the other CPU. We re-schedule execution of ip_vs_conn_expire(). Now this connection has the hashed flag set, but isn't actually hashed in our connection table and has a dangling list_head. We drop the reference we held on the connection and schedule the expire timer for timeouting the connection on this CPU. Further packets won't be able to find this connection in our connection table. ip_vs_conn_expire() gets called again, we think it's already hashed, but the list_head is dangling and while removing the connection from our connection table we write to the memory location where this list_head points to. The result will probably be a kernel oops at some other point in time. This race condition is pretty subtle, but it can be triggered remotely. It needs the IRQ assignment change or another circumstance where packets coming from the same ip:port for the same service are being processed on different CPUs. And it involves hitting the exact time at which ip_vs_conn_expire() gets called. It can be avoided by making sure that all packets from one connection are always processed on the same CPU and can be made harder to exploit by changing the connection timeouts to some custom values. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Cc: stable@kernel.org Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-06-09netfilter: ip_queue: rwlock to spinlock conversionEric Dumazet
Converts queue_lock rwlock to a spinlock. (readlocked part can be changed by reads of integer values) One atomic operation instead of four per ipq_enqueue_packet() call. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-06-09netfilter: xt_sctp: use WORD_ROUND macro to calculate length of multiple of ↵Shan Wei
4 bytes Use WORD_ROUND to round an int up to the next multiple of 4. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-06-09netfilter: nf_conntrack: per_cpu untrackingEric Dumazet
NOTRACK makes all cpus share a cache line on nf_conntrack_untracked twice per packet, slowing down performance. This patch converts it to a per_cpu variable. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-06-08mac80211: fix deauth before assocJohannes Berg
When we receive a deauthentication frame before having successfully associated, we neither print a message nor abort assocation. The former makes it hard to debug, while the latter later causes a warning in cfg80211 when, as will typically be the case, association timed out. This warning was reported by many, e.g. in https://bugzilla.kernel.org/show_bug.cgi?id=15981, but I couldn't initially pinpoint it. I verified the fix by hacking hostapd to send a deauth frame instead of an association response. Cc: stable@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> Tested-by: Miles Lane <miles.lane@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-08netfilter: nf_conntrack: IPS_UNTRACKED bitEric Dumazet
NOTRACK makes all cpus share a cache line on nf_conntrack_untracked twice per packet. This is bad for performance. __read_mostly annotation is also a bad choice. This patch introduces IPS_UNTRACKED bit so that we can use later a per_cpu untrack structure more easily. A new helper, nf_ct_untracked_get() returns a pointer to nf_conntrack_untracked. Another one, nf_ct_untracked_status_or() is used by nf_nat_init() to add IPS_NAT_DONE_MASK bits to untracked status. nf_ct_is_untracked() prototype is changed to work on a nf_conn pointer. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-06-08mac80211: Add netif state checking to ieee80211_ifa_changedJuuso Oikarinen
There's a window for ieee80211_ifa_changed() to get called whilst the managed mode mutex has not been initialized when opening and stopping the interface. Currently this causes a kernel BUG like the following: [ 132.460013] kernel BUG at /home/wifi/iwlwifi-2.6/net/mac80211/main.c:380! [ 132.460013] invalid opcode: 0000 [#1] SMP The mutex is initialized during open(), hence once netif_running() is true, the mutex should be valid. Fix by adding a netif_running() check to the function. Reported-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com> Tested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-07anycast: Some RCU conversionsEric Dumazet
- dev_get_by_flags() changed to dev_get_by_flags_rcu() - ipv6_sock_ac_join() dont touch dev & idev refcounts - ipv6_sock_ac_drop() dont touch dev & idev refcounts - ipv6_sock_ac_close() dont touch dev & idev refcounts - ipv6_dev_ac_dec() dount touch idev refcount - ipv6_chk_acast_addr() dont touch idev refcount Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-07net: avoid two atomic ops in ip_rcv_options()Eric Dumazet
in_dev_get() -> __in_dev_get_rcu() in a rcu protected function. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-07ipv4: avoid two atomic ops in ip_rt_redirect()Eric Dumazet
in_dev_get() -> __in_dev_get_rcu() in a rcu protected function. [ Fix build with CONFIG_IP_ROUTE_VERBOSE disabled. -DaveM ] Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-07igmp: avoid two atomic ops in igmp_rcv()Eric Dumazet
in_dev_get() -> __in_dev_get_rcu() in a rcu protected function. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-07ip: Router Alert RCU conversionEric Dumazet
Straightforward conversion to RCU. One rwlock becomes a spinlock, and is static. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-07Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 Conflicts: drivers/net/wireless/iwlwifi/iwl-core.h
2010-06-07mac80211: fix lock leak w/ ARP filtering and w/o CONFIG_INETJohn W. Linville
"mac80211: make ARP filtering depend on CONFIG_INET" introduced this potential locking leak. Reported-by: Juuso Oikarinen <juuso.oikarinen@nokia.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-07mac80211: fix function pointer checkHolger Schurig
This makes "iw wlan0 dump survey" work again with mac80211-based drivers that support it, e.g. ath5k. Signed-off-by: Holger Schurig <holgerschurig@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-06-07ipmr: dont corrupt listsEric Dumazet
ipmr_rules_exit() and ip6mr_rules_exit() free a list of items, but forget to properly remove these items from list. List head is not changed and still points to freed memory. This can trigger a fault later when icmpv6_sk_exit() is called. Fix is to either reinit list, or use list_del() to properly remove items from list before freeing them. bugzilla report : https://bugzilla.kernel.org/show_bug.cgi?id=16120 Introduced by commit d1db275dd3f6e4 (ipv6: ip6mr: support multiple tables) and commit f0ad0860d01e (ipv4: ipmr: support multiple tables) Reported-by: Alex Zhavnerchik <alex.vizor@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-07net: Remove unnecessary net action assertionjamal
The extra assertion to allow packet munging only when there are no other ptypes listening which may have worked around an old bug is unnecessary. It is sufficient to check if the skb is cloned before trampling on it. Thanks to Herbert Xu for being persistent and patient in getting this across. [Note that cloning checks and assertions are the general rule used by tc actions (documentation/networking/tc-actions-env-rules.txt)]. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-07net sched: make pedit check for clones insteadjamal
Now that the core path doesnt set OK to munge we detect writable skbs by looking to see if they are cloned. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-07htb: remove two unnecessary assignmentsChangli Gao
remove two unnecessary assignments we don't need to assign NULL when initialize structure objects. Signed-off-by: Changli Gao <xiaosuo@gmail.com> ---- net/sched/sch_htb.c | 2 -- 1 file changed, 2 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-07raw: avoid two atomics in xmitEric Dumazet
Avoid two atomic ops per raw_send_hdrinc() call Avoid two atomic ops per raw6_send_hdrinc() call Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-07net-caif: Added missing lock validator constantsAlex Lorca
CAIF is using "xxx-AF_MAX" strings for the lock validator. It should use its own strings. Signed-off-by: Alex Lorca <alex.lorca@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-07tcp: Fix slowness in read /proc/net/tcpTom Herbert
This patch address a serious performance issue in reading the TCP sockets table (/proc/net/tcp). Reading the full table is done by a number of sequential read operations. At each read operation, a seek is done to find the last socket that was previously read. This seek operation requires that the sockets in the table need to be counted up to the current file position, and to count each of these requires taking a lock for each non-empty bucket. The whole algorithm is O(n^2). The fix is to cache the last bucket value, offset within the bucket, and the file position returned by the last read operation. On the next sequential read, the bucket and offset are used to find the last read socket immediately without needing ot scan the previous buckets the table. This algorithm t read the whole table is O(n). The improvement offered by this patch is easily show by performing cat'ing /proc/net/tcp on a machine with a lot of connections. With about 182K connections in the table, I see the following: - Without patch time cat /proc/net/tcp > /dev/null real 1m56.729s user 0m0.214s sys 1m56.344s - With patch time cat /proc/net/tcp > /dev/null real 0m0.894s user 0m0.290s sys 0m0.594s Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-06Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/sfc/net_driver.h drivers/net/sfc/siena.c
2010-06-06ip6mr: fix a typo in ip6mr_for_each_table()Eric Dumazet
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-05ipv6: avoid high order allocationsEric Dumazet
With mtu=9000, mld_newpack() use order-2 GFP_ATOMIC allocations, that are very unreliable, on machines where PAGE_SIZE=4K Limit allocated skbs to be at most one page. (order-0 allocations) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-05skbuff: add check for non-linear to warn_if_lro and needs_linearizeAlexander Duyck
We can avoid an unecessary cache miss by checking if the skb is non-linear before accessing gso_size/gso_type in skb_warn_if_lro, the same can also be done to avoid a cache miss on nr_frags if data_len is 0. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-05syncookies: update mss tablesFlorian Westphal
- ipv6 msstab: account for ipv6 header size - ipv4 msstab: add mss for Jumbograms. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-05syncookies: avoid unneeded tcp header flag double checkFlorian Westphal
caller: if (!th->rst && !th->syn && th->ack) callee: if (!th->ack) make the caller only check for !syn (common for 3whs), and move the !rst / ack test to the callee. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-05syncookies: make v4/v6 synflood warning behaviour the sameFlorian Westphal
both syn_flood_warning functions print a message, but ipv4 version only prints a warning if CONFIG_SYN_COOKIES=y. Make the v4 one behave like the v6 one. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-04tcp: use correct net ns in cookie_v4_check()Eric Dumazet
Its better to make a route lookup in appropriate namespace. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>