summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2010-10-21vlan: Enable software emulation for vlan accleration.Jesse Gross
Currently users of hardware vlan accleration need to know whether the device supports it before generating packets. However, vlan acceleration will soon be available in a more flexible manner so knowing ahead of time becomes much more difficult. This adds a software fallback path for vlan packets on devices without the necessary offloading support, similar to other types of hardware accleration. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-21vlan: Rename VLAN_GROUP_ARRAY_LEN to VLAN_N_VID.Jesse Gross
VLAN_GROUP_ARRAY_LEN is simply the number of possible vlan VIDs. Since vlan groups will soon be more of an implementation detail for vlan devices, rename the constant to be descriptive of its actual purpose. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-21ebtables: Allow filtering of hardware accelerated vlan frames.Jesse Gross
An upcoming commit will allow packets with hardware vlan acceleration information to be passed though more parts of the network stack, including packets trunked through the bridge. This adds support for matching and filtering those packets through ebtables. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-21Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/padovan/bluetooth-2.6
2010-10-21secmark: fix config problem when CONFIG_NF_CONNTRACK_SECMARK is not setEric Paris
When CONFIG_NF_CONNTRACK_SECMARK is not set we accidentally attempt to use the secmark fielf of struct nf_conn. Problem is when that config isn't set the field doesn't exist. whoops. Wrap the incorrect usage in the config. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21secmark: export secctx, drop secmark in procfsEric Paris
The current secmark code exports a secmark= field which just indicates if there is special labeling on a packet or not. We drop this field as it isn't particularly useful and instead export a new field secctx= which is the actual human readable text label. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21conntrack: export lsm context rather than internal secid via netlinkEric Paris
The conntrack code can export the internal secid to userspace. These are dynamic, can change on lsm changes, and have no meaning in userspace. We should instead be sending lsm contexts to userspace instead. This patch sends the secctx (rather than secid) to userspace over the netlink socket. We use a new field CTA_SECCTX and stop using the the old CTA_SECMARK field since it did not send particularly useful information. Signed-off-by: Eric Paris <eparis@redhat.com> Reviewed-by: Paul Moore <paul.moore@hp.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21secmark: make secmark object handling genericEric Paris
Right now secmark has lots of direct selinux calls. Use all LSM calls and remove all SELinux specific knowledge. The only SELinux specific knowledge we leave is the mode. The only point is to make sure that other LSMs at least test this generic code before they assume it works. (They may also have to make changes if they do not represent labels as strings) Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Paul Moore <paul.moore@hp.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-21secmark: do not return early if there was no errorEric Paris
Commit 4a5a5c73 attempted to pass decent error messages back to userspace for netfilter errors. In xt_SECMARK.c however the patch screwed up and returned on 0 (aka no error) early and didn't finish setting up secmark. This results in a kernel BUG if you use SECMARK. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>
2010-10-20ceph: fix num_pages_free accounting in pagelistSage Weil
Decrement the free page counter when removing a page from the free_list. Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20ceph: don't crash when passed bad mount optionsYehuda Sadeh
This only happened when parse_extra_token was not passed to ceph_parse_option() (hence, only happened in rbd). Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2010-10-20ceph: add pagelist_reserve, pagelist_truncate, pagelist_set_cursorGreg Farnum
These facilitate preallocation of pages so that we can encode into the pagelist in an atomic context. Signed-off-by: Greg Farnum <gregf@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20rbd: introduce rados block device (rbd), based on libcephYehuda Sadeh
The rados block device (rbd), based on osdblk, creates a block device that is backed by objects stored in the Ceph distributed object storage cluster. Each device consists of a single metadata object and data striped over many data objects. The rbd driver supports read-only snapshots. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20ceph: factor out libceph from Ceph file systemYehuda Sadeh
This factors out protocol and low-level storage parts of ceph into a separate libceph module living in net/ceph and include/linux/ceph. This is mostly a matter of moving files around. However, a few key pieces of the interface change as well: - ceph_client becomes ceph_fs_client and ceph_client, where the latter captures the mon and osd clients, and the fs_client gets the mds client and file system specific pieces. - Mount option parsing and debugfs setup is correspondingly broken into two pieces. - The mon client gets a generic handler callback for otherwise unknown messages (mds map, in this case). - The basic supported/required feature bits can be expanded (and are by ceph_fs_client). No functional change, aside from some subtle error handling cases that got cleaned up in the refactoring process. Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20net: avoid RCU for NOCACHE dstEric Dumazet
There is no point using RCU for dst we allocate for a very short time (used once). Change dst_release() to take DST_NOCACHE into account, but also change skb_dst_set_noref() to force a refcount increment for such dst. This is a _huge_ gain, because we dont waste memory to store xx thousand of dsts. Instead of queueing them to RCU, we can free them instantly. CPU caches can stay hot, re-using same memory blocks to hold temporary dsts. Note : remove unneeded smp_mb__before_atomic_dec(); in dst_release(), since atomic_dec_return() implies a full memory barrier. Stress test, 160.000.000 udp frames sent, IP route cache disabled (DDOS). Before: real 0m38.091s user 0m13.189s sys 7m53.018s After: real 0m29.946s user 0m12.157s sys 7m40.605s For reference, if IP route cache was enabled : real 0m32.030s user 0m10.521s sys 8m15.243s Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-20net: allocate tx queues in register_netdeviceTom Herbert
This patch introduces netif_alloc_netdev_queues which is called from register_device instead of alloc_netdev_mq. This makes TX queue allocation symmetric with RX allocation. Also, queue locks allocation is done in netdev_init_one_queue. Change set_real_num_tx_queues to fail if requested number < 1 or greater than number of allocated queues. Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-20net: cleanups in RX queue allocationTom Herbert
Clean up in RX queue allocation. In netif_set_real_num_rx_queues return error on attempt to set zero queues, or requested number is greater than number of allocated queues. In netif_alloc_rx_queues, do BUG_ON if queue_count is zero. Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-20net: fail alloc_netdev_mq if queue count < 1Tom Herbert
In alloc_netdev_mq fail if requested queue_count < 1. Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-20Merge branch 'for-davem' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
2010-10-20phonet: remove the unused variable pnChangli Gao
Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-20netpoll: Revert napi_poll fix for bonding driverNeil Horman
In an erlier patch I modified napi_poll so that devices with IFF_MASTER polled the per_cpu list instead of the device list for napi. I did this because the bonding driver has no napi instances to poll, it instead expects to check the slave devices napi instances, which napi_poll was unaware of. Looking at this more closely however, I now see this isn't strictly needed. As the bond driver poll_controller calls the slaves poll_controller via netpoll_poll_dev, which recursively calls poll_napi on each slave, allowing those napi instances to get serviced. The earlier patch isn't at all harmfull, its just not needed, so lets revert it to make the code cleaner. Sorry for the noise, Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: WANG Cong <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-19Fixed race condition at ip_vs.ko module init.Eduardo Blanco
Lists were initialized after the module was registered. Multiple ipvsadm processes at module load triggered a race condition that resulted in a null pointer dereference in do_ip_vs_get_ctl(). As a result, __ip_vs_mutex was left locked preventing all further ipvsadm commands. Signed-off-by: Eduardo J. Blanco <ejblanco@google.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2010-10-19inet: RCU changes in inetdev_by_index()Eric Dumazet
Convert inetdev_by_index() to not increment in_dev refcount. Callers hold RCU or RTNL, and should not decrement in_dev refcount. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-19net: avoid a dev refcount in ip_mc_find_dev()Eric Dumazet
We hold RTNL in ip_mc_find_dev(), no need to touch device refcount. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-19sunrpc: remove the big kernel lockArnd Bergmann
The sunrpc cache_ioctl function does not need the big kernel lock because it uses its own queue_lock already. rpc_pipe_ioctl apparently should be using i_lock like the other operations on the pipe file descriptor do. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2010-10-19ipvs: IPv6 tunnel modeHans Schillstrom
IPv6 encapsulation uses a bad source address for the tunnel. i.e. VIP will be used as local-addr and encap. dst addr. Decapsulation will not accept this. Example LVS (eth1 2003::2:0:1/96, VIP 2003::2:0:100) (eth0 2003::1:0:1/96) RS (ethX 2003::1:0:5/96) tcpdump 2003::2:0:100 > 2003::1:0:5: IP6 (hlim 63, next-header TCP (6) payload length: 40) 2003::3:0:10.50991 > 2003::2:0:100.http: Flags [S], cksum 0x7312 (correct), seq 3006460279, win 5760, options [mss 1440,sackOK,TS val 1904932 ecr 0,nop,wscale 3], length 0 In Linux IPv6 impl. you can't have a tunnel with an any cast address receiving packets (I have not tried to interpret RFC 2473) To have receive capabilities the tunnel must have: - Local address set as multicast addr or an unicast addr - Remote address set as an unicast addr. - Loop back addres or Link local address are not allowed. This causes us to setup a tunnel in the Real Server with the LVS as the remote address, here you can't use the VIP address since it's used inside the tunnel. Solution Use outgoing interface IPv6 address (match against the destination). i.e. use ip6_route_output() to look up the route cache and then use ipv6_dev_get_saddr(...) to set the source address of the encapsulated packet. Additionally, cache the results in new destination fields: dst_cookie and dst_saddr and properly check the returned dst from ip6_route_output. We now add xfrm_lookup call only for the tunneling method where the source address is a local one. Signed-off-by:Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-19netfilter: ctnetlink: add expectation deletion eventsPablo Neira Ayuso
This patch allows to listen to events that inform about expectations destroyed. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-18sched: Fix softirq time accountingVenkatesh Pallipadi
Peter Zijlstra found a bug in the way softirq time is accounted in VIRT_CPU_ACCOUNTING on this thread: http://lkml.indiana.edu/hypermail//linux/kernel/1009.2/01366.html The problem is, softirq processing uses local_bh_disable internally. There is no way, later in the flow, to differentiate between whether softirq is being processed or is it just that bh has been disabled. So, a hardirq when bh is disabled results in time being wrongly accounted as softirq. Looking at the code a bit more, the problem exists in !VIRT_CPU_ACCOUNTING as well. As account_system_time() in normal tick based accouting also uses softirq_count, which will be set even when not in softirq with bh disabled. Peter also suggested solution of using 2*SOFTIRQ_OFFSET as irq count for local_bh_{disable,enable} and using just SOFTIRQ_OFFSET while softirq processing. The patch below does that and adds API in_serving_softirq() which returns whether we are currently processing softirq or not. Also changes one of the usages of softirq_count in net/sched/cls_cgroup.c to in_serving_softirq. Looks like many usages of in_softirq really want in_serving_softirq. Those changes can be made individually on a case by case basis. Signed-off-by: Venkatesh Pallipadi <venki@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1286237003-12406-2-git-send-email-venki@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-18bonding: Fix napi poll for bonding driverNeil Horman
Usually the netpoll path, when preforming a napi poll can get away with just polling all the napi instances of the configured device. Thats not the case for the bonding driver however, as the napi instances which may wind up getting flagged as needing polling after the poll_controller call don't belong to the bonded device, but rather to the slave devices. Fix this by checking the device in question for the IFF_MASTER flag, if set, we know we need to check the full poll list for this cpu, rather than just the devices napi instance list. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-18bonding: Fix bonding drivers improper modification of netpoll structureNeil Horman
The bonding driver currently modifies the netpoll structure in its xmit path while sending frames from netpoll. This is racy, as other cpus can access the netpoll structure in parallel. Since the bonding driver points np->dev to a slave device, other cpus can inadvertently attempt to send data directly to slave devices, leading to improper locking with the bonding master, lost frames, and deadlocks. This patch fixes that up. This patch also removes the real_dev pointer from the netpoll structure as that data is really only used by bonding in the poll_controller, and we can emulate its behavior by check each slave for IS_UP. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-18IPv4: route.c: Change checks against 0xffffffff to ipv4_is_lbcast()Andy Walls
Change a few checks against the hardcoded broadcast address, 0xffffffff, to ipv4_is_lbcast(). Remove some existing checks using ipv4_is_lbcast() that are now obviously superfluous. Signed-off-by: Andy Walls <awalls@md.metrocast.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-18netfilter: fix kconfig unmet dependency warningRandy Dunlap
Fix netfilter kconfig unmet dependencies warning & spell out "compatible" while there. warning: (IP_NF_TARGET_TTL && NET && INET && NETFILTER && IP_NF_IPTABLES && NETFILTER_ADVANCED || IP6_NF_TARGET_HL && NET && INET && IPV6 && NETFILTER && IP6_NF_IPTABLES && NETFILTER_ADVANCED) selects NETFILTER_XT_TARGET_HL which has unmet direct dependencies ((IP_NF_MANGLE || IP6_NF_MANGLE) && NETFILTER_ADVANCED) Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-18Update broken web addresses in the kernel.Justin P. Mattock
The patch below updates broken web addresses in the kernel Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Cc: Maciej W. Rozycki <macro@linux-mips.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Finn Thain <fthain@telegraphics.com.au> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Matt Turner <mattst88@gmail.com> Cc: Dimitry Torokhov <dmitry.torokhov@gmail.com> Cc: Mike Frysinger <vapier.adi@gmail.com> Acked-by: Ben Pfaff <blp@cs.stanford.edu> Acked-by: Hans J. Koch <hjk@linutronix.de> Reviewed-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-10-18tipc: Simplify bearer shutdown logicAllan Stephens
Optimize processing in TIPC's bearer shutdown code, including: 1. Remove an unnecessary check to see if TIPC bearer's can exist. 2. Don't release spinlocks before calling a media-specific disabling routine, since the routine can't sleep. 3. Make bearer_disable() operate directly on a struct bearer, instead of needlessly taking a name and then mapping that to the struct. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Reviewed-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-18tipc: Kill tipc_get_mode() completely.David S. Miller
It's completely unused and exporting a static symbol makes no sense and breaks the build. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-17Bluetooth: fix oops in l2cap_connect_reqNathan Holstein
In error cases when the ACL is insecure or we fail to allocate a new struct sock, we jump to the "response" label. If so, "sk" will be null and the kernel crashes. Signed-off-by: Nathan Holstein <nathan.holstein@gmail.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-10-17fib_hash: RCU conversion phase 2Eric Dumazet
Get rid of fib_hash_lock rwlock. The fn_zone hash table resize is the noticeable part of this patch. I added a seqlock per fn_zone, so that readers can restart their lookup in the (very rare) case a writer expanded the hash table. Add rcu heads in fib_alias and fib_node, use call_rcu() to defer their freeing, and use appropriate _rcu list manipulations. Stress test (160.000.000 udp frames sent, IP route cache disabled to mimic DDOS attack, FIB_HASH) Before: real 0m41.191s user 0m13.137s sys 8m55.241s After: real 0m38.091s user 0m13.189s sys 7m53.018s Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-17fib_hash: RCU conversion phase 1Eric Dumazet
First step for RCU conversion of fib_hash : struct fn_zone are created and never deleted. Very classic conversion, using rcu_assign_pointer(), rcu_dereference() and rtnl_dereference() verbs. __rcu markers on fz_next and fn_zone_list They are created under RTNL, we dont need fib_hash_lock anymore in fn_new_zone(). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-17fib_hash: embed initial hash table in fn_zoneEric Dumazet
While looking for false sharing problems, I noticed sizeof(struct fn_zone) was small (28 bytes) and possibly sharing a cache line with an often written kernel structure. Most of the time, fn_zone uses its initial hash table of 16 slots. We can avoid the false sharing problem by embedding this initial hash table in fn_zone itself, so that sizeof(fn_zone) > L1_CACHE_BYTES We did a similar optimization in commit a6501e080c (Reduce memory needs and speedup lookups) Add a fz_revorder field to speedup fn_hash() a bit. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-17tcp: use correct counters in CA_CWR state tooIlpo Järvinen
As CWR is stronger than CA_Disorder state, we can miscount SACK/Reno failure into other timeouts. Not a bad problem as it can happen only due to ECN, FRTO detecting spurious RTO or xmit error which are the only callers of tcp_enter_cwr. And even then losses and RTO must still follow thereafter to actually end up into the relevant code paths. Compile tested. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-17tcp: sack lost marking fixesIlpo Järvinen
When only fast rexmit should be done, tcp_mark_head_lost marks L too far. Also, sacked_upto below 1 is perfectly valid number, the packets == 0 then needs to be trapped elsewhere. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-16tipc: cleanup function namespacestephen hemminger
Do some cleanups of TIPC based on make namespacecheck 1. Don't export unused symbols 2. Eliminate dead code 3. Make functions and variables local 4. Rename buf_acquire to tipc_buf_acquire since it is used in several files Compile tested only. This make break out of tree kernel modules that depend on TIPC routines. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-16fib: avoid false sharing on fib_table_hashEric Dumazet
While doing profile analysis, I found fib_hash_table was sometime in a cache line shared by a possibly often written kernel structure. (CONFIG_IP_ROUTE_MULTIPATH || !CONFIG_IPV6_MULTIPLE_TABLES) It's hard to detect because not easily reproductible. Make sure we allocate a full cache line to keep this shared in all cpus caches. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-16fib_trie: use fls() instead of open coded loopEric Dumazet
fib_table_lookup() might use fls() to speedup an open coded loop. Noticed while doing a profile analysis. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-16fib: remove a useless synchronize_rcu() callEric Dumazet
fib_nl_delrule() calls synchronize_rcu() for no apparent reason, while rtnl is held. I suspect it was done to avoid an atomic_inc_not_zero() in fib_rules_lookup(), which commit 7fa7cb7109d07 added anyway. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-16fib6: use FIB_LOOKUP_NOREF in fib6_rule_lookup()Eric Dumazet
Avoid two atomic ops on found rule in fib6_rule_lookup() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-16net: allocate skbs on local nodeEric Dumazet
commit b30973f877 (node-aware skb allocation) spread a wrong habit of allocating net drivers skbs on a given memory node : The one closest to the NIC hardware. This is wrong because as soon as we try to scale network stack, we need to use many cpus to handle traffic and hit slub/slab management on cross-node allocations/frees when these cpus have to alloc/free skbs bound to a central node. skb allocated in RX path are ephemeral, they have a very short lifetime : Extra cost to maintain NUMA affinity is too expensive. What appeared as a nice idea four years ago is in fact a bad one. In 2010, NIC hardwares are multiqueue, or we use RPS to spread the load, and two 10Gb NIC might deliver more than 28 million packets per second, needing all the available cpus. Cost of cross-node handling in network and vm stacks outperforms the small benefit hardware had when doing its DMA transfert in its 'local' memory node at RX time. Even trying to differentiate the two allocations done for one skb (the sk_buff on local node, the data part on NIC hardware node) is not enough to bring good performance. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-15Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem
2010-10-15Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/padovan/bluetooth-next-2.6 into for-davem
2010-10-15radiotap: fix vendor namespace parsingJohannes Berg
There's a bug with radiotap vendor namespace parsing if you don't register for the given namespace extensions. Fix this by passing only the unknown vendor namespaces and the registered data to frontends, but not both. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>