summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2013-01-03mac80211: optimise roaming time againJohannes Berg
The last fixes re-added the RCU synchronize penalty on roaming to fix the races. Split up sta_info_flush() now to get rid of that again, and let managed mode (and only it) delay the actual destruction. Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03mac80211: warn if unexpectedly removing stationsJohannes Berg
When an interface is brought down it must have been disconnected (or similar) in all modes other than WDS, so warn if any stations were removed in other modes. Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03mac80211: remove final sta_info_flush()Johannes Berg
When all interfaces have been removed, there can't be any stations left over, so there's no need to flush again. Remove this, and all code associated with it, which also simplifies the function. Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03mac80211: use short slot time in mesh for 5GHzChun-Yeow Yeoh
Use short slot time in 5GHz for mesh. The performance is increased from 16.4Mbps to 23.4Mbps for two directly connected mesh STAs operating in legacy rate using iperf measurement. Almost similar to the results claimed in IBSS mode. Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com> [call ieee80211_get_sdata_band() only once] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03mac80211: Allow disabling SGI-20Ben Greear
This allows user-space (wpa_supplicant) to disable short guard interval (SGI) for 20Mhz. The SGI-40 disable option is already handled. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03mac80211: fix maximum MTUChaitanya
The maximum MTU shouldn't take the headers into account, the maximum MSDU size is exactly the maximum MTU. Signed-off-by: T Krishna Chaitanya <chaitanyatk@posedge.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03mac80211: fix dtim_period in hidden SSID AP associationJohannes Berg
When AP's SSID is hidden the BSS can appear several times in cfg80211's BSS list: once with a zero-length SSID that comes from the beacon, and once for each SSID from probe reponses. Since the mac80211 stores its data in ieee80211_bss which is embedded into cfg80211_bss, mac80211's data will be duplicated too. This becomes a problem when a driver needs the dtim_period since this data exists only in the beacon's instance in cfg80211 bss table which isn't the instance that is used when associating. Remove the DTIM period from the BSS table and track it explicitly to avoid this problem. Cc: stable@vger.kernel.org Tested-by: Efi Tubul <efi.tubul@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03mac80211: use del_timer_sync for final sta cleanup timer deletionJohannes Berg
This is a very old bug, but there's nothing that prevents the timer from running while the module is being removed when we only do del_timer() instead of del_timer_sync(). The timer should normally not be running at this point, but it's not clearly impossible (or we could just remove this.) Cc: stable@vger.kernel.org Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03mac80211: fix station destruction in AP/mesh modesJohannes Berg
Unfortunately, commit b22cfcfcae5b, intended to speed up roaming by avoiding the synchronize_rcu() broke AP/mesh modes as it moved some code into that work item that will still call into the driver at a time where it's no longer expected to handle this: after the AP or mesh has been stopped. To fix this problem remove the per-station work struct, maintain a station cleanup list instead and flush this list when stations are flushed. To keep this patch smaller for stable, do this when the stations are flushed (sta_info_flush()). This unfortunately brings back the original roaming delay; I'll fix that again in a separate patch. Also, Ben reported that the original commit could sometimes (with many interfaces) cause long delays when an interface is set down, due to blocking on flush_workqueue(). Since we now maintain the cleanup list, this particular change of the original patch can be reverted. Cc: stable@vger.kernel.org [3.7] Reported-by: Ben Greear <greearb@candelatech.com> Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03mac80211: RMC buckets are just list headsThomas Pedersen
The array of rmc_entrys is redundant since only the list_head is used. Make this an array of list_heads instead and save ~6k per vif at runtime :D Signed-off-by: Thomas Pedersen <thomas@cozybit.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03mac80211: assign VLAN channel contextsJohannes Berg
Make AP_VLAN type interfaces track the AP master channel context so they have one assigned for the various lookups. Don't give them their own refcount etc. since they're just slaves to the AP master. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03mac80211: flush AP_VLAN stations when tearing down the BSS APFelix Fietkau
Signed-off-by: Felix Fietkau <nbd@openwrt.org> [change to flush stations with AP flush in second loop] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03mac80211: fix ibss scanningStanislaw Gruszka
Do not scan on no-IBSS and disabled channels in IBSS mode. Doing this can trigger Microcode errors on iwlwifi and iwlegacy drivers. Also rename ieee80211_request_internal_scan() function since it is only used in IBSS mode and simplify calling it from ieee80211_sta_find_ibss(). This patch should address: https://bugzilla.redhat.com/show_bug.cgi?id=883414 https://bugzilla.kernel.org/show_bug.cgi?id=49411 Reported-by: Jesse Kahtava <jesse_kahtava@f-m.fm> Reported-by: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: stable@vger.kernel.org Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-01-03bridge: add empty br_mdb_init() and br_mdb_uninit() definitions.Rami Rosen
This patch adds empty br_mdb_init() and br_mdb_uninit() definitions in br_private.h to avoid build failure when CONFIG_BRIDGE_IGMP_SNOOPING is not set. These methods were moved from br_multicast.c to br_netlink.c by commit 3ec8e9f085bcaef0de1077f555c2c5102c223390 Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-03bridge: Correctly unregister MDB rtnetlink handlersVlad Yasevich
Commit 63233159fd4e596568f5f168ecb0879b61631d47: bridge: Do not unregister all PF_BRIDGE rtnl operations introduced a bug where a removal of a single bridge from a multi-bridge system would remove MDB netlink handlers. The handlers should only be removed once all bridges are gone, but since we don't keep track of the number of bridge interfaces, it's simpler to do it when the bridge module is unloaded. To make it consistent, move the registration code into module initialization code path. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-02Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fixes from Sage Weil: "Two of Alex's patches deal with a race when reseting server connections for open RBD images, one demotes some non-fatal BUGs to WARNs, and my patch fixes a protocol feature bit failure path." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: fix protocol feature mismatch failure path libceph: WARN, don't BUG on unexpected connection states libceph: always reset osds when kicking libceph: move linger requests sooner in kick_requests()
2012-12-30bridge: respect RFC2863 operational statestephen hemminger
The bridge link detection should follow the operational state of the lower device, rather than the carrier bit. This allows devices like tunnels that are controlled by userspace control plane to work with bridge STP link management. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Reviewed-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-30net: filter: return -EINVAL if BPF_S_ANC* operation is not supportedDaniel Borkmann
Currently, we return -EINVAL for malformed or wrong BPF filters. However, this is not done for BPF_S_ANC* operations, which makes it more difficult to detect if it's actually supported or not by the BPF machine. Therefore, we should also return -EINVAL if K is within the SKF_AD_OFF universe and the ancillary operation did not match. Why exactly is it needed? If tools such as libpcap/tcpdump want to make use of new ancillary operations (like filtering VLAN in kernel space), there is currently no sane way to test if this feature / BPF_S_ANC* op is present or not, since no error is returned. This patch will make life easier for that and allow for a proper usage for user space applications. There was concern, if this patch will break userland. Short answer: Yes and no. Long answer: It will "break" only for code that calls ... { BPF_LD | BPF_(W|H|B) | BPF_ABS, 0, 0, <K> }, ... where <K> is in [0xfffff000, 0xffffffff] _and_ <K> is *not* an ancillary. And here comes the BUT: assuming some *old* code will have such an instruction where <K> is between [0xfffff000, 0xffffffff] and it doesn't know ancillary operations, then this will give a non-expected / unwanted behavior as well (since we do not return the BPF machine with 0 after a failed load_pointer(), which was the case before introducing ancillary operations, but load sth. into the accumulator instead, and continue with the next instruction, for instance). Thus, user space code would already have been broken by introducing ancillary operations into the BPF machine per se. Code that does such a direct load, e.g. "load word at packet offset 0xffffffff into accumulator" ("ld [0xffffffff]") is quite broken, isn't it? The whole assumption of ancillary operations is that no-one intentionally calls things like "ld [0xffffffff]" and expect this word to be loaded from such a packet offset. Hence, we can also safely make use of this feature testing patch and facilitate application development. Therefore, at least from this patch onwards, we have *for sure* a check whether current or in future implemented BPF_S_ANC* ops are supported in the kernel. Patch was tested on x86_64. (Thanks to Eric for the previous review.) Cc: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Ani Sinha <ani@aristanetworks.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-28skbuff: make __kmalloc_reserve staticstephen hemminger
Sparse detected case where this local function should be static. It may even allow some compiler optimizations. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-28tcp: make proc_tcp_fastopen_key staticstephen hemminger
Detected by sparse. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-28sctp: make sctp_addr_wq_timeout_handler staticstephen hemminger
Fix sparse warning about local function that should be static. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-28net: use per task frag allocator in skb_append_datato_fragsEric Dumazet
Use the new per task frag allocator in skb_append_datato_frags(), to reduce number of frags and page allocator overhead. Tested: ifconfig lo mtu 16436 perf record netperf -t UDP_STREAM ; perf report before : Throughput: 32928 Mbit/s 51.79% netperf [kernel.kallsyms] [k] copy_user_generic_string 5.98% netperf [kernel.kallsyms] [k] __alloc_pages_nodemask 5.58% netperf [kernel.kallsyms] [k] get_page_from_freelist 5.01% netperf [kernel.kallsyms] [k] __rmqueue 3.74% netperf [kernel.kallsyms] [k] skb_append_datato_frags 1.87% netperf [kernel.kallsyms] [k] prep_new_page 1.42% netperf [kernel.kallsyms] [k] next_zones_zonelist 1.28% netperf [kernel.kallsyms] [k] __inc_zone_state 1.26% netperf [kernel.kallsyms] [k] alloc_pages_current 0.78% netperf [kernel.kallsyms] [k] sock_alloc_send_pskb 0.74% netperf [kernel.kallsyms] [k] udp_sendmsg 0.72% netperf [kernel.kallsyms] [k] zone_watermark_ok 0.68% netperf [kernel.kallsyms] [k] __cpuset_node_allowed_softwall 0.67% netperf [kernel.kallsyms] [k] fib_table_lookup 0.60% netperf [kernel.kallsyms] [k] memcpy_fromiovecend 0.55% netperf [kernel.kallsyms] [k] __udp4_lib_lookup after: Throughput: 47185 Mbit/s 61.74% netperf [kernel.kallsyms] [k] copy_user_generic_string 2.07% netperf [kernel.kallsyms] [k] prep_new_page 1.98% netperf [kernel.kallsyms] [k] skb_append_datato_frags 1.02% netperf [kernel.kallsyms] [k] sock_alloc_send_pskb 0.97% netperf [kernel.kallsyms] [k] enqueue_task_fair 0.97% netperf [kernel.kallsyms] [k] udp_sendmsg 0.91% netperf [kernel.kallsyms] [k] __ip_route_output_key 0.88% netperf [kernel.kallsyms] [k] __netif_receive_skb 0.87% netperf [kernel.kallsyms] [k] fib_table_lookup 0.85% netperf [kernel.kallsyms] [k] resched_task 0.78% netperf [kernel.kallsyms] [k] __udp4_lib_lookup 0.77% netperf [kernel.kallsyms] [k] _raw_spin_lock_irqsave Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-28rtnl: expose carrier value with possibility to set itJiri Pirko
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-28net: allow to change carrier via sysfsJiri Pirko
Make carrier writable Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-28net: add change_carrier netdev opJiri Pirko
This allows a driver to register change_carrier callback which will be called whenever user will like to change carrier state. This is useful for devices like dummy, gre, team and so on. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-28Merge branch 'master' of git://1984.lsi.us.es/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== The following batch contains Netfilter fixes for 3.8-rc1. They are a mixture of old bugs that have passed unnoticed (I'll pass these to stable) and more fresh ones from the previous merge window, they are: * Fix for MAC address in 6in4 tunnels via NFLOG that results in ulogd showing up wrong address, from Bob Hockney. * Fix a comment in nf_conntrack_ipv6, from Florent Fourcot. * Fix a leak an error path in ctnetlink while creating an expectation, from Jesper Juhl. * Fix missing ICMP time exceeded in the IPv6 defragmentation code, from Haibo Xi. * Fix inconsistent handling of routing changes in MASQUERADE for the new connections case, from Andrew Collins. * Fix a missing skb_reset_transport in ip[6]t_REJECT that leads to crashes in the ixgbe driver (since it seems to access the transport header with TSO enabled), from Mukund Jampala. * Recover obsoleted NOTRACK target by including it into the CT and spot a warning via printk about being obsoleted. Many people don't check the scheduled to be removal file under Documentation, so we follow some less agressive approach to kill this in a year or so. Spotted by Florian Westphal, patch from myself. * Fix race condition in xt_hashlimit that allows to create two or more entries, from myself. * Fix crash if the CT is used due to the recently added facilities to consult the dying and unconfirmed conntrack lists, from myself. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-28rfkill: don't use [delayed_]work_pending()Tejun Heo
There's no need to test whether a (delayed) work item in pending before queueing, flushing or cancelling it. Most uses are unnecessary and quite a few of them are buggy. Remove unnecessary pending tests from rfkill. Only compile tested. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Johannes Berg <johannes@sipsolutions.net> Cc: "John W. Linville" <linville@tuxdriver.com> Cc: linux-wireless@vger.kernel.org
2012-12-27libceph: fix protocol feature mismatch failure pathSage Weil
We should not set con->state to CLOSED here; that happens in ceph_fault() in the caller, where it first asserts that the state is not yet CLOSED. Avoids a BUG when the features don't match. Since the fail_protocol() has become a trivial wrapper, replace calls to it with direct calls to reset_connection(). Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
2012-12-27libceph: WARN, don't BUG on unexpected connection statesAlex Elder
A number of assertions in the ceph messenger are implemented with BUG_ON(), killing the system if connection's state doesn't match what's expected. At this point our state model is (evidently) not well understood enough for these assertions to trigger a BUG(). Convert all BUG_ON(con->state...) calls to be WARN_ON(con->state...) so we learn about these issues without killing the machine. We now recognize that a connection fault can occur due to a socket closure at any time, regardless of the state of the connection. So there is really nothing we can assert about the state of the connection at that point so eliminate that assertion. Reported-by: Ugis <ugis22@gmail.com> Tested-by: Ugis <ugis22@gmail.com> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2012-12-27libceph: always reset osds when kickingAlex Elder
When ceph_osdc_handle_map() is called to process a new osd map, kick_requests() is called to ensure all affected requests are updated if necessary to reflect changes in the osd map. This happens in two cases: whenever an incremental map update is processed; and when a full map update (or the last one if there is more than one) gets processed. In the former case, the kick_requests() call is followed immediately by a call to reset_changed_osds() to ensure any connections to osds affected by the map change are reset. But for full map updates this isn't done. Both cases should be doing this osd reset. Rather than duplicating the reset_changed_osds() call, move it into the end of kick_requests(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2012-12-27libceph: move linger requests sooner in kick_requests()Alex Elder
The kick_requests() function is called by ceph_osdc_handle_map() when an osd map change has been indicated. Its purpose is to re-queue any request whose target osd is different from what it was when it was originally sent. It is structured as two loops, one for incomplete but registered requests, and a second for handling completed linger requests. As a special case, in the first loop if a request marked to linger has not yet completed, it is moved from the request list to the linger list. This is as a quick and dirty way to have the second loop handle sending the request along with all the other linger requests. Because of the way it's done now, however, this quick and dirty solution can result in these incomplete linger requests never getting re-sent as desired. The problem lies in the fact that the second loop only arranges for a linger request to be sent if it appears its target osd has changed. This is the proper handling for *completed* linger requests (it avoids issuing the same linger request twice to the same osd). But although the linger requests added to the list in the first loop may have been sent, they have not yet completed, so they need to be re-sent regardless of whether their target osd has changed. The first required fix is we need to avoid calling __map_request() on any incomplete linger request. Otherwise the subsequent __map_request() call in the second loop will find the target osd has not changed and will therefore not re-send the request. Second, we need to be sure that a sent but incomplete linger request gets re-sent. If the target osd is the same with the new osd map as it was when the request was originally sent, this won't happen. This can be fixed through careful handling when we move these requests from the request list to the linger list, by unregistering the request *before* it is registered as a linger request. This works because a side-effect of unregistering the request is to make the request's r_osd pointer be NULL, and *that* will ensure the second loop actually re-sends the linger request. Processing of such a request is done at that point, so continue with the next one once it's been moved. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2012-12-26ipv6/ip6_gre: set transport header correctlyIsaku Yamahata
ip6gre_xmit2() incorrectly sets transport header to inner payload instead of GRE header. It seems copy-and-pasted from ipip.c. Set transport header to gre header. (In ipip case the transport header is the inner ip header, so that's correct.) Found by inspection. In practice the incorrect transport header doesn't matter because the skb usually is sent to another net_device or socket, so the transport header isn't referenced. Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-26ipv4/ip_gre: set transport header correctly to gre headerIsaku Yamahata
ipgre_tunnel_xmit() incorrectly sets transport header to inner payload instead of GRE header. It seems copy-and-pasted from ipip.c. So set transport header to gre header. (In ipip case the transport header is the inner ip header, so that's correct.) Found by inspection. In practice the incorrect transport header doesn't matter because the skb usually is sent to another net_device or socket, so the transport header isn't referenced. Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-26IB/rds: suppress incompatible protocol when version is knownMarciniszyn, Mike
Add an else to only print the incompatible protocol message when version hasn't been established. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-26IB/rds: Correct ib_api use with gs_dma_address/sg_dma_lenMarciniszyn, Mike
0b088e00 ("RDS: Use page_remainder_alloc() for recv bufs") added uses of sg_dma_len() and sg_dma_address(). This makes RDS DOA with the qib driver. IB ulps should use ib_sg_dma_len() and ib_sg_dma_address respectively since some HCAs overload ib_sg_dma* operations. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-26tcp: should drop incoming frames without ACK flag setEric Dumazet
In commit 96e0bf4b5193d (tcp: Discard segments that ack data not yet sent) John Dykstra enforced a check against ack sequences. In commit 354e4aa391ed5 (tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation) I added more safety tests. But we missed fact that these tests are not performed if ACK bit is not set. RFC 793 3.9 mandates TCP should drop a frame without ACK flag set. " fifth check the ACK field, if the ACK bit is off drop the segment and return" Not doing so permits an attacker to only guess an acceptable sequence number, evading stronger checks. Many thanks to Zhiyun Qian for bringing this issue to our attention. See : http://web.eecs.umich.edu/~zhiyunq/pub/ccs12_TCP_sequence_number_inference.pdf Reported-by: Zhiyun Qian <zhiyunq@umich.edu> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: John Dykstra <john.dykstra1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-26batman-adv: fix random jitter calculationAkinobu Mita
batadv_iv_ogm_emit_send_time() attempts to calculates a random integer in the range of 'orig_interval +- BATADV_JITTER' by the below lines. msecs = atomic_read(&bat_priv->orig_interval) - BATADV_JITTER; msecs += (random32() % 2 * BATADV_JITTER); But it actually gets 'orig_interval' or 'orig_interval - BATADV_JITTER' because '%' and '*' have same precedence and associativity is left-to-right. This adds the parentheses at the appropriate position so that it matches original intension. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Antonio Quartulli <ordex@autistici.org> Cc: Marek Lindner <lindner_marek@yahoo.de> Cc: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Cc: Antonio Quartulli <ordex@autistici.org> Cc: b.a.t.m.a.n@lists.open-mesh.org Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-26netfilter: ctnetlink: fix leak in error path of ctnetlink_create_expectJesper Juhl
This patch fixes a leak in one of the error paths of ctnetlink_create_expect if no helper and no timeout is specified. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-26netfilter: xt_hashlimit: fix namespace destroy pathVitaly E. Lavrov
recent_net_exit() is called before recent_mt_destroy() in the destroy path of network namespaces. Make sure there are no entries in the parent proc entry xt_recent before removing it. Signed-off-by: Vitaly E. Lavrov <lve@guap.ru> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-26netfilter: xt_recent: fix namespace destroy pathVitaly E. Lavrov
recent_net_exit() is called before recent_mt_destroy() in the destroy path of network namespaces. Make sure there are no entries in the parent proc entry xt_recent before removing it. Signed-off-by: Vitaly E. Lavrov <lve@guap.ru> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-26netfilter: xt_hashlimit: fix race that results in duplicated entriesPablo Neira Ayuso
Two packets may race to create the same entry in the hashtable, double check if this packet lost race. This double checking only happens in the path of the packet that creates the hashtable for first time. Note that, with this patch, no packet drops occur if the race happens. Reported-by: Feng Gao <gfree.wind@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-24arp: fix a regression in arp_solicit()Cong Wang
Sedat reported the following commit caused a regression: commit 9650388b5c56578fdccc79c57a8c82fb92b8e7f1 Author: Eric Dumazet <edumazet@google.com> Date: Fri Dec 21 07:32:10 2012 +0000 ipv4: arp: fix a lockdep splat in arp_solicit This is due to the 6th parameter of arp_send() needs to be NULL for the broadcast case, the above commit changed it to an all-zero array by mistake. Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Cc: Sedat Dilek <sedat.dilek@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-24netfilter: xt_CT: recover NOTRACK target supportPablo Neira Ayuso
Florian Westphal reported that the removal of the NOTRACK target (9655050 netfilter: remove xt_NOTRACK) is breaking some existing setups. That removal was scheduled for removal since long time ago as described in Documentation/feature-removal-schedule.txt What: xt_NOTRACK Files: net/netfilter/xt_NOTRACK.c When: April 2011 Why: Superseded by xt_CT Still, people may have not notice / may have decided to stick to an old iptables version. I agree with him in that some more conservative approach by spotting some printk to warn users for some time is less agressive. Current iptables 1.4.16.3 already contains the aliasing support that makes it point to the CT target, so upgrading would fix it. Still, the policy so far has been to avoid pushing our users to upgrade. As a solution, this patch recovers the NOTRACK target inside the CT target and it now spots a warning. Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-22net: sched: integer overflow fixStefan Hasko
Fixed integer overflow in function htb_dequeue Signed-off-by: Stefan Hasko <hasko.stevo@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-22CONFIG_HOTPLUG removal from networking coreGreg KH
CONFIG_HOTPLUG is always enabled now, so remove the unused code that was trying to be compiled out when this option was disabled, in the networking core. Cc: Bill Pemberton <wfp5p@virginia.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-21bridge: call br_netpoll_disable in br_add_ifGao feng
When netdev_set_master faild in br_add_if, we should call br_netpoll_disable to do some cleanup jobs,such as free the memory of struct netpoll which allocated in br_netpoll_enable. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Acked-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-21ipv4: arp: fix a lockdep splat in arp_solicit()Eric Dumazet
Yan Burman reported following lockdep warning : ============================================= [ INFO: possible recursive locking detected ] 3.7.0+ #24 Not tainted --------------------------------------------- swapper/1/0 is trying to acquire lock: (&n->lock){++--..}, at: [<ffffffff8139f56e>] __neigh_event_send +0x2e/0x2f0 but task is already holding lock: (&n->lock){++--..}, at: [<ffffffff813f63f4>] arp_solicit+0x1d4/0x280 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&n->lock); lock(&n->lock); *** DEADLOCK *** May be due to missing lock nesting notation 4 locks held by swapper/1/0: #0: (((&n->timer))){+.-...}, at: [<ffffffff8104b350>] call_timer_fn+0x0/0x1c0 #1: (&n->lock){++--..}, at: [<ffffffff813f63f4>] arp_solicit +0x1d4/0x280 #2: (rcu_read_lock_bh){.+....}, at: [<ffffffff81395400>] dev_queue_xmit+0x0/0x5d0 #3: (rcu_read_lock_bh){.+....}, at: [<ffffffff813cb41e>] ip_finish_output+0x13e/0x640 stack backtrace: Pid: 0, comm: swapper/1 Not tainted 3.7.0+ #24 Call Trace: <IRQ> [<ffffffff8108c7ac>] validate_chain+0xdcc/0x11f0 [<ffffffff8108d570>] ? __lock_acquire+0x440/0xc30 [<ffffffff81120565>] ? kmem_cache_free+0xe5/0x1c0 [<ffffffff8108d570>] __lock_acquire+0x440/0xc30 [<ffffffff813c3570>] ? inet_getpeer+0x40/0x600 [<ffffffff8108d570>] ? __lock_acquire+0x440/0xc30 [<ffffffff8139f56e>] ? __neigh_event_send+0x2e/0x2f0 [<ffffffff8108ddf5>] lock_acquire+0x95/0x140 [<ffffffff8139f56e>] ? __neigh_event_send+0x2e/0x2f0 [<ffffffff8108d570>] ? __lock_acquire+0x440/0xc30 [<ffffffff81448d4b>] _raw_write_lock_bh+0x3b/0x50 [<ffffffff8139f56e>] ? __neigh_event_send+0x2e/0x2f0 [<ffffffff8139f56e>] __neigh_event_send+0x2e/0x2f0 [<ffffffff8139f99b>] neigh_resolve_output+0x16b/0x270 [<ffffffff813cb62d>] ip_finish_output+0x34d/0x640 [<ffffffff813cb41e>] ? ip_finish_output+0x13e/0x640 [<ffffffffa046f146>] ? vxlan_xmit+0x556/0xbec [vxlan] [<ffffffff813cb9a0>] ip_output+0x80/0xf0 [<ffffffff813ca368>] ip_local_out+0x28/0x80 [<ffffffffa046f25a>] vxlan_xmit+0x66a/0xbec [vxlan] [<ffffffffa046f146>] ? vxlan_xmit+0x556/0xbec [vxlan] [<ffffffff81394a50>] ? skb_gso_segment+0x2b0/0x2b0 [<ffffffff81449355>] ? _raw_spin_unlock_irqrestore+0x65/0x80 [<ffffffff81394c57>] ? dev_queue_xmit_nit+0x207/0x270 [<ffffffff813950c8>] dev_hard_start_xmit+0x298/0x5d0 [<ffffffff813956f3>] dev_queue_xmit+0x2f3/0x5d0 [<ffffffff81395400>] ? dev_hard_start_xmit+0x5d0/0x5d0 [<ffffffff813f5788>] arp_xmit+0x58/0x60 [<ffffffff813f59db>] arp_send+0x3b/0x40 [<ffffffff813f6424>] arp_solicit+0x204/0x280 [<ffffffff813a1a70>] ? neigh_add+0x310/0x310 [<ffffffff8139f515>] neigh_probe+0x45/0x70 [<ffffffff813a1c10>] neigh_timer_handler+0x1a0/0x2a0 [<ffffffff8104b3cf>] call_timer_fn+0x7f/0x1c0 [<ffffffff8104b350>] ? detach_if_pending+0x120/0x120 [<ffffffff8104b748>] run_timer_softirq+0x238/0x2b0 [<ffffffff813a1a70>] ? neigh_add+0x310/0x310 [<ffffffff81043e51>] __do_softirq+0x101/0x280 [<ffffffff814518cc>] call_softirq+0x1c/0x30 [<ffffffff81003b65>] do_softirq+0x85/0xc0 [<ffffffff81043a7e>] irq_exit+0x9e/0xc0 [<ffffffff810264f8>] smp_apic_timer_interrupt+0x68/0xa0 [<ffffffff8145122f>] apic_timer_interrupt+0x6f/0x80 <EOI> [<ffffffff8100a054>] ? mwait_idle+0xa4/0x1c0 [<ffffffff8100a04b>] ? mwait_idle+0x9b/0x1c0 [<ffffffff8100a6a9>] cpu_idle+0x89/0xe0 [<ffffffff81441127>] start_secondary+0x1b2/0x1b6 Bug is from arp_solicit(), releasing the neigh lock after arp_send() In case of vxlan, we eventually need to write lock a neigh lock later. Its a false positive, but we can get rid of it without lockdep annotations. We can instead use neigh_ha_snapshot() helper. Reported-by: Yan Burman <yanb@mellanox.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-21net: devnet_rename_seq should be a seqcountEric Dumazet
Using a seqlock for devnet_rename_seq is not a good idea, as device_rename() can sleep. As we hold RTNL, we dont need a protection for writers, and only need a seqcount so that readers can catch a change done by a writer. Bug added in commit c91f6df2db4972d3 (sockopt: Change getsockopt() of SO_BINDTODEVICE to return an interface name) Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-21ip_gre: fix possible use after freeEric Dumazet
Once skb_realloc_headroom() is called, tiph might point to freed memory. Cache tiph->ttl value before the reallocation, to avoid unexpected behavior. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-21ip_gre: make ipgre_tunnel_xmit() not parse network header as IP unconditionallyIsaku Yamahata
ipgre_tunnel_xmit() parses network header as IP unconditionally. But transmitting packets are not always IP packet. For example such packet can be sent by packet socket with sockaddr_ll.sll_protocol set. So make the function check if skb->protocol is IP. Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>