Age | Commit message (Collapse) | Author |
|
The reg_mutex is similar to the ones I just removed in
cfg80211 but even less useful since it protects global
data, and we hold the RTNL in all places (except module
unload) already.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Since it just does a quick check of the last regulatory
request, the function doesn't have to hold the reg mutex
but can use RCU instead.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
If the driver for some reason successfully finishes
scanning while in p2p_stop_device(), cfg80211 will
still set it to aborted. Simplify this code using the
new 'notified' value and only mark it aborted in case
the driver didn't notify cfg80211 at all (in which
case we also leak the request to not crash, this is
a driver bug.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Using separate locks in cfg80211 and mac80211 has always
caused issues, for example having to unlock in places in
mac80211 to call cfg80211, which even needed a framework
to make cfg80211 calls after some functions returned etc.
Additionally, I suspect some issues people have reported
with the cfg80211 state getting confused could be due to
such issues, when cfg80211 is asking mac80211 to change
state but mac80211 is in the process of telling cfg80211
that the state changed (in another way.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Virtually all code paths in cfg80211 already (need to) hold
the RTNL. As such, there's little point in having another
four mutexes for various parts of the code, they just cause
lock ordering issues (and much of the time, the RTNL and a
few of the others need thus be held.)
Simplify all this by getting rid of the extra four mutexes
and just use the RTNL throughout. Only a few code changes
were needed to do this and we can get rid of a work struct
for bonus points.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
There's no need to lock, we can just use an atomic_t.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The function is only used and needed by the wext code
for scanning, so move it there.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
|
|
Move mandatory rates calculation to cfg80211, shared with non mac80211 drivers.
Signed-off-by: Ashok Nagarajan <ashok@cozybit.com>
[extend documentation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
As estab_plinks is not a statistics member, don't show its debug information
along with other mesh stat members
Signed-off-by: Ashok Nagarajan <ashok@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
VHT uses peer AID in the PARTIAL_AID field in TDLS frames. The current
design for TDLS is to first add a dummy STA entry before completing TDLS
Setup and then update information on this STA entry based on what was
received from the peer during the setup exchange.
In theory, this could use NL80211_ATTR_STA_AID to set the peer AID just
like this is used in AP mode to set the AID of an association station.
However, existing cfg80211 validation rules prevent this attribute from
being used with set_station operation. To avoid interoperability issues
between different kernel and user space version combinations, introduce
a new nl80211 attribute for the purpose of setting TDLS peer AID. This
attribute can be used in both the new_station and set_station
operations. It is not supposed to be allowed to change the AID value
during the lifetime of the STA entry, but that validation is left for
drivers to do in the change_station callback.
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Pull networking fixes from David Miller:
"It's been a while since my last pull request so quite a few fixes have
piled up."
Indeed.
1) Fix nf_{log,queue} compilation with PROC_FS disabled, from Pablo
Neira Ayuso.
2) Fix data corruption on some tg3 chips with TSO enabled, from Michael
Chan.
3) Fix double insertion of VLAN tags in be2net driver, from Sarveshwar
Bandi.
4) Don't have TCP's MD5 support pass > PAGE_SIZE page offsets in
scatter-gather entries into the crypto layer, the crypto layer can't
handle that. From Eric Dumazet.
5) Fix lockdep splat in 802.1Q MRP code, also from Eric Dumazet.
6) Fix OOPS in netfilter log module when called from conntrack, from
Hans Schillstrom.
7) FEC driver needs to use netif_tx_{lock,unlock}_bh() rather than the
non-BH disabling variants. From Fabio Estevam.
8) TCP GSO can generate out-of-order packets, fix from Eric Dumazet.
9) vxlan driver doesn't update 'used' field of fdb entries when it
should, from Sridhar Samudrala.
10) ipv6 should use kzalloc() to allocate inet6 socket cork options,
otherwise we can OOPS in ip6_cork_release(). From Eric Dumazet.
11) Fix races in bonding set mode, from Nikolay Aleksandrov.
12) Fix checksum generation regression added by "r8169: fix 8168evl
frame padding.", from Francois Romieu.
13) ip_gre can look at stale SKB data pointer, fix from Eric Dumazet.
14) Fix checksum handling when GSO is enabled in bnx2x driver with
certain chips, from Yuval Mintz.
15) Fix double free in batman-adv, from Martin Hundebøll.
16) Fix device startup synchronization with firmware in tg3 driver, from
Nithin Sujit.
17) perf networking dropmonitor doesn't work at all due to mixed up
trace parameter ordering, from Ben Hutchings.
18) Fix proportional rate reduction handling in tcp_ack(), from Nandita
Dukkipati.
19) IPSEC layer doesn't return an error when a valid state is detected,
causing an OOPS. Fix from Timo Teräs.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (85 commits)
be2net: bug fix on returning an invalid nic descriptor
tcp: xps: fix reordering issues
net: Revert unused variable changes.
xfrm: properly handle invalid states as an error
virtio_net: enable napi for all possible queues during open
tcp: bug fix in proportional rate reduction.
net: ethernet: sun: drop unused variable
net: ethernet: korina: drop unused variable
net: ethernet: apple: drop unused variable
qmi_wwan: Added support for Cinterion's PLxx WWAN Interface
perf: net_dropmonitor: Remove progress indicator
perf: net_dropmonitor: Use bisection in symbol lookup
perf: net_dropmonitor: Do not assume ordering of dictionaries
perf: net_dropmonitor: Fix symbol-relative addresses
perf: net_dropmonitor: Fix trace parameter order
net: fec: use a more proper compatible string for MVF type device
qlcnic: Fix updating netdev->features
qlcnic: remove netdev->trans_start updates within the driver
qlcnic: Return proper error codes from probe failure paths
tg3: Update version to 3.132
...
|
|
Some chips can tell us if received frame was
encoded with STBC or not. To make this information available
in user space we can use updated radiotap specification:
http://www.radiotap.org/defined-fields/MCS
This patch will set number of STBC encoded spatial streams (Nss).
The HAVE_STBC flag should be provided by driver.
Signed-off-by: Oleksij Rempel <linux@rempel-privat.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
When vlan device is configured on top of the brige, it does
not support any offload capabilities because the bridge
device does not initiliaze vlan_fatures. Set vlan_fatures to
be equivalent to hw_fatures.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
commit 3853b5841c01a ("xps: Improvements in TX queue selection")
introduced ooo_okay flag, but the condition to set it is slightly wrong.
In our traces, we have seen ACK packets being received out of order,
and RST packets sent in response.
We should test if we have any packets still in host queue.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since Eric's commit efe117ab8 ("Speedup ieee80211_remove_interfaces")
there's a bug in mac80211 when it unregisters with AP_VLAN interfaces
up. If the AP_VLAN interface was registered after the AP it belongs
to (which is the typical case) and then we get into this code path,
unregister_netdevice_many() will crash because it isn't prepared to
deal with interfaces being closed in the middle of it. Exactly this
happens though, because we iterate the list, find the AP master this
AP_VLAN belongs to and dev_close() the dependent VLANs. After this,
unregister_netdevice_many() won't pick up the fact that the AP_VLAN
is already down and will do it again, causing a crash.
Cc: stable@vger.kernel.org [2.6.33+]
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
A lot of code in mac80211 assumes that the hw queues are
set up correctly for all interfaces (except for monitor)
but this isn't true for AP_VLAN interfaces. Fix this by
copying the AP master configuration when an AP VLAN is
brought up, after this the AP interface can't change its
configuration any more and needs to be brought down to
change it, which also forces AP_VLAN interfaces down, so
just copying in open() is sufficient.
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Copy & paste mistake - STATION_INFO_TX_BYTES64 is the name of the flag,
not NL80211_STA_INFO_TX_BYTES64.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The code I added in "mac80211: don't start new netdev queues
if driver stopped" crashes for monitor and AP VLAN interfaces
because while they have a netdev, they don't have queues set
up by the driver.
To fix the crash, exclude these from queue accounting here
and just start their netdev queues unconditionally.
For monitor, this is the best we can do, as we can redirect
frames there to any other interface and don't know which one
that will since it can be different for each frame.
For AP VLAN interfaces, we can do better later and actually
properly track the queue status. Not doing this is really a
separate bug though.
Reported-by: Ilan Peer <ilan.peer@intel.com>
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
If a P2P-Device is present and another virtual interface triggers
the connection work, the system crash because it tries to check
if the P2P-Device's netdev (which doesn't exist) is up. Skip any
wdevs that have no netdev to fix this.
Cc: stable@vger.kernel.org
Reported-by: YanBo <dreamfly281@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
If nf_log uses ipt_ULOG as logging output, we can deliver non-null
terminated strings to user-space since the maximum length of the
prefix that is passed by nf_log is NF_LOG_PREFIXLEN but pm->prefix
is 32 bytes long (ULOG_PREFIX_LEN).
This is actually happening already from nf_conntrack_tcp if ipt_ULOG
is used, since it is passing strings longer than 32 bytes.
Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This avoids the situation where walking of a large number of connections
may prevent scheduling for a long time while also avoiding excessive
calls to rcu_read_unlock() and rcu_read_lock().
Note that in the case of !CONFIG_PREEMPT_RCU this will
add a call to cond_resched().
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This target has been superseded by NFLOG. Spot a warning
so we prepare removal in a couple of years.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
|
|
Don't panic if we hit an error while adding the nf_log or pernet
netfilter support, just bail out.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
|
|
Quoting https://bugzilla.netfilter.org/show_bug.cgi?id=812:
[ ip6tables -m addrtype ]
When I tried to use in the nat/PREROUTING it messes up the
routing cache even if the rule didn't matched at all.
[..]
If I remove the --limit-iface-in from the non-working scenario, so just
use the -m addrtype --dst-type LOCAL it works!
This happens when LOCAL type matching is requested with --limit-iface-in,
and the default ipv6 route is via the interface the packet we test
arrived on.
Because xt_addrtype uses ip6_route_output, the ipv6 routing implementation
creates an unwanted cached entry, and the packet won't make it to the
real/expected destination.
Silently ignoring --limit-iface-in makes the routing work but it breaks
rule matching (--dst-type LOCAL with limit-iface-in is supposed to only
match if the dst address is configured on the incoming interface;
without --limit-iface-in it will match if the address is reachable
via lo).
The test should call ipv6_chk_addr() instead. However, this would add
a link-time dependency on ipv6.
There are two possible solutions:
1) Revert the commit that moved ipt_addrtype to xt_addrtype,
and put ipv6 specific code into ip6t_addrtype.
2) add new "nf_ipv6_ops" struct to register pointers to ipv6 functions.
While the former might seem preferable, Pablo pointed out that there
are more xt modules with link-time dependeny issues regarding ipv6,
so lets go for 2).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
'name' has already set all zero when it is defined, so not need let
strncpy() to pad it again.
'name' is a string, better always let is NUL terminated, so use
strlcpy() instead of strncpy().
Signed-off-by: Chen Gang <gang.chen@asianux.com>
Acked-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
With IP early demux added in linux-3.6, we perform TCP lookup in IP
layer before iptables hooks.
We can avoid doing a second lookup in xt_socket.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The percpu untracked ct are not currently used for XT_CT_NOTRACK.
xt_ct_tg_check()/xt_ct_target() provides a single ct.
Thats not optimal as the ct->ct_general.use cache line will bounce among
cpus.
Use the intended [1] thing : xt_ct_target() should select the percpu
object.
[1] Refs :
commit 5bfddbd46a95c97 ("netfilter: nf_conntrack: IPS_UNTRACKED bit")
commit b3c5163fe0193a7 ("netfilter: nf_conntrack: per_cpu untracking")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The error exit path needs err explicitly set. Otherwise it
returns success and the only caller, xfrm_output_resume(),
would oops in skb_dst(skb)->ops derefence as skb_dst(skb) is
NULL.
Bug introduced in commit bb65a9cb (xfrm: removes a superfluous
check and add a statistic).
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Cc: Li RongQing <roy.qing.li@gmail.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ipv6_addr_type(&addr)&IPV6_ADDR_SCOPE_MASK could be replaced
by ipv6_addr_scope(), which is slightly faster.
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ipv6_addr_any() is a faster way to determine if an addr
is ipv6 any addr, no need to compute the addr type.
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch is a fix for a bug triggering newly_acked_sacked < 0
in tcp_ack(.).
The bug is triggered by sacked_out decreasing relative to prior_sacked,
but packets_out remaining the same as pior_packets. This is because the
snapshot of prior_packets is taken after tcp_sacktag_write_queue() while
prior_sacked is captured before tcp_sacktag_write_queue(). The problem
is: tcp_sacktag_write_queue (tcp_match_skb_to_sack() -> tcp_fragment)
adjusts the pcount for packets_out and sacked_out (MSS change or other
reason). As a result, this delta in pcount is reflected in
(prior_sacked - sacked_out) but not in (prior_packets - packets_out).
This patch does the following:
1) initializes prior_packets at the start of tcp_ack() so as to
capture the delta in packets_out created by tcp_fragment.
2) introduces a new "previous_packets_out" variable that snapshots
packets_out right before tcp_clean_rtx_queue, so pkts_acked can be
correctly computed as before.
3) Computes pkts_acked using previous_packets_out, and computes
newly_acked_sacked using prior_packets.
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If a GSO packet has a length above tbf burst limit, the packet
is currently silently dropped.
Current way to handle this is to set the device in non GSO/TSO mode, or
setting high bursts, and its sub optimal.
We can actually segment too big GSO packets, and send individual
segments as tbf parameters allow, allowing for better interoperability.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is a generic solution to resolve a specific problem that I have observed.
If the encapsulation of an skb changes then ability to offload checksums
may also change. In particular it may be necessary to perform checksumming
in software.
An example of such a case is where a non-GRE packet is received but
is to be encapsulated and transmitted as GRE.
Another example relates to my proposed support for for packets
that are non-MPLS when received but MPLS when transmitted.
The cost of this change is that the value of the csum variable may be
checked when it previously was not. In the case where the csum variable is
true this is pure overhead. In the case where the csum variable is false it
leads to software checksumming, which I believe also leads to correct
checksums in transmitted packets for the cases described above.
Further analysis:
This patch relies on the return value of can_checksum_protocol()
being correct and in turn the return value of skb_network_protocol(),
used to provide the protocol parameter of can_checksum_protocol(),
being correct. It also relies on the features passed to skb_segment()
and in turn to can_checksum_protocol() being correct.
I believe that this problem has not been observed for VLANs because it
appears that almost all drivers, the exception being xgbe, set
vlan_features such that that the checksum offload support for VLAN packets
is greater than or equal to that of non-VLAN packets.
I wonder if the code in xgbe may be an oversight and the hardware does
support checksumming of VLAN packets. If so it may be worth updating the
vlan_features of the driver as this patch will force such checksums to be
performed in software rather than hardware.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Continue sending queries when leave is received if the user marks
it as a querier.
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Adam Baker <linux@baker-net.org.uk>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently we arm the expire timer when the mdb entry is added,
however, this causes problem when there is no querier sent
out after that.
So we should only arm the timer when a corresponding query is
received, as suggested by Herbert.
And he also mentioned "if there is no querier then group
subscriptions shouldn't expire. There has to be at least one querier
in the network for this thing to work. Otherwise it just degenerates
into a non-snooping switch, which is OK."
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Adam Baker <linux@baker-net.org.uk>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Quote from Adam:
"If it is believed that the use of 0.0.0.0
as the IP address is what is causing strange behaviour on other devices
then is there a good reason that a bridge rather than a router shouldn't
be the active querier? If not then using the bridge IP address and
having the querier enabled by default may be a reasonable solution
(provided that our querier obeys the election rules and shuts up if it
sees a query from a lower IP address that isn't 0.0.0.0). Just because a
device is the elected querier for IGMP doesn't appear to mean it is
required to perform any other routing functions."
And introduce a new troggle for it, as suggested by Herbert.
Suggested-by: Adam Baker <linux@baker-net.org.uk>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Adam Baker <linux@baker-net.org.uk>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The lockless RPC_IS_QUEUED() test in __rpc_execute means that we need to
be careful about ordering the calls to rpc_test_and_set_running(task) and
rpc_clear_queued(task). If we get the order wrong, then we may end up
testing the RPC_TASK_RUNNING flag after __rpc_execute() has looped
and changed the state of the rpc_task.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
|
|
On errors in batadv_mesh_init(), bat_counters will be freed in both
batadv_mesh_free() and batadv_softif_init_late(). This patch fixes this
by returning earlier from batadv_softif_init_late() in case of errors in
batadv_mesh_init() and by setting bat_counters to NULL after freeing.
Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
|
|
The cache_detail(*detail) in function cache_is_valid is not used any
more.
Signed-off-by: fanchaoting <fanchaoting@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
The Kconfig symbol NFC_LLCP was removed in commit 30cc458765 ("NFC: Move
LLCP code to the NFC top level diirectory"). But the reference to its
macro in this Makefile was only commented out. Remove it now.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
|
|
TCP md5 code uses per cpu variables but protects access to them with
a shared spinlock, which is a contention point.
[ tcp_md5sig_pool_lock is locked twice per incoming packet ]
Makes things much simpler, by allocating crypto structures once, first
time a socket needs md5 keys, and not deallocating them as they are
really small.
Next step would be to allow crypto allocations being done in a NUMA
aware way.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A cpu executing the network receive path sheds packets when its input
queue grows to netdev_max_backlog. A single high rate flow (such as a
spoofed source DoS) can exceed a single cpu processing rate and will
degrade throughput of other flows hashed onto the same cpu.
This patch adds a more fine grained hashtable. If the netdev backlog
is above a threshold, IRQ cpus track the ratio of total traffic of
each flow (using 4096 buckets, configurable). The ratio is measured
by counting the number of packets per flow over the last 256 packets
from the source cpu. Any flow that occupies a large fraction of this
(set at 50%) will see packet drop while above the threshold.
Tested:
Setup is a muli-threaded UDP echo server with network rx IRQ on cpu0,
kernel receive (RPS) on cpu0 and application threads on cpus 2--7
each handling 20k req/s. Throughput halves when hit with a 400 kpps
antagonist storm. With this patch applied, antagonist overload is
dropped and the server processes its complete load.
The patch is effective when kernel receive processing is the
bottleneck. The above RPS scenario is a extreme, but the same is
reached with RFS and sufficient kernel processing (iptables, packet
socket tap, ..).
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
|
|
Another fix needed in ipgre_err(), as parse_gre_header() might change
skb->head.
Bug added in commit c54419321455 (GRE: Refactor GRE tunneling code.)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tcp_timeout_skb() was intended to trigger fast recovery on timeout,
unfortunately in reality it often causes spurious retransmission
storms during fast recovery. The particular sign is a fast retransmit
over the highest sacked sequence (SND.FACK).
Currently the RTO timer re-arming (as in RFC6298) offers a nice cushion
to avoid spurious timeout: when SND.UNA advances the sender re-arms
RTO and extends the timeout by icsk_rto. The sender does not offset
the time elapsed since the packet at SND.UNA was sent.
But if the next (DUP)ACK arrives later than ~RTTVAR and triggers
tcp_fastretrans_alert(), then tcp_timeout_skb() will mark any packet
sent before the icsk_rto interval lost, including one that's above the
highest sacked sequence. Most likely a large part of scorebard will be
marked.
If most packets are not lost then the subsequent DUPACKs with new SACK
blocks will cause the sender to continue to retransmit packets beyond
SND.FACK spuriously. Even if only one packet is lost the sender may
falsely retransmit almost the entire window.
The situation becomes common in the world of bufferbloat: the RTT
continues to grow as the queue builds up but RTTVAR remains small and
close to the minimum 200ms. If a data packet is lost and the DUPACK
triggered by the next data packet is slightly delayed, then a spurious
retransmission storm forms.
As the original comment on tcp_timeout_skb() suggests: the usefulness
of this feature is questionable. It also wastes cycles walking the
sack scoreboard and is actually harmful because of false recovery.
It's time to remove this.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ERROR: "memcpy_fromiovec" [drivers/vhost/vhost_scsi.ko] undefined!
That function is only present with CONFIG_NET. Turns out that
crypto/algif_skcipher.c also uses that outside net, but it actually
needs sockets anyway.
In addition, commit 6d4f0139d642c45411a47879325891ce2a7c164a added
CONFIG_NET dependency to CONFIG_VMCI for memcpy_toiovec, so hoist
that function and revert that commit too.
socket.h already includes uio.h, so no callers need updating; trying
only broke things fo x86_64 randconfig (thanks Fengguang!).
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
'discovery->data.info' length is 22, NICKNAME_MAX_LEN is 21, so the
strncpy() will always left the last byte of 'discovery->data.info'
uninitialized.
When 'text' length is longer than 21 (NICKNAME_MAX_LEN), if still left
the last byte of 'discovery->data.info' uninitialized, the next
strlen() will cause issue.
Also 'discovery->data' is 'struct irda_device_info' which defined in
"include/uapi/...", it may copy to user mode, so need whole initialized.
All together, need use kzalloc() instead of kmalloc() to initialize all
members firstly.
Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds the support of peer address for IPv6. For example, it is
possible to specify the remote end of a 6inY tunnel.
This was already possible in IPv4:
ip addr add ip1 peer ip2 dev dev1
The peer address is specified with IFA_ADDRESS and the local address with
IFA_LOCAL (like explained in include/uapi/linux/if_addr.h).
Note that the API is not changed, because before this patch, it was not
possible to specify two different addresses in IFA_LOCAL and IFA_REMOTE.
There is a small change for the dump: if the peer is different from ::,
IFA_ADDRESS will contain the peer address instead of the local address.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The net/netlabel/netlabel_domainhash.c:netlbl_domhsh_add() function
does not properly validate new domain hash entries resulting in
potential problems when an administrator attempts to add an invalid
entry. One such problem, as reported by Vlad Halilov, is a kernel
BUG (found in netlabel_domainhash.c:netlbl_domhsh_audit_add()) when
adding an IPv6 outbound mapping with a CIPSO configuration.
This patch corrects this problem by adding the necessary validation
code to netlbl_domhsh_add() via the newly created
netlbl_domhsh_validate() function.
Ideally this patch should also be pushed to the currently active
-stable trees.
Reported-by: Vlad Halilov <vlad.halilov@gmail.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|