Age | Commit message (Collapse) | Author |
|
Currently the cfg80211's frame registration api receives wdev, however
mac80211 assumes per device filter configuration and ignores wdev.
Per device filtering is too wasteful, especially for multi-channel
devices.
Introduce new per vif frame registration API and use it for probe
request registrations in ieee80211_mgmt_frame_register()
Also call directly to ieee80211_configure_filter instead of using a work
since it is now allowed to sleep in ieee80211_mgmt_frame_register.
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
In order to transfer many items in vendor commands, support the
dumpit netlink method for them.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
When checking if a TDLS chandef can be upgraded, IR-relaxation can be
taken into account to allow more channels.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Sometimes we are interested in testing TDLS performance in a specific
width setting. Add the ability to disable the wider-band feature, thereby
allowing the TDLS channel width to be controlled by the BSS width.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Queued frames aren't processed during scan, which results in an inability
to complete the BA session establishment until the scan ends. Since we
can't tx frames until the BA agreement setup is complete, it might result
in a very large latency during scan.
Fix this by allowing to process queued skbs while scanning in HW. This
should be ok since the devices which support hw scan should be able
to handle tx/rx while scanning.
During SW scan, mac80211 drops any txed frames besides probes and NDPs,
so it is still needed to delay processing of the queued frames till the
SW scan is done.
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
As pointed out by sparse, this symbol should be static, make it so.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The HT MCS mask has 9 bytes, the VHT one only has 8 streams.
Split the loops to handle this correctly.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Sending over AF_PACKET RAW sockets we can sending frames which exceeds
MTU size. To handling it correct we need to change things in AF_PACKET
which knows on RAW sockets an additional FCS is set by hardware or
mac802154 transmit functionality.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
This patch cleanups needed_headroom, needed_tailroom and hard_header_len
fields for wpan and lowpan interfaces.
For wpan interfaces the worst case mac header len should be part of
needed_headroom, currently this is set as hard_header_len, but
hard_header_len should be set to the minimum header length which xmit
call assumes and this is the minimum frame length of 802.15.4.
The hard_header_len value will check inside send callbacl of AF_PACKET
raw sockets.
For lowpan interfaces, if fragmentation isn't needed the skb will
call dev_hard_header for 802154 layer and queue it afterwards. This
happens without new skb allocation, so we need the same headroom and
tailroom lengths like 802154 inside 802154 6lowpan layer. At least we
assume as minimum header length an ipv6 header size.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
The current header_ops callback structure of net device are used mostly
from 802.15.4 upper-layers. Because this callback structure is a very
generic one, which is also used by e.g. DGRAM AF_PACKET sockets, we
can't make this callback structure 802.15.4 specific which is currently
is.
I saw the smallest "constraint" for calling this callback with
dev_hard_header/dev_parse_header by AF_PACKET which assign a 8 byte
array for address void pointers. Currently 802.15.4 specific protocols
like af802154 and 6LoWPAN will assign the "struct ieee802154_addr" as
these parameters which is greater than 8 bytes. The current callback
implementation for header_ops.create assumes always a complete
"struct ieee802154_addr" which AF_PACKET can't never handled and is
greater than 8 bytes.
For that reason we introduce now a "generic" create/parse header_ops
callback which allows handling with intra-pan extended addresses only.
This allows a small use-case with AF_PACKET to send "somehow" a valid
dataframe over DGRAM.
To keeping the current dev_hard_header behaviour we introduce a similar
callback structure "wpan_dev_header_ops" which contains 802.15.4 specific
upper-layer header creation functionality, which can be called by
wpan_dev_hard_header.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
Sometimes upper-layer protocols wants to generate a new mac header by
filling "struct ieee802154_hdr" only. These upper-layers sets for the
address settings the source and dest fields, but not the fc fields for
indicate the source and dest address mode. This patch changes the
"ieee802154_hdr_push" function so the fc address fields are set
according the source and dest fields of "struct ieee802154_hdr".
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
This patch adds a missing list_del when a device description will be
deleted.
Cc: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
Before allowing lockless LISTEN processing, we need to make
sure to arm the SYN_RECV timer before the req socket is visible
in hash tables.
Also, req->rsk_hash should be written before we set rsk_refcnt
to a non zero value.
Fixes: fa76ce7328b2 ("inet: get rid of central tcp/dccp listener timer")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ying Cai <ycai@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When creating a timewait socket, we need to arm the timer before
allowing other cpus to find it. The signal allowing cpus to find
the socket is setting tw_refcnt to non zero value.
As we set tw_refcnt in __inet_twsk_hashdance(), we therefore need to
call inet_twsk_schedule() first.
This also means we need to remove tw_refcnt changes from
inet_twsk_schedule() and let the caller handle it.
Note that because we use mod_timer_pinned(), we have the guarantee
the timer wont expire before we set tw_refcnt as we run in BH context.
To make things more readable I introduced inet_twsk_reschedule() helper.
When rearming the timer, we can use mod_timer_pending() to make sure
we do not rearm a canceled timer.
Note: This bug can possibly trigger if packets of a flow can hit
multiple cpus. This does not normally happen, unless flow steering
is broken somehow. This explains this bug was spotted ~5 months after
its introduction.
A similar fix is needed for SYN_RECV sockets in reqsk_queue_hash_req(),
but will be provided in a separate patch for proper tracking.
Fixes: 789f558cfb36 ("tcp/dccp: get rid of central timewait timer")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Ying Cai <ycai@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch makes TLP to use 1 sec timer by default when RTT is
not available due to SYN/ACK retransmission or SYN cookies.
Prior to this change, the lack of RTT prevents TLP so the first
data packets sent can only be recovered by fast recovery or RTO.
If the fast recovery fails to trigger the RTO is 3 second when
SYN/ACK is retransmitted. With this patch we can trigger fast
recovery in 1sec instead.
Note that we need to check Fast Open more properly. A Fast Open
connection could be (accepted then) closed before it receives
the final ACK of 3WHS so the state is FIN_WAIT_1. Without the
new check, TLP will retransmit FIN instead of SYN/ACK.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently SYN/ACK RTT is measured in jiffies. For LAN the SYN/ACK
RTT is often measured as 0ms or sometimes 1ms, which would affect
RTT estimation and min RTT samping used by some congestion control.
This patch improves SYN/ACK RTT to be usec resolution if platform
supports it. While the timestamping of SYN/ACK is done in request
sock, the RTT measurement is carefully arranged to avoid storing
another u64 timestamp in tcp_sock.
For regular handshake w/o SYNACK retransmission, the RTT is sampled
right after the child socket is created and right before the request
sock is released (tcp_check_req() in tcp_minisocks.c)
For Fast Open the child socket is already created when SYN/ACK was
sent, the RTT is sampled in tcp_rcv_state_process() after processing
the final ACK an right before the request socket is released.
If the SYN/ACK was retransmistted or SYN-cookie was used, we rely
on TCP timestamps to measure the RTT. The sample is taken at the
same place in tcp_rcv_state_process() after the timestamp values
are validated in tcp_validate_incoming(). Note that we do not store
TS echo value in request_sock for SYN-cookies, because the value
is already stored in tp->rx_opt used by tcp_ack_update_rtt().
One side benefit is that the RTT measurement now happens before
initializing congestion control (of the passive side). Therefore
the congestion control can use the SYN/ACK RTT.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The iucv code uses arrays as arguments. Even though this does not
really cause a problem, it could be misleading, since the compiler
turns array arguments into just a pointer argument. To be more
precise this patch changes the array arguments into pointers.
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:
====================
pull request: bluetooth-next 2015-09-18
Here's the first bluetooth-next pull request for the 4.4 kernel:
- ieee802154 cleanups & fixes
- debugfs support for the at86rf230 driver
- Support for quirky (seemingly counterfeit) CSR Bluetooth controllers
- Power management and device config improvements for Intel controllers
- Fix for devices with incorrect advertising data length
- Fix for closing HCI user channel socket
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The commit c0bb07df7d981e4091432754e30c9c720e2c0c78 ("netlink:
Reset portid after netlink_insert failure") introduced a race
condition where if two threads try to autobind the same socket
one of them may end up with a zero port ID. This led to kernel
deadlocks that were observed by multiple people.
This patch reverts that commit and instead fixes it by introducing
a separte rhash_portid variable so that the real portid is only set
after the socket has been successfully hashed.
Fixes: c0bb07df7d98 ("netlink: Reset portid after netlink_insert failure")
Reported-by: Tejun Heo <tj@kernel.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This was already done a long time ago in
commit 64194c31a0b6 ("inet: Make tunnel RX/TX byte counters more consistent")
but tx path was broken (at least since 3.10).
Before the patch the gre header was included on tx.
After the patch:
$ ping -c1 192.168.0.121 ; ip -s l ls dev gre1
PING 192.168.0.121 (192.168.0.121) 56(84) bytes of data.
64 bytes from 192.168.0.121: icmp_req=1 ttl=64 time=2.95 ms
--- 192.168.0.121 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 2.955/2.955/2.955/0.000 ms
7: gre1@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1468 qdisc noqueue state UNKNOWN mode DEFAULT group default
link/gre 10.16.0.249 peer 10.16.0.121
RX: bytes packets errors dropped overrun mcast
84 1 0 0 0 0
TX: bytes packets errors dropped carrier collsns
84 1 0 0 0 0
Reported-by: Julien Meunier <julien.meunier@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patch contains Netfilter fixes for your net tree, they are:
1) nf_log_unregister() should only set to NULL the logger that is being
unregistered, instead of everything else. Patch from Florian Westphal.
2) Fix a crash when accessing physoutdev from PREROUTING in br_netfilter.
This is partially reverting the patch to shrink nf_bridge_info to 32 bytes.
Also from Florian.
3) Use existing match/target extensions in the internal nft_compat extension
lists when the extension is family unspecific (ie. NFPROTO_UNSPEC).
4) Wait for rcu grace period before leaving nf_log_unregister().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The msg pointer into header may change after skb linearization.
We must reinitialize it after calling skb_linearize to prevent
operating on a freed or invalid pointer.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reported-by: Tamás Végh <tamas.vegh@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Replace time_t type and get_seconds function which are not y2038 safe
on 32-bit systems. Function ktime_get_seconds use monotonic instead of
real time and therefore will not cause overflow.
Signed-off-by: Ksenija Stanojevic <ksenija.stanojevic@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Man page of ip-route(8) says following about route types:
unreachable - these destinations are unreachable. Packets are dis‐
carded and the ICMP message host unreachable is generated. The local
senders get an EHOSTUNREACH error.
blackhole - these destinations are unreachable. Packets are dis‐
carded silently. The local senders get an EINVAL error.
prohibit - these destinations are unreachable. Packets are discarded
and the ICMP message communication administratively prohibited is
generated. The local senders get an EACCES error.
In the inet6 address family, this was correct, except the local senders
got ENETUNREACH error instead of EHOSTUNREACH in case of unreachable route.
In the inet address family, all three route types generated ICMP message
net unreachable, and the local senders got ENETUNREACH error.
In both address families all three route types now behave consistently
with documentation.
Signed-off-by: Nikola Forró <nforro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Under all conditions, it should be quite sufficient just to mark
the socket as disconnected. It will then be closed by the
transport shutdown or reconnect code.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Avoid all races with the connect/disconnect handlers by taking the
transport lock.
Reported-by:"Suzuki K. Poulose" <suzuki.poulose@arm.com>
Acked-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Use nf_ct_net(ct) instead of guessing that the netdevice out can
reliably report the network namespace the conntrack operation is
happening in.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Instead of calling dev_net on a likley looking network device
pass state->net into nf_xfrm_me_harder.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Only pass the void *priv parameter out of the nf_hook_ops. That is
all any of the functions are interested now, and by limiting what is
passed it becomes simpler to change implementation details.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This should be more cache efficient as state is more likely to be in
core, and the netfilter core will stop passing in ops soon.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
As gre does not have the srckey in the packet gre_pkt_to_tuple
needs to perform a lookup in it's per network namespace tables.
Pass in the proper network namespace to all pkt_to_tuple
implementations to ensure gre (and any similar protocols) can get this
right.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Stop guessing the struct net instead of remember it. Guessing is just
silly and will be problematic in the future when I implement routes
between network namespaces.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This allows them to stop guessing the network namespace with pick_net.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
net_devices
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
devices
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
As xt_action_param lives on the stack this does not bloat any
persistent data structures.
This is a first step in making netfilter code that needs to know
which network namespace it is executing in simpler.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
- Add nft_pktinfo.pf to replace ops->pf
- Add nft_pktinfo.hook to replace ops->hooknum
This simplifies the code, makes it more readable, and likely reduces
cache line misses. Maintainability is enhanced as the details of
nft_hook_ops are of no concern to the recpients of nft_pktinfo.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The values of nf_hook_state.hook and nf_hook_ops.hooknum must be the
same by definition.
We are more likely to access the fields in nf_hook_state over the
fields in nf_hook_ops so with a little luck this results in
fewer cache line misses, and slightly more consistent code.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The values of ops->hooknum and state->hook are guaraneted to be equal
making the hook argument to ip6t_do_table, arp_do_table, and
ipt_do_table is unnecessary. Remove the unnecessary hook argument.
In the callers use state->hook instead of ops->hooknum for clarity and
to reduce the number of cachelines the callers touch.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Nearly everything thing of interest to ebt_do_table is already present
in nf_hook_state. Simplify ebt_do_table by just passing in the skb,
nf_hook_state, and the table. This make the code easier to read and
maintenance easier.
To support this create an nf_hook_state on the stack in ebt_broute
(the only caller without a nf_hook_state already available). This new
nf_hook_state adds no new computations to ebt_broute, but does use a
few more bytes of stack.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next
Simon Horman says:
====================
IPVS Updates for v4.4
please consider these IPVS Updates for v4.4.
The updates include the following from Alex Gartrell:
* Scheduling of ICMP
* Sysctl to ignore tunneled packets; and hence some packet-looping scenarios
====================
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Some remote devices (ie Gigaset G-Tag) misbehave with ADV data length.
This can lead to incorrect EIR format in device found event when
ADV_DATA and SCAN_RSP are merged (terminator field before SCAN_RSP
part).
Fix this by inspecting ADV_DATA and correct its length if terminator
is found.
> HCI Event: LE Meta Event (0x3e) plen 42 [hci0] 32.172182
LE Advertising Report (0x02)
Num reports: 1
Event type: Connectable undirected - ADV_IND (0x00)
Address type: Public (0x00)
Address: 7C:2F:80:94:97:5A (Gigaset Communications GmbH)
Data length: 30
Flags: 0x06
LE General Discoverable Mode
BR/EDR Not Supported
Company: Gigaset Communications GmbH (384)
Data: 021512348094975abbc5
16-bit Service UUIDs (partial): 1 entry
Battery Service (0x180f)
RSSI: -65 dBm (0xbf)
> HCI Event: LE Meta Event (0x3e) plen 27 [hci0] 32.172191
LE Advertising Report (0x02)
Num reports: 1
Event type: Scan response - SCAN_RSP (0x04)
Address type: Public (0x00)
Address: 7C:2F:80:94:97:5A (Gigaset Communications GmbH)
Data length: 15
Name (complete): Gigaset G-tag
RSSI: -59 dBm (0xc5)
Note "Data length: 30" in ADV_DATA which results in 9 extra zero bytes
after Battery Service UUID. Terminator field present in the middle of
EIR in Device Found event resulted in userspace stop parsing EIR and
skipping device name.
@ Device Found: 7C:2F:80:94:97:5A (1) rssi -59 flags 0x0000
02 01 06 0d ff 80 01 02 15 12 34 80 94 97 5a bb ..........4...Z.
c5 03 02 0f 18 00 00 00 00 00 00 00 00 00 0e 09 ................
47 69 67 61 73 65 74 20 47 2d 74 61 67 Gigaset G-tag
With this fix EIR with merged ADV_DATA and SCAN_RSP in device found
event is properly formatted:
@ Device Found: 7C:2F:80:94:97:5A (1) rssi -59 flags 0x0000
02 01 06 0d ff 80 01 02 15 12 34 80 94 97 5a bb ..........4...Z.
c5 03 02 0f 18 0e 09 47 69 67 61 73 65 74 20 47 .......Gigaset G
2d 74 61 67 -tag
Signed-off-by: Szymon Janc <ext.szymon.janc@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
This patch adds ratelimited version of the BT_ERR macro.
Signed-off-by: Szymon Janc <ext.szymon.janc@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
Memory placement in sch_dsmark is silly : Better place mask/value
in the same cache line.
Also, we can embed small arrays in the first cache line and
remove a potential cache miss.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Tracking idle time in bictcp_cwnd_event() is imprecise, as epoch_start
is normally set at ACK processing time, not at send time.
Doing a proper fix would need to add an additional state variable,
and does not seem worth the trouble, given CUBIC bug has been there
forever before Jana noticed it.
Let's simply not set epoch_start in the future, otherwise
bictcp_update() could overflow and CUBIC would again
grow cwnd too fast.
This was detected thanks to a packetdrill test Neal wrote that was flaky
before applying this fix.
Fixes: 30927520dbae ("tcp_cubic: better follow cubic curve after idle period")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Cc: Jana Iyengar <jri@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Johan Hedberg says:
====================
pull request: bluetooth 2015-09-17
Here's one important patch for the 4.3-rc series that fixes an issue
with Bluetooth LE encryption failing because of a too early check for
the SMP context.
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If we didn't call ATMARP_MKIP before ATMARP_ENCAP the VCC descriptor is
non-existant and we'll end up dereferencing a NULL ptr:
[1033173.491930] kasan: GPF could be caused by NULL-ptr deref or user memory accessirq event stamp: 123386
[1033173.493678] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[1033173.493689] Modules linked in:
[1033173.493697] CPU: 9 PID: 23815 Comm: trinity-c64 Not tainted 4.2.0-next-20150911-sasha-00043-g353d875-dirty #2545
[1033173.493706] task: ffff8800630c4000 ti: ffff880063110000 task.ti: ffff880063110000
[1033173.493823] RIP: clip_ioctl (net/atm/clip.c:320 net/atm/clip.c:689)
[1033173.493826] RSP: 0018:ffff880063117a88 EFLAGS: 00010203
[1033173.493828] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 000000000000000c
[1033173.493830] RDX: 0000000000000002 RSI: ffffffffb3f10720 RDI: 0000000000000014
[1033173.493832] RBP: ffff880063117b80 R08: ffff88047574d9a4 R09: 0000000000000000
[1033173.493834] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff1000c622f53
[1033173.493836] R13: ffff8800cb905500 R14: ffff8808d6da2000 R15: 00000000fffffdfd
[1033173.493840] FS: 00007fa56b92d700(0000) GS:ffff880478000000(0000) knlGS:0000000000000000
[1033173.493843] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[1033173.493845] CR2: 0000000000000000 CR3: 00000000630e8000 CR4: 00000000000006a0
[1033173.493855] Stack:
[1033173.493862] ffffffffb0b60444 000000000000eaea 0000000041b58ab3 ffffffffb3c3ce32
[1033173.493867] ffffffffb0b6f3e0 ffffffffb0b60444 ffffffffb5ea2e50 1ffff1000c622f5e
[1033173.493873] ffff8800630c4cd8 00000000000ee09a ffffffffb3ec4888 ffffffffb5ea2de8
[1033173.493874] Call Trace:
[1033173.494108] do_vcc_ioctl (net/atm/ioctl.c:170)
[1033173.494113] vcc_ioctl (net/atm/ioctl.c:189)
[1033173.494116] svc_ioctl (net/atm/svc.c:605)
[1033173.494200] sock_do_ioctl (net/socket.c:874)
[1033173.494204] sock_ioctl (net/socket.c:958)
[1033173.494244] do_vfs_ioctl (fs/ioctl.c:43 fs/ioctl.c:607)
[1033173.494290] SyS_ioctl (fs/ioctl.c:622 fs/ioctl.c:613)
[1033173.494295] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:186)
[1033173.494362] Code: fa 48 c1 ea 03 80 3c 02 00 0f 85 50 09 00 00 49 8b 9e 60 06 00 00 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 14 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 14 09 00
All code
========
0: fa cli
1: 48 c1 ea 03 shr $0x3,%rdx
5: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1)
9: 0f 85 50 09 00 00 jne 0x95f
f: 49 8b 9e 60 06 00 00 mov 0x660(%r14),%rbx
16: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
1d: fc ff df
20: 48 8d 7b 14 lea 0x14(%rbx),%rdi
24: 48 89 fa mov %rdi,%rdx
27: 48 c1 ea 03 shr $0x3,%rdx
2b:* 0f b6 04 02 movzbl (%rdx,%rax,1),%eax <-- trapping instruction
2f: 48 89 fa mov %rdi,%rdx
32: 83 e2 07 and $0x7,%edx
35: 38 d0 cmp %dl,%al
37: 7f 08 jg 0x41
39: 84 c0 test %al,%al
3b: 0f 85 14 09 00 00 jne 0x955
Code starting with the faulting instruction
===========================================
0: 0f b6 04 02 movzbl (%rdx,%rax,1),%eax
4: 48 89 fa mov %rdi,%rdx
7: 83 e2 07 and $0x7,%edx
a: 38 d0 cmp %dl,%al
c: 7f 08 jg 0x16
e: 84 c0 test %al,%al
10: 0f 85 14 09 00 00 jne 0x92a
[1033173.494366] RIP clip_ioctl (net/atm/clip.c:320 net/atm/clip.c:689)
[1033173.494368] RSP <ffff880063117a88>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
David Woodhouse reports skb_under_panic when we try to push ethernet
header to fragmented ipv6 skbs:
skbuff: skb_under_panic: text:c1277f1e len:1294 put:14 head:dec98000
data:dec97ffc tail:0xdec9850a end:0xdec98f40 dev:br-lan
[..]
ip6_finish_output2+0x196/0x4da
David further debugged this:
[..] offending fragments were arriving here with skb_headroom(skb)==10.
Which is reasonable, being the Solos ADSL card's header of 8 bytes
followed by 2 bytes of PPP frame type.
The problem is that if netfilter ipv6 defragmentation is used, skb_cow()
in ip6_forward will only see reassembled skb.
Therefore, headroom is overestimated by 8 bytes (we pulled fragment
header) and we don't check the skbs in the frag_list either.
We can't do these checks in netfilter defrag since outdev isn't known yet.
Furthermore, existing tests in ip6_fragment did not consider the fragment
or ipv6 header size when checking headroom of the fraglist skbs.
While at it, also fix a skb leak on memory allocation -- ip6_fragment
must consume the skb.
I tested this e1000 driver hacked to not allocate additional headroom
(we end up in slowpath, since LL_RESERVED_SPACE is 16).
If 2 bytes of headroom are allocated, fastpath is taken (14 byte
ethernet header was pulled, so 16 byte headroom available in all
fragments).
Reported-by: David Woodhouse <dwmw2@infradead.org>
Diagnosed-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Tested-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Sergey, Richard and Fabio reported an oops in ip_route_input_noref. e.g., from Richard:
[ 0.877040] BUG: unable to handle kernel NULL pointer dereference at 0000000000000056
[ 0.877597] IP: [<ffffffff8155b5e2>] ip_route_input_noref+0x1a2/0xb00
[ 0.877597] PGD 3fa14067 PUD 3fa6e067 PMD 0
[ 0.877597] Oops: 0000 [#1] SMP
[ 0.877597] Modules linked in: virtio_net virtio_pci virtio_ring virtio
[ 0.877597] CPU: 1 PID: 119 Comm: ifconfig Not tainted 4.2.0+ #1
[ 0.877597] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 0.877597] task: ffff88003fab0bc0 ti: ffff88003faa8000 task.ti: ffff88003faa8000
[ 0.877597] RIP: 0010:[<ffffffff8155b5e2>] [<ffffffff8155b5e2>] ip_route_input_noref+0x1a2/0xb00
[ 0.877597] RSP: 0018:ffff88003ed03ba0 EFLAGS: 00010202
[ 0.877597] RAX: 0000000000000046 RBX: 00000000ffffff8f RCX: 0000000000000020
[ 0.877597] RDX: ffff88003fab50b8 RSI: 0000000000000200 RDI: ffffffff8152b4b8
[ 0.877597] RBP: ffff88003ed03c50 R08: 0000000000000000 R09: 0000000000000000
[ 0.877597] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003fab6f00
[ 0.877597] R13: ffff88003fab5000 R14: 0000000000000000 R15: ffffffff81cb5600
[ 0.877597] FS: 00007f6de5751700(0000) GS:ffff88003ed00000(0000) knlGS:0000000000000000
[ 0.877597] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.877597] CR2: 0000000000000056 CR3: 000000003fa6d000 CR4: 00000000000006e0
[ 0.877597] Stack:
[ 0.877597] 0000000000000000 0000000000000046 ffff88003fffa600 ffff88003ed03be0
[ 0.877597] ffff88003f9e2c00 697da8c0017da8c0 ffff880000000000 000000000007fd00
[ 0.877597] 0000000000000000 0000000000000046 0000000000000000 0000000400000000
[ 0.877597] Call Trace:
[ 0.877597] <IRQ>
[ 0.877597] [<ffffffff812bfa1f>] ? cpumask_next_and+0x2f/0x40
[ 0.877597] [<ffffffff8158e13c>] arp_process+0x39c/0x690
[ 0.877597] [<ffffffff8158e57e>] arp_rcv+0x13e/0x170
[ 0.877597] [<ffffffff8151feec>] __netif_receive_skb_core+0x60c/0xa00
[ 0.877597] [<ffffffff81515795>] ? __build_skb+0x25/0x100
[ 0.877597] [<ffffffff81515795>] ? __build_skb+0x25/0x100
[ 0.877597] [<ffffffff81521ff6>] __netif_receive_skb+0x16/0x70
[ 0.877597] [<ffffffff81522078>] netif_receive_skb_internal+0x28/0x90
[ 0.877597] [<ffffffff8152288f>] napi_gro_receive+0x7f/0xd0
[ 0.877597] [<ffffffffa0017906>] virtnet_receive+0x256/0x910 [virtio_net]
[ 0.877597] [<ffffffffa0017fd8>] virtnet_poll+0x18/0x80 [virtio_net]
[ 0.877597] [<ffffffff815234cd>] net_rx_action+0x1dd/0x2f0
[ 0.877597] [<ffffffff81053228>] __do_softirq+0x98/0x260
[ 0.877597] [<ffffffff8164969c>] do_softirq_own_stack+0x1c/0x30
The root cause is use of res.table uninitialized.
Thanks to Nikolay for noticing the uninitialized use amongst the maze of
gotos.
As Nikolay pointed out the second initialization is not required to fix
the oops, but rather to fix a related problem where a valid lookup should
be invalidated before creating the rth entry.
Fixes: b7503e0cdb5d ("net: Add FIB table id to rtable")
Reported-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Reported-by: Richard Alpe <richard.alpe@ericsson.com>
Reported-by: Fabio Estevam <festevam@gmail.com>
Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Existing bpf_clone_redirect() helper clones skb before redirecting
it to RX or TX of destination netdev.
Introduce bpf_redirect() helper that does that without cloning.
Benchmarked with two hosts using 10G ixgbe NICs.
One host is doing line rate pktgen.
Another host is configured as:
$ tc qdisc add dev $dev ingress
$ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \
action bpf run object-file tcbpf1_kern.o section clone_redirect_xmit drop
so it receives the packet on $dev and immediately xmits it on $dev + 1
The section 'clone_redirect_xmit' in tcbpf1_kern.o file has the program
that does bpf_clone_redirect() and performance is 2.0 Mpps
$ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \
action bpf run object-file tcbpf1_kern.o section redirect_xmit drop
which is using bpf_redirect() - 2.4 Mpps
and using cls_bpf with integrated actions as:
$ tc filter add dev $dev root pref 10 \
bpf run object-file tcbpf1_kern.o section redirect_xmit integ_act classid 1
performance is 2.5 Mpps
To summarize:
u32+act_bpf using clone_redirect - 2.0 Mpps
u32+act_bpf using redirect - 2.4 Mpps
cls_bpf using redirect - 2.5 Mpps
For comparison linux bridge in this setup is doing 2.1 Mpps
and ixgbe rx + drop in ip_rcv - 7.8 Mpps
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|