Age | Commit message (Collapse) | Author |
|
Follow the now usual transmit descriptor update path:
1. content change
2. dma_wmb
3. ownership change
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The NAPI receive path depends on desc->rx_status but it does not
enforce any explicit receive barrier.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The driver no longer produces holes in its receive ring so rx_head_desc
only duplicates cur_rx.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Rationales:
- throttle work under memory pressure
- lower receive descriptor recycling latency for the network adapter
- lower the maintenance burden of uncommon paths
The patch is twofold:
- it fails early if the receive ring can't be completely initialized
at dev->open() time
- it drops packets on the floor in the napi receive handler so as to
keep the received ring full
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It's used to initialize the receive ring but it will actually shine when
the receive poll code is reworked.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a macvlan device is placed in promiscuous mode, it currently
just sets it's multicast mask to permissive, but doesn't change
the state of the lower device. As a result, not all multicast
traffic can be received on such device. Additionally, none of
a vlan traffic can be received on such device as well.
This patch propagates the promiscuous mode setting to lower device
so that lower device may receive all packets that macvlan may
be interested in.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit c243d7e20996254f89c28d4838b5feca735c030d.
That patch is solving a non-existant problem while creating a
real problem. Just because a socket is allocated in the init
name space doesn't mean that it gets hashed in the init name space.
When we unhash it the name space must be the same as the one
we had when we hashed it. So this patch is completely bogus
and causes socket leaks.
Reported-by: Andrey Wagin <avagin@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Tom Herbert says:
====================
net: Eliminate calls to flow_dissector and introduce flow_keys_digest
In this patch set we add skb_get_hash_perturb which gets the skbuff
hash for a packet and perturbs it using a provided key and jhash1.
This function is used in serveral qdiscs and eliminates many calls
to flow_dissector and jhash3 to get a perturbed hash for a packet.
To handle the sch_choke issue (passes flow_keys in skbuff cb) we
add flow_keys_digest which is a digest of a flow constructed
from a flow_keys structure.
This is the second version of these patches I posted a while ago,
and is prerequisite work to increasing the size of the flow_keys
structure and hashing over it (full IPv6 address, flow label, VLAN ID,
etc.).
Version 2:
- Add keyval parameter to __flow_hash_from_keys which allows caller to
set the initval for jhash
- Perturb always does flow dissection and creates hash based on
input perturb value which acts as the keyval to __flow_hash_from_keys
- Added a _flow_keys_digest_data which is used in make_flow_keys_digest.
This fills out the digest by populating individual fields instead
of copying the whole structure.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Call make_flow_keys_digest to get a digest from flow keys and
use that to pass skbuff cb and for comparing flows.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some users of flow keys (well just sch_choke now) need to pass
flow_keys in skbuff cb, and use them for exact comparisons of flows
so that skb->hash is not sufficient. In order to increase size of
the flow_keys structure, we introduce another structure for
the purpose of passing flow keys in skbuff cb. We limit this structure
to sixteen bytes, and we will technically treat this as a digest of
flow_keys struct hence its name flow_keys_digest. In the first
incaranation we just copy the flow_keys structure up to 16 bytes--
this is the same information previously passed in the cb. In the
future, we'll adapt this for larger flow_keys and could use something
like SHA-1 over the whole flow_keys to improve the quality of the
digest.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Call skb_get_hash_perturb instead of doing skb_flow_dissect and then
jhash by hand.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Call skb_get_hash_perturb instead of doing skb_flow_dissect and then
jhash by hand.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Call skb_get_hash_perturb instead of doing skb_flow_dissect and then
jhash by hand.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Call skb_get_hash_perturb instead of doing skb_flow_dissect and then
jhash by hand.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This calls flow_disect and __skb_get_hash to procure a hash for a
packet. Input includes a key to initialize jhash. This function
does not set skb->hash.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Alexander Duyck says:
====================
Clean-up some bits related to netpoll
This patch set cleans up some minor items related to netpoll. The first
patch addresses an Rx clean-up bug that is triggered due to an assumption
that napi->poll wouldn't be called with a budget of 0. The other two
patches address dev_kfree_skb being called in the xmit path which isn't
valid since netpoll will call ndo_start_xmit with IRQs disabled.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With netpoll making use of the transmit function it is possible for the
ndo_start_xmit function to be called with irqs disabled. As such we need
to use dev_kfree_skb_any in the Tx cleanup path for frames that are
dropped.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The function r8169_csum_workaround is called in the ndo_start_xmit path of
the r8169 driver. As such it should not be using dev_kfree_skb as it is
not irq safe, so instead we should be using dev_kfree_skb_any for freeing
in the dropped path, and dev_consume_skb_any for any frames that were
transmitted.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The netpoll path will call napi->poll with a budget of 0 in order to clean
the Tx rings only. This change updates the fm10k driver so that it will
correctly support that instead of cleaning 1 Rx frame if a budget of 0 is
received.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add the missing unregister for the mv88e6352_switch_driver.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In setups with a global scope address on an interface, and a lesser
scope address on an interface sending IGMP reports, the reports can be
sent using the other interfaces global scope address rather than the
local interface address. RFC 2236 suggests:
Ignore the Report if you cannot identify the source address of
the packet as belonging to a subnet assigned to the interface on
which the packet was received.
since such reports could be forged.
Look at the protocol when deciding if a RT_SCOPE_LINK address should
be used for the packet.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
this is a pull request of a single patch for net/master.
The patch is contributed by Jeppe Ledet-Pedersen, it fixes the
extended frame handling in the xilinx driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
TC classifiers/actions were converted to RCU by John in the series:
http://thread.gmane.org/gmane.linux.network/329739/focus=329739
and many follow on patches.
This is the last patch from that series that finally drops
ingress spin_lock.
Single cpu ingress+u32 performance goes from 22.9 Mpps to 24.5 Mpps.
In two cpu case when both cores are receiving traffic on the same
device and go into the same ingress+u32 the performance jumps
from 4.5 + 4.5 Mpps to 23.5 + 23.5 Mpps
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This file is built off of a tristate Kconfig option and also contains
modular function calls so it should explicitly include module.h to
avoid compile breakage during header shuffles done in the future.
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
rdma_conn_param private data is copied using memcpy after headers such
as cma_hdr (see cma_resolve_ib_udp as example). so the start of the
private data is aligned to the end of the structure that come before. if
this structure end with u32 the meaning is that the start of the private
data will be 4 bytes aligned. structures that use u8/u16/u32/u64 are
naturally aligned but in case the structure start is not 8 bytes aligned,
all u64 members of this structure will not be aligned. to solve this issue
we must use special macros that allow unaligned access to those
unaligned members.
Addresses the following kernel log seen when attempting to use RDMA:
Kernel unaligned access at TPC[10507a88] rds_ib_cm_connect_complete+0x1bc/0x1e0 [rds_rdma]
Acked-by: Chien Yen <chien.yen@oracle.com>
Signed-off-by: shamir rabinovitch <shamir.rabinovitch@oracle.com>
[Minor tweaks for top of tree by:]
Signed-off-by: David Ahern <david.ahern@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We currently limit the hash table size to 64K which is very bad
as even 10 years ago it was relatively easy to generate millions
of sockets.
Since the hash table is naturally limited by memory allocation
failure, we don't really need an explicit limit so this patch
removes it.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Kenneth Klette Jonassen says:
====================
tcp: SACK RTTM changes for congestion control
This patch series improves SACK RTT measurements for congestion control:
o Picks the latest sequence SACKed for RTT, i.e. most accurate delay
signal.
o Calls the congestion control's pkts_acked hook with SACK RTTMs
even when not sequentially ACKing new data.
V2: amend misleading comment
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Invoking pkts_acked is currently conditioned on FLAG_ACKED:
receiving a cumulative ACK of new data, or ACK with SYN flag set.
Remove this condition so that CC may get RTT measurements from all SACKs.
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Kenneth Klette Jonassen <kennetkl@ifi.uio.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tcp_sacktag_one() always picks the earliest sequence SACKed for RTT.
This might not make sense for congestion control in cases where:
1. ACKs are lost, i.e. a SACK following a lost SACK covers both
new and old segments at the receiver.
2. The receiver disregards the RFC 5681 recommendation to immediately
ACK out-of-order segments.
Give congestion control a RTT for the latest segment SACKed, which is the
most accurate RTT estimate, but preserve the conservative RTT for RTO.
Removes the call to skb_mstamp_get() in tcp_sacktag_one().
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Kenneth Klette Jonassen <kennetkl@ifi.uio.no>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Later patch passes two values set in tcp_sacktag_one() to
tcp_clean_rtx_queue(). Prepare passing them via struct tcp_sacktag_state.
Acked-by: Yuchung Cheng <ycheng@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Kenneth Klette Jonassen <kennetkl@ifi.uio.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Thomas Graf says:
====================
rhashtable self-test improvements
This series improves the rhashtable self-test to:
* Avoid allocation of test objects
* Measure the time of test runs
* Use the iterator to walk the table for consistency
* Account for failed insertions due to memory pressure or
utilization pressure
* Ignore failed insertions when checking for consistency
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Account for failed inserts due to memory pressure or EBUSY and
ignore failed entries during the consistency check.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As resizes may continue to run in the background, use walker to
ensure we see all entries. Also print the encountered number
of rehashes queued up while traversing.
This may lead to warnings due to entries being seen multiple
times. We consider them non-fatal.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
By far the most expensive part of the selftest was the allocation
of entries. Using a static array allows to measure the rhashtable
operations.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This only blows up the size of the test structure for no gain
in test coverage. Reduces size of test_obj from 24 to 16 bytes.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Make test configurable by allowing to specify all relevant knobs
through module parameters.
Do several test runs and measure the average time it takes to
insert & remove all entries. Note, a deferred resize might still
continue to run in the background.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Alexander Duyck says:
====================
A few minor clean-ups to eth_type_trans
This series addresses a few minor issues I found in eth_type_trans that
that allow us to gain back something like 3 or more cycles per packet.
The first change is to drop the byte swap since it isn't necessary. On x86
we could just check the first byte and compare that against the upper 8
bits of the Ethertype to determine if we are dealing with a size value or
not.
The second makes it so that the value we read in to test for multicast can
be used for the address comparison. This allows us to avoid a second read
of the destination address.
The final change is to avoid some unneeded instructions in computing the
Ethernet header pointer. When we start the call the Ethernet header is at
skb->data, so we just use that rather than computing mac_header, and then
adding that back to skb->head.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Avoid recomputing the Ethernet header location and instead just use the
pointer provided by skb->data. The problem with using eth_hdr is that the
compiler wasn't smart enough to realize that skb->head + skb->mac_header
was the same thing as skb->data before it added ETH_HLEN. By just caching
it off before calling skb_pull_inline we can avoid a few unnecessary
instructions.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This change makes it so that we process the address in
is_multicast_ether_addr at the same size as the other calls. This allows
us to avoid duplicate reads when used with other calls such as
is_zero_ether_addr or eth_addr_copy. In addition I have added a 64 bit
version of the function so in eth_type_trans we can process the destination
address as a 64 bit value throughout.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This change takes advantage of the fact that ETH_P_802_3_MIN is aligned to
512 so as a result we can actually ignore the lower 8b when comparing the
Ethertype to ETH_P_802_3_MIN. This allows us to avoid a byte swap by simply
masking the value and comparing it to the byte swapped value for
ETH_P_802_3_MIN.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
Under presence of TSO/GSO/GRO packets, codel at low rates can be quite
useless. In following example, not a single packet was ever dropped,
while average delay in codel queue is ~100 ms !
qdisc codel 0: parent 1:12 limit 16000p target 5.0ms interval 100.0ms
Sent 134376498 bytes 88797 pkt (dropped 0, overlimits 0 requeues 0)
backlog 13626b 3p requeues 0
count 0 lastcount 0 ldelay 96.9ms drop_next 0us
maxpacket 9084 ecn_mark 0 drop_overlimit 0
This comes from a confusion of what should be the minimal backlog. It is
pretty clear it is not 64KB or whatever max GSO packet ever reached the
qdisc.
codel intent was to use MTU of the device.
After the fix, we finally drop some packets, and rtt/cwnd of my single
TCP flow are meeting our expectations.
qdisc codel 0: parent 1:12 limit 16000p target 5.0ms interval 100.0ms
Sent 102798497 bytes 67912 pkt (dropped 1365, overlimits 0 requeues 0)
backlog 6056b 3p requeues 0
count 1 lastcount 1 ldelay 36.3ms drop_next 0us
maxpacket 10598 ecn_mark 0 drop_overlimit 0
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kathleen Nichols <nichols@pollere.com>
Cc: Dave Taht <dave.taht@gmail.com>
Cc: Van Jacobson <vanj@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch divides the IPv6 flow label space into two ranges:
0-7ffff is reserved for flow label manager, 80000-fffff will be
used for creating auto flow labels (per RFC6438). This only affects how
labels are set on transmit, it does not affect receive. This range split
can be disbaled by systcl.
Background:
IPv6 flow labels have been an unmitigated disappointment thus far
in the lifetime of IPv6. Support in HW devices to use them for ECMP
is lacking, and OSes don't turn them on by default. If we had these
we could get much better hashing in IPv6 networks without resorting
to DPI, possibly eliminating some of the motivations to to define new
encaps in UDP just for getting ECMP.
Unfortunately, the initial specfications of IPv6 did not clarify
how they are to be used. There has always been a vague concept that
these can be used for ECMP, flow hashing, etc. and we do now have a
good standard how to this in RFC6438. The problem is that flow labels
can be either stateful or stateless (as in RFC6438), and we are
presented with the possibility that a stateless label may collide
with a stateful one. Attempts to split the flow label space were
rejected in IETF. When we added support in Linux for RFC6438, we
could not turn on flow labels by default due to this conflict.
This patch splits the flow label space and should give us
a path to enabling auto flow labels by default for all IPv6 packets.
This is an API change so we need to consider compatibility with
existing deployment. The stateful range is chosen to be the lower
values in hopes that most uses would have chosen small numbers.
Once we resolve the stateless/stateful issue, we can proceed to
look at enabling RFC6438 flow labels by default (starting with
scaled testing).
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In my earlier commit:
653437d02f1f ("ipv6: Stop /128 route from disappearing after pmtu update"),
there was a horrible typo. Instead of checking RTF_LOCAL on
rt->rt6i_flags, it was checked on rt->dst.flags. This patch fixes
it.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hajime Tazaki <tazaki@sfc.wide.ad.jp>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Some miscellaneous bug fixes and some final on-disk and ABI changes
for ext4 encryption which provide better security and performance"
* tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix growing of tiny filesystems
ext4: move check under lock scope to close a race.
ext4: fix data corruption caused by unwritten and delayed extents
ext4 crypto: remove duplicated encryption mode definitions
ext4 crypto: do not select from EXT4_FS_ENCRYPTION
ext4 crypto: add padding to filenames before encrypting
ext4 crypto: simplify and speed up filename encryption
|
|
Pull drm fixes from Dave Airlie:
"One intel fix, one rockchip fix, and a bunch of radeon fixes for some
regressions from audio rework and vm stability"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/i915/chv: Implement WaDisableShadowRegForCpd
drm/radeon: fix userptr return value checking (v2)
drm/radeon: check new address before removing old one
drm/radeon: reset BOs address after clearing it.
drm/radeon: fix lockup when BOs aren't part of the VM on release
drm/radeon: add SI DPM quirk for Sapphire R9 270 Dual-X 2G GDDR5
drm/radeon: adjust pll when audio is not enabled
drm/radeon: only enable audio streams if the monitor supports it
drm/radeon: only mark audio as connected if the monitor supports it (v3)
drm/radeon/audio: don't enable packets until the end
drm/radeon: drop dce6_dp_enable
drm/radeon: fix ordering of AVI packet setup
drm/radeon: Use drm_calloc_ab for CS relocs
drm/rockchip: fix error check when getting irq
MAINTAINERS: add entry for Rockchip drm drivers
|
|
git://anongit.freedesktop.org/drm-intel into drm-fixes
Just a single intel fix
* tag 'drm-intel-fixes-2015-04-30' of git://anongit.freedesktop.org/drm-intel:
drm/i915/chv: Implement WaDisableShadowRegForCpd
|