Age | Commit message (Collapse) | Author |
|
When dealing with a list of dismantling netns, we can scan
tcp_metrics once, saving cpu cycles.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Having a global list of labels do not scale to thousands of
netns in the cloud era. This causes quadratic behavior on
netns creation and deletion.
This is time having a per netns list of ~10 labels.
Tested:
$ time perf record (for f in `seq 1 3000` ; do ip netns add tast$f; done)
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 3.637 MB perf.data (~158898 samples) ]
real 0m20.837s # instead of 0m24.227s
user 0m0.328s
sys 0m20.338s # instead of 0m23.753s
16.17% ip [kernel.kallsyms] [k] netlink_broadcast_filtered
12.30% ip [kernel.kallsyms] [k] netlink_has_listeners
6.76% ip [kernel.kallsyms] [k] _raw_spin_lock_irqsave
5.78% ip [kernel.kallsyms] [k] memset_erms
5.77% ip [kernel.kallsyms] [k] kobject_uevent_env
5.18% ip [kernel.kallsyms] [k] refcount_sub_and_test
4.96% ip [kernel.kallsyms] [k] _raw_read_lock
3.82% ip [kernel.kallsyms] [k] refcount_inc_not_zero
3.33% ip [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
2.11% ip [kernel.kallsyms] [k] unmap_page_range
1.77% ip [kernel.kallsyms] [k] __wake_up
1.69% ip [kernel.kallsyms] [k] strlen
1.17% ip [kernel.kallsyms] [k] __wake_up_common
1.09% ip [kernel.kallsyms] [k] insert_header
1.04% ip [kernel.kallsyms] [k] page_remove_rmap
1.01% ip [kernel.kallsyms] [k] consume_skb
0.98% ip [kernel.kallsyms] [k] netlink_trim
0.51% ip [kernel.kallsyms] [k] kernfs_link_sibling
0.51% ip [kernel.kallsyms] [k] filemap_map_pages
0.46% ip [kernel.kallsyms] [k] memcpy_erms
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We can build one skb and let it be cloned in netlink.
This is much faster, and use less memory (all clones will
share the same skb->head)
Tested:
time perf record (for f in `seq 1 3000` ; do ip netns add tast$f; done)
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 4.110 MB perf.data (~179584 samples) ]
real 0m24.227s # instead of 0m52.554s
user 0m0.329s
sys 0m23.753s # instead of 0m51.375s
14.77% ip [kernel.kallsyms] [k] __ip6addrlbl_add
14.56% ip [kernel.kallsyms] [k] netlink_broadcast_filtered
11.65% ip [kernel.kallsyms] [k] netlink_has_listeners
6.19% ip [kernel.kallsyms] [k] _raw_spin_lock_irqsave
5.66% ip [kernel.kallsyms] [k] kobject_uevent_env
4.97% ip [kernel.kallsyms] [k] memset_erms
4.67% ip [kernel.kallsyms] [k] refcount_sub_and_test
4.41% ip [kernel.kallsyms] [k] _raw_read_lock
3.59% ip [kernel.kallsyms] [k] refcount_inc_not_zero
3.13% ip [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
1.55% ip [kernel.kallsyms] [k] __wake_up
1.20% ip [kernel.kallsyms] [k] strlen
1.03% ip [kernel.kallsyms] [k] __wake_up_common
0.93% ip [kernel.kallsyms] [k] consume_skb
0.92% ip [kernel.kallsyms] [k] netlink_trim
0.87% ip [kernel.kallsyms] [k] insert_header
0.63% ip [kernel.kallsyms] [k] unmap_page_range
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
No need to iterate over strings, just copy in one efficient memcpy() call.
Tested:
time perf record "(for f in `seq 1 3000` ; do ip netns add tast$f; done)"
[ perf record: Woken up 10 times to write data ]
[ perf record: Captured and wrote 8.224 MB perf.data (~359301 samples) ]
real 0m52.554s # instead of 1m7.492s
user 0m0.309s
sys 0m51.375s # instead of 1m6.875s
9.88% ip [kernel.kallsyms] [k] netlink_broadcast_filtered
8.86% ip [kernel.kallsyms] [k] string
7.37% ip [kernel.kallsyms] [k] __ip6addrlbl_add
5.68% ip [kernel.kallsyms] [k] netlink_has_listeners
5.52% ip [kernel.kallsyms] [k] memcpy_erms
4.76% ip [kernel.kallsyms] [k] __alloc_skb
4.54% ip [kernel.kallsyms] [k] vsnprintf
3.94% ip [kernel.kallsyms] [k] format_decode
3.80% ip [kernel.kallsyms] [k] kmem_cache_alloc_node_trace
3.71% ip [kernel.kallsyms] [k] kmem_cache_alloc_node
3.66% ip [kernel.kallsyms] [k] kobject_uevent_env
3.38% ip [kernel.kallsyms] [k] strlen
2.65% ip [kernel.kallsyms] [k] _raw_spin_lock_irqsave
2.20% ip [kernel.kallsyms] [k] kfree
2.09% ip [kernel.kallsyms] [k] memset_erms
2.07% ip [kernel.kallsyms] [k] ___cache_free
1.95% ip [kernel.kallsyms] [k] kmem_cache_free
1.91% ip [kernel.kallsyms] [k] _raw_read_lock
1.45% ip [kernel.kallsyms] [k] ksize
1.25% ip [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
1.00% ip [kernel.kallsyms] [k] widen_string
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This removes some #ifdef pollution and will ease follow up patches.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
gen estimator has been rewritten in commit 1c0d32fde5bd
("net_sched: gen_estimator: complete rewrite of rate estimators"),
the caller no longer needs to wait for a grace period. So this
patch gets rid of it.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch is pretty much a carbon copy of
commit 3079c652141f ("caif: Fix napi poll list corruption")
with "caif" replaced by "emac".
The commit d75b1ade567f ("net: less interrupt masking in NAPI")
breaks emac.
It is now required that if the entire budget is consumed when poll
returns, the napi poll_list must remain empty. However, like some
other drivers emac tries to do a last-ditch check and if there is
more work it will call napi_reschedule and then immediately process
some of this new work. Should the entire budget be consumed while
processing such new work then we will violate the new caller
contract.
This patch fixes this by not touching any work when we reschedule
in emac.
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If the hash to port mapping table does not have a valid port (i.e. when
a port goes down), fall back to the simple hashing mechanism to avoid
dropping packets.
Signed-off-by: Jim Hanko <hanko@drivescale.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Our recent change exposed a bug in TCP Fastopen Client that syzkaller
found right away [1]
When we prepare skb with SYN+DATA, we attempt to transmit it,
and we update socket state as if the transmit was a success.
In socket RTX queue we have two skbs, one with the SYN alone,
and a second one containing the DATA.
When (malicious) ACK comes in, we now complain that second one had no
skb_mstamp.
The proper fix is to make sure that if the transmit failed, we do not
pretend we sent the DATA skb, and make it our send_head.
When 3WHS completes, we can now send the DATA right away, without having
to wait for a timeout.
[1]
WARNING: CPU: 0 PID: 100189 at net/ipv4/tcp_input.c:3117 tcp_clean_rtx_queue+0x2057/0x2ab0 net/ipv4/tcp_input.c:3117()
WARN_ON_ONCE(last_ackt == 0);
Modules linked in:
CPU: 0 PID: 100189 Comm: syz-executor1 Not tainted
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
0000000000000000 ffff8800b35cb1d8 ffffffff81cad00d 0000000000000000
ffffffff828a4347 ffff88009f86c080 ffffffff8316eb20 0000000000000d7f
ffff8800b35cb220 ffffffff812c33c2 ffff8800baad2440 00000009d46575c0
Call Trace:
[<ffffffff81cad00d>] __dump_stack
[<ffffffff81cad00d>] dump_stack+0xc1/0x124
[<ffffffff812c33c2>] warn_slowpath_common+0xe2/0x150
[<ffffffff812c361e>] warn_slowpath_null+0x2e/0x40
[<ffffffff828a4347>] tcp_clean_rtx_queue+0x2057/0x2ab0 n
[<ffffffff828ae6fd>] tcp_ack+0x151d/0x3930
[<ffffffff828baa09>] tcp_rcv_state_process+0x1c69/0x4fd0
[<ffffffff828efb7f>] tcp_v4_do_rcv+0x54f/0x7c0
[<ffffffff8258aacb>] sk_backlog_rcv
[<ffffffff8258aacb>] __release_sock+0x12b/0x3a0
[<ffffffff8258ad9e>] release_sock+0x5e/0x1c0
[<ffffffff8294a785>] inet_wait_for_connect
[<ffffffff8294a785>] __inet_stream_connect+0x545/0xc50
[<ffffffff82886f08>] tcp_sendmsg_fastopen
[<ffffffff82886f08>] tcp_sendmsg+0x2298/0x35a0
[<ffffffff82952515>] inet_sendmsg+0xe5/0x520
[<ffffffff8257152f>] sock_sendmsg_nosec
[<ffffffff8257152f>] sock_sendmsg+0xcf/0x110
Fixes: 8c72c65b426b ("tcp: update skb->skb_mstamp more carefully")
Fixes: 783237e8daf1 ("net-tcp: Fast Open client - sending SYN-data")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Florian Westphal says:
====================
test_rhashtable: don't allocate huge static array
Add a test case for the rhlist interface.
While at it, cleanup current rhashtable test a bit and add a check
for max_size support.
No changes since v1, except in last patch.
kbuild robot complained about large onstack allocation caused by
struct rhltable when lockdep is enabled.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
also test rhltable. rhltable remove operations are slow as
deletions require a list walk, thus test with 1/16th of the given
entry count number to get a run duration similar to rhashtable one.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
add a test that tries to insert more than max_size elements.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
pass the entries to test as an argument instead.
Followup patch will add an rhlist test case; rhlist delete opererations
are slow so we need to use a smaller number to test it.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Florian Fainelli says:
====================
net: dsa: b53/bcm_sf2 cleanups
This patch series is a first pass set of clean-ups to reduce the number of LOCs
between b53 and bcm_sf2 and sharing as many functions as possible.
There is a number of additional cleanups queued up locally that require more
thorough testing.
Changes in v3:
- remove one extra argument for the b53_build_io_op macro (David Laight)
- added additional Reviewed-by tags from Vivien
Changes in v2:
- added Reviewed-by tags from Vivien
- added a missing EXPORT_SYMBOL() in patch 8
- fixed a typo in patch 5
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Export b53_{enable,disable}_port and use these two functions in
bcm_sf2_port_setup and bcm_sf2_port_disable. The generic functions
cannot be used without wrapping because we need to manage additional
switch integration details (PHY, Broadcom tag etc.).
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The magic number 8 in 3 locations in bcm_sf2_cfp.c actually designates
the number of switch port egress queues, so use that define instead of
open-coding it.
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
bcm_sf2 and b53 do exactly the same thing, so share that piece.
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for enabling and disabling EEE, as well as re-negotiating it in
.adjust_link() and in .port_enable().
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move the bcm_sf2 EEE-related functions to the b53 driver because this is shared
code amongst Gigabit capable switch, only 5325 and 5365 are too old to support
that.
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In preparation for migrating the EEE code from bcm_sf2 to b53, define the full
EEE register page and offsets within that page.
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The code to enable Broadcom tags/headers is largely switch independent,
and in preparation for enabling it for multiple devices with b53, move
the code we have in bcm_sf2.c to b53_common.c
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Instead of repeating the same pattern: acquire mutex, read/write,
release mutex, define a macro: b53_build_op() which takes the type
(read|write), I/O size, and value (scalar or pointer). This helps with
fixing bugs that could exist (e.g: missing barrier, lock etc.).
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is no need to configure the enabled ports once in bcm_sf2_sw_setup() and
then a second time around when dsa_switch_ops::port_enable is called, just do
it when port_enable is called which is better in terms of power consumption and
correctness.
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is no need to configure the enabled ports once in b53_setup() and then a
second time around when dsa_switch_ops::port_enable is called, just do it when
port_enable is called which is better in terms of power consumption and
correctness.
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In preparation for future changes allowing the configuring of multiple
CPU ports, make b53_enable_cpu_port() take a port argument.
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is not used anywhere, so remove it.
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Salil Mehta says:
====================
Bug fixes for the HNS3 Ethernet Driver for Hip08 SoC
This patch set presents some bug fixes for the HNS3 Ethernet driver identified
during internal testing & stabilization efforts.
Change Log:
Patch V2: Resolved comments from Leon Romanovsky
Patch V1: Initial Submit
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When register/unregister ae_dev, ae_dev should match all client
in the client_list. Enet and roce can co-exists together so we
should continue checking for enet and roce presence together.
So break should not be there.
Above caused problems in loading and unloading of modules.
Fixes: 38eddd126772 ("net: hns3: Add support of the HNAE3 framework")
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When there is no vlan id in the packets, hardware will treat the vlan id
as 0 and look for the mac_vlan table. This patch set the default vlan id
of PF as 0. Without this config, it will fail when look for mac_vlan
table, and hardware will drop packets.
Fixes: 6427264ef330 ("net: hns3: Add HNS3 Acceleration Engine &
Compatibility Layer Support")
Signed-off-by: Mingguang Qu <qumingguang@huawei.com>
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch replaces the ethernet address copy instance with more
appropriate ether_addr_copy() function.
Fixes: 6427264ef330 ("net: hns3: Add HNS3 Acceleration Engine &
Compatibility Layer Support")
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch fixes the initialization of MAC address, fetched from HNS3
firmware i.e. when it is not randomly generated, to the HNS3 hardware.
Fixes: ca60906d2795 ("net: hns3: Add support of HNS3 Ethernet Driver for
hip08 SoC")
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch fixes the vector-to-ring map and unmap command and adds
INT_GL(for, Gap Limiting Interrupts) and VF id to it as required
by the hardware interface.
Fixes: 6427264ef330 ("net: hns3: Add HNS3 Acceleration Engine &
Compatibility Layer Support")
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Mingguang Qu <qumingguang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch fixes the IMP command being used to unmap the vector
from the corresponding ring.
Fixes: 6427264ef330 ("net: hns3: Add HNS3 Acceleration Engine &
Compatibility Layer Support")
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Default phy address of every port is 0. Therefore, phy address for
each port need to be fetched from firmware and device initialized
with fetched non-default phy address.
Fixes: 6427264ef330 ("net: hns3: Add HNS3 Acceleration Engine &
Compatibility Layer Support")
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Vivien Didelot says:
====================
net: dsa: move master ethtool code
The DSA core overrides the master device's ethtool_ops structure so that
it can inject statistics and such of its dedicated switch CPU port.
This ethtool code is currently called on unnecessary conditions or
before the master interface and its switch CPU port get wired up.
This patchset fixes this.
Similarly to slave.c where the DSA slave net_device is the entry point
of the dsa_slave_* functions, this patchset also isolates the master's
ethtool code in a new master.c file, where the DSA master net_device is
the entry point of the dsa_master_* functions.
This is a first step towards better control of the master device and
support for multiple CPU ports.
====================
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
DSA overrides the master device ethtool ops, so that it can inject stats
from its dedicated switch CPU port as well.
The related code is currently split in dsa.c and slave.c, but it only
scopes the master net device. Move it to a new master.c DSA core file.
This file will be later extented with master net device specific code.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
DSA overrides the master's ethtool ops so that we can inject its CPU
port's statistics. Because of that, we need to setup the ethtool ops
after the master's dsa_ptr pointer has been assigned, not before.
This patch setups the ethtool ops after dsa_ptr is assigned, and
restores them before it gets cleared.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a DSA switch tree is meant to be applied, it already has a CPU
port. Thus remove the condition of dst->cpu_dp.
Moreover, the next lines access dst->cpu_dp unconditionally.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is no need to store a copy of the master ethtool ops, storing the
original pointer in DSA and the new one in the master netdev itself is
enough.
In the meantime, set orig_ethtool_ops to NULL when restoring the master
ethtool ops and check the presence of the master original ethtool ops as
well as its needed functions before calling them.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Rather than letting the ti-cpufreq driver match against 'ti,am4372'
machine compatible during probe let's match against 'ti,am43' so that we
can support both 'ti,am4372' and 'ti,am438x' platforms which both match
to this compatible.
Signed-off-by: Dave Gerlach <d-gerlach@ti.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
syzkaller reported following splat [1]
Since hard irq are disabled by the caller, bpf_map_free_id()
should not try to enable/disable BH.
Another solution would be to change htab_map_delete_elem() to
defer the free_htab_elem() call after
raw_spin_unlock_irqrestore(&b->lock, flags), but this might be not
enough to cover other code paths.
[1]
WARNING: CPU: 1 PID: 8052 at kernel/softirq.c:161 __local_bh_enable_ip
+0x1e/0x160 kernel/softirq.c:161
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 8052 Comm: syz-executor1 Not tainted 4.13.0-next-20170915+
#23
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:52
panic+0x1e4/0x417 kernel/panic.c:181
__warn+0x1c4/0x1d9 kernel/panic.c:542
report_bug+0x211/0x2d0 lib/bug.c:183
fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178
do_trap_no_signal arch/x86/kernel/traps.c:212 [inline]
do_trap+0x260/0x390 arch/x86/kernel/traps.c:261
do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298
do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311
invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
RIP: 0010:__local_bh_enable_ip+0x1e/0x160 kernel/softirq.c:161
RSP: 0018:ffff8801cdcd7748 EFLAGS: 00010046
RAX: 0000000000000082 RBX: 0000000000000201 RCX: 0000000000000000
RDX: 1ffffffff0b5933c RSI: 0000000000000201 RDI: ffffffff85ac99e0
RBP: ffff8801cdcd7758 R08: ffffffff85b87158 R09: 1ffff10039b9aec6
R10: ffff8801c99f24c0 R11: 0000000000000002 R12: ffffffff817b0b47
R13: dffffc0000000000 R14: ffff8801cdcd77e8 R15: 0000000000000001
__raw_spin_unlock_bh include/linux/spinlock_api_smp.h:176 [inline]
_raw_spin_unlock_bh+0x30/0x40 kernel/locking/spinlock.c:207
spin_unlock_bh include/linux/spinlock.h:361 [inline]
bpf_map_free_id kernel/bpf/syscall.c:197 [inline]
__bpf_map_put+0x267/0x320 kernel/bpf/syscall.c:227
bpf_map_put+0x1a/0x20 kernel/bpf/syscall.c:235
bpf_map_fd_put_ptr+0x15/0x20 kernel/bpf/map_in_map.c:96
free_htab_elem+0xc3/0x1b0 kernel/bpf/hashtab.c:658
htab_map_delete_elem+0x74d/0x970 kernel/bpf/hashtab.c:1063
map_delete_elem kernel/bpf/syscall.c:633 [inline]
SYSC_bpf kernel/bpf/syscall.c:1479 [inline]
SyS_bpf+0x2188/0x46a0 kernel/bpf/syscall.c:1451
entry_SYSCALL_64_fastpath+0x1f/0xbe
Fixes: f3f1c054c288 ("bpf: Introduce bpf_map ID")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When reading data from trace_pipe, tracing_wait_pipe() performs a
check to see if tracing has been turned off after some data was read.
Currently, this check always looks at global trace state, but it
should be checking the trace instance where trace_pipe is located at.
Because of this bug, cat instances/i1/trace_pipe in the following
script will immediately exit instead of waiting for data:
cd /sys/kernel/debug/tracing
echo 0 > tracing_on
mkdir -p instances/i1
echo 1 > instances/i1/tracing_on
echo 1 > instances/i1/events/sched/sched_process_exec/enable
cat instances/i1/trace_pipe
Link: http://lkml.kernel.org/r/20170917102348.1615-1-tahsin@google.com
Cc: stable@vger.kernel.org
Fixes: 10246fa35d4f ("tracing: give easy way to clear trace buffer")
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
skb->rbnode shares space with skb->next, skb->prev and skb->tstamp
Current uses (TCP receive ofo queue and netem) need to save/restore
tstamp, while skb->dev is either NULL (TCP) or a constant for a given
queue (netem).
Since we plan using an RB tree for TCP retransmit queue to speedup SACK
processing with large BDP, this patch exchanges skb->dev and
skb->tstamp.
This saves some overhead in both TCP and netem.
v2: removes the swtstamp field from struct tcp_skb_cb
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Wei Wang <weiwan@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Clarify that rhashtable_walk_{stop,start} will not reset the iterator to
the beginning of the hash table. Confusion between rhashtable_walk_enter
and rhashtable_walk_start has already lead to a bug.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Jiri Pirko says:
====================
mlxsw: Prepare for multicast router offload
Yotam says:
This patch-set makes various preparations needed for the multicast router
offloading, which include:
- Add the needed registers.
- Add needed ACL actions.
- Add new traps and trap groups.
- Exporting needed private structs and enums.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add three new traps needed for multicast routing:
- PIM: Trap for PIM protocol control packets.
- RPF: Trap for packets that fail the RPF check on a specific hardware
route entry.
- MULTICAST: Generic trap for multicast. It is used for routes that trap
the packets to the CPU.
The RPF and MULTICAST traps have rate limiters as these traps may have
line-rate of packets trapped. The PIM trap has a rate limiter similarly to
other L3 control protocols. The rate limiters are implemented by adding
three new trap groups for the newly introduced traps.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The mlxsw_sp_rif struct, defined as private struct in spectrum_router.c
will be used in the multicast router source file. Due to the fact that the
dev field will be needed by the multicast router logic, add an access
function to it.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Turn on two bits on the Spectrum RIF configuration:
- IPv4 multicast: when a multicast packet arrives on a RIF, send it to go
through multicast routes lookup.
- IPv4 multicast forwarding enable: when multicast packet arrives on a
RIF, allow it to be forwarded by multicast routes. If this bit is not
set, multicast packets will go through multicast routing lookup but will
be dropped at the egress of the ports.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|