|
CONFIG_IRQ_DOMAIN_DEBUG is similar to CONFIG_GENERIC_IRQ_DEBUGFS,
just with less information.
Spring cleanup time.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Yang Shunyong <shunyong.yang@hxt-semitech.com>
Link: https://lkml.kernel.org/r/20180117142647.23622-1-marc.zyngier@arm.com
|
|
It doesn't make sense to have an indirect call thunk with esp/rsp as
retpoline code won't work correctly with the stack pointer register.
Removing it will help compiler writers to catch errors in case such
a thunk call is emitted incorrectly.
Fixes: 76b043848fd2 ("x86/retpoline: Add initial retpoline support")
Suggested-by: Jeff Law <law@redhat.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Kees Cook <keescook@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1516658974-27852-1-git-send-email-longman@redhat.com
|
|
John Fastabend says:
====================
The sockmap sample is pretty simple at the moment. All it does is open
a few sockets, attach BPF programs/sockmaps, and send a few packets.
However, for testing and debugging I wanted to have more control over
the sendmsg format and data than provided by tools like iperf3/netperf,
etc. The reason is that for testing BPF programs and the stream parser it
is helpful to be able to submit multiple sendmsg calls with different msg
layouts. For example, lots of 1B iovs or a single large MB of data, etc.
Additionally, my current test setup requires an entire orchestration
layer (cilium) to run, as well as lighttpd and http traffic generators
or, for kafka testing, brokers and clients. This makes it a bit more
difficult when doing performance optimizations to incrementally test
small changes and come up with performance deltas and perf numbers.
By adding a few more options and an additional few tests the sockmap
sample program can show a more complete example and do some of the
above. Because the sample program is self contained it doesn't require
additional infrastructure to run either.
This series, although still fairly crude, does provide some nice
additions. They are
- a new sendmsg test with sender and recv threads
- a new base test so we can get metrics/data without BPF
- multiple GBps of throughput on base and sendmsg tests
- automatically set rlimit and common variables
That said the UI is still primitive, more features could be added,
more tests might be useful, the reporting is bare bones, etc. But,
IMO let's push this now rather than sit on it for weeks until I get
time to do the above improvements. Additional patches can address
the other limitations/issues. Another thing I am considering is
moving this into selftests, after a few more fixes so we avoid
false failures, so that we get more sockmap testing.
v2: removed bogus file added by patch 3/7
v3: 1/7 replace goto out with returns, remove sighandler update,
2/7 free iov in error cases
3/7 fix bogus makefile change, bail out early on errors
v4: add Martin's "nits" and ACKs along with fixes to 2/7 iov free
also pointed out by Martin.
Thanks Daniel and Martin for the reviews!
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Avoid extra step of setting limit from cmdline and do it directly in
the program.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
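A minimal sketch of setting a resource limit from inside a program rather
than from the command line. The helper name raise_soft_limit() is
hypothetical, and this version only raises the soft limit to the existing
hard limit, which needs no extra privileges; the sample itself may set a
different value.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft limit for `resource` up to the
 * current hard limit, so no `ulimit` call is needed before launch.
 * Returns 0 on success, -1 on failure. */
int raise_soft_limit(int resource)
{
	struct rlimit r;

	if (getrlimit(resource, &r)) {
		perror("getrlimit");
		return -1;
	}
	r.rlim_cur = r.rlim_max;	/* allowed without extra privileges */
	if (setrlimit(resource, &r)) {
		perror("setrlimit");
		return -1;
	}
	return 0;
}
```

A call such as raise_soft_limit(RLIMIT_MEMLOCK) early in main() would
replace the manual step. RLIMIT_MEMLOCK is an assumption here, since the
message does not say which limit is meant.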
|
|
Put client sockets in blocking mode; otherwise, with sendmsg tests
it's easy to overrun the socket buffers, which results in the test
being aborted.
The original non-blocking mode was added to handle listen/accept with
a single thread; the client/accepted sockets do not need to be
non-blocking.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
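For illustration, one common way to put a descriptor back into blocking
mode is to clear O_NONBLOCK with fcntl(). This is a sketch only; the
helper name set_blocking() is hypothetical and the patch may use a
different mechanism.

```c
#include <fcntl.h>

/* Clear O_NONBLOCK so that writes block when the socket buffer is full
 * instead of failing with EAGAIN. Returns 0 on success, -1 on error. */
int set_blocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) < 0 ? -1 : 0;
}
```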
|
|
Add a base test that does not use BPF hooks to test baseline case.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Report bytes/sec sent as well as total bytes. Useful to get rough
idea how different configurations and usage patterns perform with
sockmap.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
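The rate calculation can be sketched as follows. The function name
bytes_per_sec() and the use of struct timespec are assumptions made for
this example, not taken from the sample.

```c
#include <time.h>

/* Bytes per second over [start, end]. Returns 0 when no time has
 * elapsed, so the caller never divides by zero. */
double bytes_per_sec(unsigned long long bytes,
		     struct timespec start, struct timespec end)
{
	double elapsed = (double)(end.tv_sec - start.tv_sec) +
			 (double)(end.tv_nsec - start.tv_nsec) / 1e9;

	return elapsed > 0.0 ? (double)bytes / elapsed : 0.0;
}
```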
|
|
Currently for SENDMSG tests first send completes then recv runs. This
does not work well for large data sizes and/or many iterations. So
fork the recv and send handler so that we run both send and recv. In
the future we can add a parameter to do more than a single fork of
tx/rx.
With this we can get many GBps of data which helps exercise the
sockmap code.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
When testing BPF programs using sockmap I often want to have more
control over how sendmsg is exercised. This becomes even more useful
as new sockmap program types are added.
This adds a test type option to select type of test to run. Currently,
only "ping" and "sendmsg" are supported, but more can be added as
needed.
The new help argument gives the following,
Usage: ./sockmap --cgroup <cgroup_path>
options:
--help -h
--cgroup -c
--rate -r
--verbose -v
--iov_count -i
--length -l
--test -t
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
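The option list above could be wired up with getopt_long() roughly as
follows. The long and short names come from the help output; which
options take an argument, the struct opts layout, and the parse_args()
helper are assumptions made for this sketch.

```c
#include <getopt.h>
#include <stdlib.h>
#include <string.h>

struct opts {
	const char *cgroup;
	const char *test;
	int rate, verbose, iov_count, length;
};

static const struct option long_options[] = {
	{ "help",      no_argument,       NULL, 'h' },
	{ "cgroup",    required_argument, NULL, 'c' },
	{ "rate",      required_argument, NULL, 'r' },
	{ "verbose",   no_argument,       NULL, 'v' },
	{ "iov_count", required_argument, NULL, 'i' },
	{ "length",    required_argument, NULL, 'l' },
	{ "test",      required_argument, NULL, 't' },
	{ 0, 0, 0, 0 }
};

/* Returns 0 on success, 1 if help was requested, -1 on a bad option. */
int parse_args(int argc, char **argv, struct opts *o)
{
	int opt;

	memset(o, 0, sizeof(*o));
	optind = 1;	/* restart scanning so the function can be called again */
	while ((opt = getopt_long(argc, argv, "hc:r:vi:l:t:",
				  long_options, NULL)) != -1) {
		switch (opt) {
		case 'c': o->cgroup = optarg; break;
		case 'r': o->rate = atoi(optarg); break;
		case 'v': o->verbose = 1; break;
		case 'i': o->iov_count = atoi(optarg); break;
		case 'l': o->length = atoi(optarg); break;
		case 't': o->test = optarg; break;
		case 'h': return 1;
		default:  return -1;
		}
	}
	return 0;
}
```

Compared with reading fixed offsets into argv, new options only need one
table entry and one case label.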
|
|
sockmap sample program takes arguments from cmd line but it reads them
in using offsets into the array. Because we want to add more arguments
in the future, let's do proper argument handling.
Also refactor code to pull apart sock init and ping/pong test. This
allows us to add new tests in the future.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
mostly revert the previous workaround and make
'dubious pointer arithmetic' test useful again.
Use (ptr - ptr) << const instead of ptr << const to generate large scalar.
The rest stays as before commit 2b36047e7889.
Fixes: 2b36047e7889 ("selftests/bpf: fix test_align")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Tejun reported the following cpu-hotplug lock (percpu-rwsem) read recursion:
  tg_set_cfs_bandwidth()
    get_online_cpus()
      cpus_read_lock()
    cfs_bandwidth_usage_inc()
      static_key_slow_inc()
        cpus_read_lock()
Reported-by: Tejun Heo <tj@kernel.org>
Tested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180122215328.GP3397@worktop
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
debug_show_all_locks() iterates all tasks and prints held locks while
holding tasklist_lock. This can take a while on a slow console device
and may end up triggering NMI hardlockup detector if someone else ends
up waiting for tasklist_lock.
Touch the NMI watchdog while printing the held locks to avoid
spuriously triggering the hardlockup detector.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-team@fb.com
Link: http://lkml.kernel.org/r/20180122220055.GB1771050@devbig577.frc2.facebook.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Both Geert and DaveJ reported that the recent futex commit:
c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex")
introduced a problem with setting OWNER_DEAD. We set the bit on an
uninitialized variable and then entirely optimize it away as a
dead-store.
Move the setting of the bit to where it is more useful.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex")
Link: http://lkml.kernel.org/r/20180122103947.GD2228@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
For all of these, a simple DEVICE_ATTR_*() macro should be used instead,
so convert the drivers to use them.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
It really should be DEVICE_ATTR_WO(), no need to "open code" it.
Acked-by: Bin Liu <b-liu@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
There's no need to have DEVICE_ATTR() in these crazy macros, so use the
proper DEVICE_ATTR_*() versions instead.
Cc: Matthieu CASTET <castet.matthieu@free.fr>
Cc: Stanislaw Gruszka <stf_xl@wp.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Instead of "open coding" a DEVICE_ATTR() define, use the
DEVICE_ATTR_WO() macro, which does everything properly.
This does require a few static functions to be renamed to work properly,
but thanks to a script from Joe Perches, this was easily done.
Reported-by: Joe Perches <joe@perches.com>
Cc: Peter Chen <Peter.Chen@nxp.com>
Cc: Valentina Manea <valentina.manea.m@gmail.com>
Acked-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Acked-by: Johan Hovold <johan@kernel.org>
Acked-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Instead of "open coding" a DEVICE_ATTR() define, use the
DEVICE_ATTR_RO() macro, which does everything properly.
This does require a few static functions to be renamed to work properly,
but thanks to a script from Joe Perches, this was easily done.
Reported-by: Joe Perches <joe@perches.com>
Cc: Matthieu CASTET <castet.matthieu@free.fr>
Cc: Stanislaw Gruszka <stf_xl@wp.pl>
Cc: Oliver Neukum <oneukum@suse.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Acked-by: Pete Zaitcev <zaitcev@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Instead of "open coding" a DEVICE_ATTR() define, use the
DEVICE_ATTR_RW() macro, which does everything properly.
This does require a few static functions to be renamed to work properly,
but thanks to a script from Joe Perches, this was easily done.
Reported-by: Joe Perches <joe@perches.com>
Cc: Matthieu CASTET <castet.matthieu@free.fr>
Cc: Stanislaw Gruszka <stf_xl@wp.pl>
Cc: Peter Chen <Peter.Chen@nxp.com>
Cc: Mathias Nyman <mathias.nyman@intel.com>
Acked-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Bin Liu <b-liu@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Assign true or false to boolean variables instead of an integer value.
This issue was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Jakub Kicinski says:
====================
bpf and netdevsim test updates
A number of test improvements (delayed by merges). Quentin provides
patches for checking printing to the verifier log from the drivers
and validating extack messages are propagated. There is also a test
for replacing TC filters to avoid adding back the bug Daniel recently
fixed in net and stable.
====================
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Daniel discovered recently I broke TC filter replace (and fixed
it in commit ad9294dbc227 ("bpf: fix cls_bpf on filter replace")).
Add a test to make sure it never happens again.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Make netdevsim print a message to the BPF verifier log buffer when a
program is offloaded.
Then use this message in hardware offload selftests to make sure that
using this buffer actually prints the message to the console for
eBPF hardware offload.
The message is appended after the last instruction is processed with the
verifying function from netdevsim. Output looks like the following:
$ tc filter add dev foo ingress bpf obj sample_ret0.o \
sec .text verbose skip_sw
Prog section '.text' loaded (5)!
- Type: 3
- Instructions: 2 (0 over limit)
- License:
Verifier analysis:
0: (b7) r0 = 0
1: (95) exit
[netdevsim] Hello from netdevsim!
processed 2 insns, stack depth 0
"verbose" flag is required to see it in the console since netdevsim does
not throw an error after printing the message.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We should not compile netdevsim/bpf.c if BPF syscall is not
enabled. Otherwise bpf core would have to provide wrappers
for all functions offload drivers may call, even though
system will never see a BPF object.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add checks to test that netlink extack messages are correctly displayed
in some expected error cases for eBPF offload to netdevsim with TC and
XDP.
iproute2 may be built without libmnl support, in which case the extack
messages will not be reported. Try to detect this condition, and when
encountered, print a mild warning to the user and skip the extack validation.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use the recently added extack support for TC eBPF filters in netdevsim.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2018-01-23
This series contains updates to i40e and i40evf only.
Pawel enables FlatNVM support on x722 devices by allowing nvmupdate tool
to configure the preservation flags in the AdminQ command.
Mitch fixes a potential divide by zero error when DCB is enabled and
the firmware fails to configure the VSI, so check for this state.
Fixed a bug where the driver could fail to adhere to ETS bandwidth
allocations if 8 traffic classes were configured on the switch.
Sudheer fixes a potential deadlock by not calling
flush_scheduled_work() in i40evf_remove(), since cancel_work_sync()
and cancel_delayed_work_sync() already clean up the necessary work items.
Fixed an issue with the problematic detection and recovery from
hung queues in the PF which was causing lost interrupts. This is done
by triggering a software interrupt so that interrupts are forced on
and if we are already in napi_poll and an interrupt fires, napi_poll
will not be rescheduled and the interrupt is lost.
Avinash fixes an issue in the VF where it was possible to issue a
reset_task while the device is currently being removed.
Michal fixes an issue occurring while calling i40e_led_set() with
the blink parameter set to true, which was causing the activity LED
instead of the link LED to blink for port identification.
Shiraz changes the client interface to not call client close/open on
netdev down/up events, since this causes a lot of thrash that is
not needed. Instead, disable the PE TCP-ENA flag during a netdev
down event and re-enable on a netdev up event, since this blocks all
TCP traffic to the RDMA protocol engine.
Alan fixes an issue which was causing a potential transmit hang by
ignoring the PF link up message if the VF state is not yet in the
RUNNING state.
Amritha fixes the channel VSI recreation during the reset flow to
reconfigure the transmit rings and the queue context associated with
the channel VSI.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With the introduction of commit
b0eb57cb97e7837ebb746404c2c58c6f536f23fa, it appears that rq->buf_info
is improperly handled. While it is heap allocated when an rx queue is
setup, and freed when torn down, an old line of code in
vmxnet3_rq_destroy was not properly removed, leading to rq->buf_info[0]
being set to NULL prior to its being freed, causing a memory leak, which
eventually exhausts the system on repeated create/destroy operations
(for example, when the mtu of a vmxnet3 interface is changed
frequently).
Fix is pretty straightforward, just move the NULL set to after the
free.
Tested by myself with successful results
Applies to net, and should likely be queued for stable, please
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-By: boyang@redhat.com
CC: boyang@redhat.com
CC: Shrikrishna Khare <skhare@vmware.com>
CC: "VMware, Inc." <pv-drivers@vmware.com>
CC: David S. Miller <davem@davemloft.net>
Acked-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
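A userspace analogue of the ordering problem, using malloc()/free()
instead of the kernel allocator. The struct and function names are
hypothetical and only model the buf_info[0] handling described above.

```c
#include <stdlib.h>

struct rx_queue {
	void *buf_info[2];
};

/* Leaky order: the pointer is cleared first, so free() receives NULL,
 * does nothing, and the allocation is lost. */
void rq_destroy_leaky(struct rx_queue *rq)
{
	rq->buf_info[0] = NULL;
	free(rq->buf_info[0]);
}

/* Fixed order: free first, then clear the pointer. */
void rq_destroy_fixed(struct rx_queue *rq)
{
	free(rq->buf_info[0]);
	rq->buf_info[0] = NULL;
}
```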
|
|
Commit 513674b5a2c9 ("net: reevalulate autoflowlabel setting after
sysctl setting") removed the initialisation of
ipv6_pinfo::autoflowlabel and added a second flag to indicate
whether this field or the net namespace default should be used.
The getsockopt() handling for this case was not updated, so it
currently returns 0 for all sockets for which IPV6_AUTOFLOWLABEL is
not explicitly enabled. Fix it to return the effective value, whether
that has been set at the socket or net namespace level.
Fixes: 513674b5a2c9 ("net: reevalulate autoflowlabel setting after sysctl ...")
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Davide Caratti says:
====================
net/sched: remove spinlock from 'csum' action
Similarly to what has been done earlier with other actions [1][2], this
series tries to improve the performance of 'csum' tc action, removing a
spinlock in the data path. Patch 1 lets act_csum use per-CPU counters;
patch 2 removes spin_{,un}lock_bh() calls from the act() method.
test procedure (using pktgen from https://github.com/netoptimizer):
# ip link add name eth1 type dummy
# ip link set dev eth1 up
# tc qdisc add dev eth1 root handle 1: prio
# for a in pass drop; do
> tc filter del dev eth1 parent 1: pref 10 matchall action csum udp
> tc filter add dev eth1 parent 1: pref 10 matchall action csum udp $a
> for n in 2 4; do
> ./pktgen_bench_xmit_mode_queue_xmit.sh -v -s 64 -t $n -n 1000000 -i eth1
> done
> done
test results:
     |    | before patch    | after patch
$a   | $n | avg. pps/thread | avg. pps/thread
-----+----+-----------------+----------------
pass |  2 | 1671463 ± 4%    | 1920789 ± 3%
pass |  4 |  648797 ± 1%    |  738190 ± 1%
drop |  2 | 3212692 ± 2%    | 3719811 ± 2%
drop |  4 | 1078824 ± 1%    | 1328099 ± 1%
references:
[1] https://www.spinics.net/lists/netdev/msg334760.html
[2] https://www.spinics.net/lists/netdev/msg465862.html
v3 changes:
- use rtnl_dereference() in place of rcu_dereference() in tcf_csum_dump()
v2 changes:
- add 'drop' test, it produces more contentions
- use RCU-protected struct to store 'action' and 'update_flags', to avoid
reading the values from subsequent configurations
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
use RCU instead of spin_{,un}lock_bh() to protect concurrent read/write on
act_csum configuration, to reduce the effects of contention in the data
path when multiple readers are present.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
use per-CPU counters, like other TC actions do, instead of maintaining one
set of stats across all cores. This allows updating act_csum stats without
the need of protecting them using spin_{,un}lock_bh() invocations.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
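The per-CPU counter idea can be modelled in plain C as one slot per CPU
that is summed on read. This is a simplified userspace sketch, not the
kernel's per-CPU API; NR_CPUS_MODEL and the function names are
hypothetical.

```c
#include <stdint.h>

#define NR_CPUS_MODEL 4	/* hypothetical CPU count for the model */

struct percpu_stats {
	uint64_t packets[NR_CPUS_MODEL];
};

/* Writer side: each CPU touches only its own slot, so no lock is needed
 * as long as a slot is only ever updated from its own CPU. */
void stats_inc(struct percpu_stats *s, unsigned int cpu)
{
	s->packets[cpu % NR_CPUS_MODEL]++;
}

/* Reader side: fold all slots into one total when stats are dumped. */
uint64_t stats_sum(const struct percpu_stats *s)
{
	uint64_t total = 0;

	for (unsigned int i = 0; i < NR_CPUS_MODEL; i++)
		total += s->packets[i];
	return total;
}
```

The trade-off is a slightly more expensive read in exchange for an
uncontended update on the data path.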
|
|
In pppoe_sendmsg(), reserving dev->hard_header_len bytes of headroom
was probably fine before the introduction of ->needed_headroom in
commit f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom").
But now, virtual devices typically advertise the size of their overhead
in dev->needed_headroom, so we must also take it into account in
skb_reserve().
Allocation size of skb is also updated to take dev->needed_tailroom
into account and replace the arbitrary 32 bytes with the real size of
a PPPoE header.
This issue was discovered by syzbot, who connected a pppoe socket to a
gre device which had dev->header_ops->create == ipgre_header and
dev->hard_header_len == 0. Therefore, PPPoE didn't reserve any
headroom, and dev_hard_header() crashed when ipgre_header() tried to
prepend its header to skb->data.
skbuff: skb_under_panic: text:000000001d390b3a len:31 put:24
head:00000000d8ed776f data:000000008150e823 tail:0x7 end:0xc0 dev:gre0
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:104!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 3670 Comm: syzkaller801466 Not tainted
4.15.0-rc7-next-20180115+ #97
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:skb_panic+0x162/0x1f0 net/core/skbuff.c:100
RSP: 0018:ffff8801d9bd7840 EFLAGS: 00010282
RAX: 0000000000000083 RBX: ffff8801d4f083c0 RCX: 0000000000000000
RDX: 0000000000000083 RSI: 1ffff1003b37ae92 RDI: ffffed003b37aefc
RBP: ffff8801d9bd78a8 R08: 1ffff1003b37ae8a R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff86200de0
R13: ffffffff84a981ad R14: 0000000000000018 R15: ffff8801d2d34180
FS: 00000000019c4880(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000208bc000 CR3: 00000001d9111001 CR4: 00000000001606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
skb_under_panic net/core/skbuff.c:114 [inline]
skb_push+0xce/0xf0 net/core/skbuff.c:1714
ipgre_header+0x6d/0x4e0 net/ipv4/ip_gre.c:879
dev_hard_header include/linux/netdevice.h:2723 [inline]
pppoe_sendmsg+0x58e/0x8b0 drivers/net/ppp/pppoe.c:890
sock_sendmsg_nosec net/socket.c:630 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:640
sock_write_iter+0x31a/0x5d0 net/socket.c:909
call_write_iter include/linux/fs.h:1775 [inline]
do_iter_readv_writev+0x525/0x7f0 fs/read_write.c:653
do_iter_write+0x154/0x540 fs/read_write.c:932
vfs_writev+0x18a/0x340 fs/read_write.c:977
do_writev+0xfc/0x2a0 fs/read_write.c:1012
SYSC_writev fs/read_write.c:1085 [inline]
SyS_writev+0x27/0x30 fs/read_write.c:1082
entry_SYSCALL_64_fastpath+0x29/0xa0
Admittedly PPPoE shouldn't be allowed to run on non Ethernet-like
interfaces, but reserving space for ->needed_headroom is a more
fundamental issue that needs to be addressed first.
Same problem exists for __pppoe_xmit(), which also needs to take
dev->needed_headroom into account in skb_cow_head().
Fixes: f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom")
Reported-by: syzbot+ed0838d0fa4c4f2b528e20286e6dc63effc7c14d@syzkaller.appspotmail.com
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
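The size arithmetic described above can be sketched in userspace C.
struct dev_model is a hypothetical stand-in for the three net_device
fields involved, no alignment is applied, and the 6-byte PPPoE header
size is the sum of its ver/type, code, session id and length fields.

```c
#include <stddef.h>

#define PPPOE_HDR_LEN 6	/* ver/type (1) + code (1) + session id (2) + length (2) */

/* Simplified stand-in for the fields of struct net_device used here. */
struct dev_model {
	size_t hard_header_len;
	size_t needed_headroom;
	size_t needed_tailroom;
};

/* Headroom to reserve in front of the PPPoE header. */
size_t pppoe_headroom(const struct dev_model *dev)
{
	return dev->hard_header_len + dev->needed_headroom;
}

/* Total buffer size for a payload of payload_len bytes. */
size_t pppoe_alloc_size(const struct dev_model *dev, size_t payload_len)
{
	return pppoe_headroom(dev) + PPPOE_HDR_LEN + payload_len +
	       dev->needed_tailroom;
}
```

With hard_header_len == 0, as in the reported crash, the headroom comes
entirely from needed_headroom, which is why ignoring that field left no
room for the header pushed by ipgre_header().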
|
|
It takes 1sec for bond link down notification to hit user-space
when all slaves of the bond go down. 1sec is too long for
protocol daemons in user-space relying on bond notification
to recover (eg: multichassis lag implementations in user-space).
Since the link event code already marks team device port link events
as urgent, this patch moves the code to cover all lag ports and master.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The cxl driver currently declares in its table of supported PCI
devices the class "Processing accelerators". Therefore it may be
called to probe for opencapi devices, which generates errors, as the
config space of a cxl device is not compatible with opencapi.
So remove support for the generic class, as we now have (at least) two
drivers for devices of the same class. Most cxl devices are FPGAs with
a PSL which will show a known device ID of 0x477. Other devices are
really supported by the cxlflash driver and are already listed in the
table. So removing the class is expected to go unnoticed.
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
OCXL_BASE triggers the platform support needed by the driver.
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Define a few trace points so that we can use the standard tracing
mechanism for debug and/or monitoring.
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Some of the functions done by the generic driver should also be needed
by other opencapi drivers: attaching a context to an adapter,
translation fault handling, AFU interrupt allocation...
So to avoid code duplication, the driver provides a kernel API that
other drivers can use, similar to calling an in-kernel library.
It is still a bit theoretical, for lack of real hardware, and will
likely need adjustments down the road. But we used the cxlflash
driver as a guinea pig.
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Add user APIs through ioctl to allocate, free, and be notified of an
AFU interrupt.
For opencapi, an AFU can trigger an interrupt on the host by sending a
specific command targeting a 64-bit object handle. On POWER9, this is
implemented by mapping a special page in the address space of a
process and a write to that page will trigger an interrupt.
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Add an ocxl driver to handle generic opencapi devices. Of course, it's
not meant to be the only opencapi driver, any device is free to
implement its own. But if a host application only needs basic services
like attaching to an opencapi adapter, having translation faults handled,
or allocating AFU interrupts, it should suffice.
The AFU config space must follow the opencapi specification and use
the expected vendor/device ID to be seen by the generic driver.
The driver exposes the device AFUs as a char device in /dev/ocxl/.
Note that the driver currently doesn't handle memory attached to the
opencapi device.
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
In the opencapi protocol, host memory contexts are referenced by an
'actag'. During setup, a driver must tell the device how many actags
it can use, and what values are acceptable.
On POWER9, the NPU can handle 64 actags per link, so they must be
shared between all the PCI functions of the link. To get a global
picture of how many actags are used by each AFU of every function, we
capture some data at the end of PCI enumeration, so that actags can be
shared fairly if needed.
This is not powernv specific per se, but rather a consequence of the
opencapi configuration specification being quite general. The number
of available actags on POWER9 makes it more likely to be hit. This is
somewhat mitigated by the fact that existing AFUs are coded by
requesting a reasonable count of actags and existing devices carry
only one AFU.
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
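One possible way to share the 64 actags fairly is a proportional split.
This is a hypothetical policy written for illustration; the names
ACTAG_MAX and actag_share() are not taken from the patch.

```c
#include <stdint.h>

#define ACTAG_MAX 64	/* actags per link on POWER9, per the text above */

/* Hypothetical policy: a function gets what it asks for when everything
 * fits, otherwise a share proportional to its request. `total` is the
 * sum of the requests of all functions on the link. */
uint16_t actag_share(uint16_t desired, uint16_t total)
{
	if (total <= ACTAG_MAX)
		return desired;
	return (uint16_t)((uint32_t)ACTAG_MAX * desired / total);
}
```

Because the division rounds down, the shares never add up to more than
ACTAG_MAX.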
|
|
Implement a few platform-specific calls which can be used by drivers:
- provide the Transaction Layer capabilities of the host, so that the
driver can find some common ground and configure the device and host
appropriately.
- provide the hw interrupt to be used for translation faults raised by
the NPU
- map/unmap some NPU mmio registers to get the fault context when the
NPU raises an address translation fault
The rest are wrappers around the previously-introduced opal calls.
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
Add opal calls to interact with the NPU:
OPAL_NPU_SPA_SETUP: set the Shared Process Area (SPA)
The SPA is a table containing one entry (Process Element) per memory
context which can be accessed by the opencapi device.
OPAL_NPU_SPA_CLEAR_CACHE: clear the context cache
The NPU keeps a cache of recently accessed memory contexts. When a
Process Element is removed from the SPA, the cache for the link must
be cleared.
OPAL_NPU_TL_SET: configure the Transaction Layer
The Transaction Layer specification defines several templates for
messages to be exchanged on the link. During link setup, the host and
device must negotiate what templates are supported on both sides and
at what rates those messages can be sent.
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
The configuration space for opencapi devices doesn't have a PCI
Express capability, therefore confusing Linux into thinking it's of an
old PCI type with a 256-byte configuration space size, instead of the
desired 4k. So add a PCI fixup to declare the correct size.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
The NPU was already abstracted by opal as a virtual PHB for nvlink,
but it helps to be able to differentiate between a nvlink or opencapi
PHB, as it's not completely transparent to linux. In particular, PE
assignment differs and we'll also need the information in later
patches.
So rename existing PNV_PHB_NPU type to PNV_PHB_NPU_NVLINK and add a
new type PNV_PHB_NPU_OCAPI.
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
10GbE Intel Wired LAN Driver Updates 2018-01-23
This series contains updates to ixgbe only.
Shannon Nelson provides an implementation of the ipsec hardware offload
feature for the ixgbe driver for these devices: x540, x550, 82599.
The ixgbe NICs support ipsec offload for 1024 Rx and 1024 Tx Security
Associations (SAs), using up to 128 inbound IP addresses, and using the
rfc4106(gcm(aes)) encryption. This code does not yet support checksum
offload, or TSO in conjunction with the ipsec offload - those will be
added in the future.
This code shows improvements in both packet throughput and CPU utilization.
For example, here are some quick numbers that show the magnitude of the
performance gain on a single run of "iperf -c <dest>" with the ipsec
offload on both ends of a point-to-point connection:
9.4 Gbps - normal case
7.6 Gbps - ipsec with offload
343 Mbps - ipsec no offload
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Sebastian Reichel says:
====================
GEHC Bx50 Switch Support
This adds support for the internal switch found in GE Healthcare
B450v3, B650v3 and B850v3. All devices use a GPIO bitbanged MDIO
bus to communicate with the switch and a PCIe based network card
for exchanging network data. The cpu network data link requires
that the switch's internal phy interface is enabled, so support
for that is added by the first patch in this series.
The patch series is based on v4.15-rc8.
Changes since PATCHv4:
* Introduce dsa_port_link_(un)register_of and mark the fixed
variant static.
* Update patch description to describe the phy<->phy connection
from i210 to the Marvell switch
Changes since PATCHv3:
* Enable the phy in dsa_port_setup() instead of abusing the
fixed link setup function
Changes since PATCHv2:
* Add phy nodes to switch in bx50.dtsi and reference them
from switch ports
* Enable cpu-port's phy based on 'phy-handle' instead of 'phy-mode'
Changes since PATCHv1:
* Use 'marvell,mv88e6085' instead of introducing compatible
string for mv88e6240.
* Fix indentation of DT nodes
* Only enable 'cpu' phy, if explicitly set to "internal".
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This adds support for the Marvell switch and names the network
ports according to the labels that can be found next to the
connectors. The switch is connected to the host system using a
PCI based network card.
The PCI bus configuration has been written using the following
information:
root@b450v3# lspci -tv
-[0000:00]---00.0-[01]----00.0 Intel Corporation I210 Gigabit Network Connection
root@b450v3# lspci -nn
00:00.0 PCI bridge [0604]: Synopsys, Inc. Device [16c3:abcd] (rev 01)
01:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This adds support for the Marvell switch and names the network
ports according to the labels that can be found next to the
connectors. The switch is connected to the host system using a
PCI based network card.
The PCI bus configuration has been written using the following
information:
root@b650v3# lspci -tv
-[0000:00]---00.0-[01]----00.0 Intel Corporation I210 Gigabit Network Connection
root@b650v3# lspci -nn
00:00.0 PCI bridge [0604]: Synopsys, Inc. Device [16c3:abcd] (rev 01)
01:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|