Florian reported UDP xmit drops that could be root-caused to the
neigh limit being too small.
The current limit is 64 KB, meaning that even a single UDP socket would hit
it, since its default sk_sndbuf comes from net.core.wmem_default
(~212992 bytes on 64-bit arches).
Once ARP/ND resolution is in progress, we should allow a few more
packets to be queued, at least for one producer.
Once neigh arp_queue is filled, a rogue socket should hit its sk_sndbuf
limit and either block in sendmsg() or return -EAGAIN.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since the removal of NET_DMA, dmaengine.h header file shouldn't be needed
by netdevice.h anymore.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Tariq reported that local pings to a link-local address are failing:
$ ifconfig ens8
ens8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 11.141.16.6 netmask 255.255.0.0 broadcast 11.141.255.255
inet6 fe80::7efe:90ff:fecb:7502 prefixlen 64 scopeid 0x20<link>
ether 7c:fe:90:cb:75:02 txqueuelen 1000 (Ethernet)
RX packets 12 bytes 1164 (1.1 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 30 bytes 2484 (2.4 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
$ /bin/ping6 -c 3 fe80::7efe:90ff:fecb:7502%ens8
PING fe80::7efe:90ff:fecb:7502%ens8(fe80::7efe:90ff:fecb:7502) 56 data bytes
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
NSH (Network Service Header)[1] is a new protocol for service
function chaining. It can be handled as an L3 protocol like
IPv4 and IPv6; Eth + NSH + Inner packet and VxLAN-gpe + NSH +
Inner packet are two typical use cases.
This patch adds NSH header structures and helpers for NSH GSO
support and Open vSwitch NSH support.
[1] https://datatracker.ietf.org/doc/draft-ietf-sfc-nsh/
[Jiri: added nsh_hdr() helper and renamed the header struct to "struct
nshhdr" to match the usual pattern. Removed packet type defines, these are
now shared with VXLAN-GPE.]
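As a rough illustration of the header layout being added (a hedged sketch based
on the NSH draft, not necessarily the exact structures from this patch; field
names here are assumptions):
  /* Requires <linux/types.h> in a kernel build. */
  struct nsh_hdr_sketch {
          __be16  ver_flags_ttl_len;  /* version, flags, TTL, total length */
          __u8    mdtype;             /* metadata type (MD type 1 or 2) */
          __u8    np;                 /* next protocol (IPv4/IPv6/Ethernet/...) */
          __be32  path_hdr;           /* service path ID (24 bits) + service index (8 bits) */
          /* followed by fixed MD type 1 context or variable MD type 2 TLVs */
  };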
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The values are shared between VXLAN-GPE and NSH. Originally this was probably
by coincidence, but I notified both working groups about it last year and they
seem to have kept the values in sync since then.
Hopefully they'll get a single IANA registry for the values, too. (I asked
them for that.)
Factor out the code to be shared by the NSH implementation.
NSH and MPLS values are added in this patch, too. For MPLS, the drafts
incorrectly assign only a single value, while we have two MPLS ethertypes.
I raised the problem with both groups. For now, I assume the value is for
unicast.
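For reference, a hedged sketch of the shared next-protocol values as described
in the VXLAN-GPE and NSH drafts (the enum and constant names here are
illustrative, not necessarily what this patch defines):
  enum tun_next_proto_sketch {
          TUN_NP_IPV4     = 0x1,
          TUN_NP_IPV6     = 0x2,
          TUN_NP_ETHERNET = 0x3,
          TUN_NP_NSH      = 0x4,
          TUN_NP_MPLS_UC  = 0x5, /* drafts assign one MPLS value; assumed unicast, see above */
  };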
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The NSH draft says:
An IEEE EtherType, 0x894F, has been allocated for NSH.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds the ForCES IFE LFB ethertype according to the IEEE
registered ethertypes. See http://standards-oui.ieee.org/ethertype/eth.txt
for more information. Since the IFE subsystem exists, it can be used there.
This patch also uses the correct word "ForCES" instead of "FoRCES", which
is a spelling error inside the IEEE ethertype specification.
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for updating the FW with the new port MAC when a port MAC change
is requested by the user. This info is required by the FW, as OEM
management tools query it directly from the NIC FW.
Check the device capability bit to verify that the FW supports user MAC.
If the FW does support it, use the set_port command to notify the FW of the
new MAC.
The feature is relevant only to the PF port MAC.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to avoid temporary large structs on the stack,
allocate them dynamically.
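The pattern is the usual stack-to-heap move; a minimal, hypothetical sketch of
the kind of change (not the actual driver code; struct and function names are
made up for illustration):
  /* Requires <linux/slab.h> in a kernel build. */
  struct big_ctx { u8 buf[2048]; };              /* illustrative size */

  static int do_work(void)
  {
          struct big_ctx *ctx;
          int err = 0;

          ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); /* was: struct big_ctx ctx; on the stack */
          if (!ctx)
                  return -ENOMEM;

          /* ... use ctx ... */

          kfree(ctx);
          return err;
  }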
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tal Alon <talal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A few useful tracepoints to trace bridge forwarding
database updates.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
"Late fixes for libata. There's a minor platform driver fix but the
important one is READ LOG PAGE.
This is a new ATA command which is used to test some optional features
but it broke probing of some devices - they locked up instead of
failing the unknown command.
Christoph tried blacklisting, but, after finding out there are
multiple devices which fail this way, backed off to testing feature
bit in IDENTIFY data first, which is a bit lossy (we can miss features
on some devices) but should be a lot safer"
* 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
Revert "libata: quirk read log on no-name M.2 SSD"
libata: check for trusted computing in IDENTIFY DEVICE data
libata: quirk read log on no-name M.2 SSD
sata: ahci-da850: Fix some error handling paths in 'ahci_da850_probe()'
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
wireless-drivers-next patches for 4.14
The rsi driver is getting a lot of new features lately, but as usual
there is active development happening on iwlwifi as well as other drivers.
I pulled wireless-drivers to fix multiple conflicts in iwlwifi and to
make further development easier.
Major changes:
ath10k
* initial USB bus support (no full support yet)
* add tdls support for 10.4 firmware
ath9k
* add Dell Wireless 1802
wil6210
* support FW RSSI reporting
rsi
* support legacy power save, U-APSD, rf-kill and AP mode
* RTS threshold configuration
brcmfmac
* support CYW4373 SDIO/USB chipset
iwlwifi
* some more code moved to a new directory
* add new PCI ID for 7265D
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Creating a specific xdp_redirect_map variant of the xdp tracepoints
allows users to write simpler/faster BPF progs that get attached to
these tracepoints.
The goal is to still keep the tracepoints in xdp_redirect and xdp_redirect_map
similar enough that a tool can read the top part of the TP_STRUCT and
produce similar monitoring statistics.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is a need to separate the xdp_redirect tracepoint into two
tracepoints, for separating the error case from the normal forward
case.
Due to the extreme speeds XDP is operating at, loading a tracepoint
has a measurable impact. Single core XDP REDIRECT (ethtool tuned
rx-usecs 25) can do 13.7 Mpps forwarding, but loading a simple
bpf_prog at the tracepoint (with a return 0) reduces performance to 10.2 Mpps
(CPU E5-1650 v4 @ 3.60GHz, driver: ixgbe)
The overhead of loading a bpf-based tracepoint can be calculated to
cost 25 nanosec ((1/13782002-1/10267937)*10^9 = -24.83 ns).
Using perf record on the tracepoint event, with a non-matching --filter
expression, the overhead is much larger. Performance drops to 8.3 Mpps,
a cost of 48 nanosec ((1/13782002-1/8312497)*10^9 = -47.74 ns).
Having a separate tracepoint for err cases, which should be less
frequent, allows running a continuous monitor for errors while not
affecting the redirect forward performance (this has also been
verified by measurements).
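As a sanity check, the per-packet costs quoted above follow directly from the
Mpps numbers; a small standalone C calculation (rates hardcoded from the text):
  #include <stdio.h>

  int main(void)
  {
          double base = 13782002.0;       /* pps, no bpf_prog on the tracepoint */
          double with_prog = 10267937.0;  /* pps, simple bpf_prog attached */
          double with_filter = 8312497.0; /* pps, perf --filter on the tracepoint */

          printf("bpf_prog overhead:    %.2f ns/pkt\n",
                 (1.0 / with_prog - 1.0 / base) * 1e9);   /* ~24.8 ns */
          printf("perf filter overhead: %.2f ns/pkt\n",
                 (1.0 / with_filter - 1.0 / base) * 1e9); /* ~47.7 ns */
          return 0;
  }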
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Given that the previous patch exposes the map_id, it seems natural to also
report the bpf prog id.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To make sense of the map index, the tracepoint user also needs to know
which map we are talking about. Supply the map pointer but only expose
the map->id.
The 'to_index' is renamed 'to_ifindex'. In the xdp_redirect_map case,
this is the result of the devmap lookup. The map lookup key is exposed
as map_index, which is needed to troubleshoot in case the lookup failed.
The 'to_ifindex' is placed after 'err' to keep TP_STRUCT as common as
possible.
This also keeps the TP_STRUCT similar enough that userspace can write
a monitor program that doesn't need to care about whether
bpf_redirect or bpf_redirect_map was used.
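Putting the pieces from this and the surrounding patches together, the record a
monitor program could expect looks roughly like this (a hedged sketch of the
TP_STRUCT contents, not the literal definition; the leading fields are the
shared part):
  struct xdp_redirect_record_sketch {
          int act;        /* always XDP_REDIRECT */
          int prog_id;    /* bpf prog id (added by a follow-up patch) */
          int ifindex;    /* receiving device */
          int err;        /* 0 on success */
          int to_ifindex; /* destination ifindex (devmap lookup result for _map) */
          u32 map_id;     /* id of the devmap (only meaningful for bpf_redirect_map) */
          int map_index;  /* devmap lookup key */
  };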
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Supplying the action argument XDP_REDIRECT to the tracepoint xdp_redirect
is redundant as it is only called in case this action was specified.
Remove the argument, but keep the "act" member of the tracepoint struct and
populate it with XDP_REDIRECT. This makes it easier to write a common bpf_prog
for processing events.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit 35f0b6a779b8b7a98faefd7c1c660b4dac9a5c26.
We now conditionalize issuing of READ LOG PAGE on the TRUSTED
COMPUTING SUPPORTED bit in the identity data and this shouldn't be
necessary.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
ATA-8 and later mirrors the TRUSTED COMPUTING SUPPORTED bit in word 48 of
the IDENTIFY DEVICE data. Check this before issuing a READ LOG PAGE
command to avoid issues with buggy devices. The only downside is that
we can't support Security Send / Receive for a device with an older
revision due to the conflicting use of this field in earlier
specifications.
tj: The reason we need this is because some devices which don't
support READ LOG PAGE lock up after getting issued that command.
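A hedged sketch of the kind of check described (word numbering per the commit
text; the helper name and the exact bit chosen are assumptions, not the patch's
actual macro):
  /* IDENTIFY DEVICE data is an array of 256 little-endian 16-bit words.
   * Per the text above, word 48 mirrors the TRUSTED COMPUTING SUPPORTED
   * bit on ATA-8 and later; bit 0 is assumed to be that bit here. */
  static inline bool id_has_trusted_computing(const u16 *id)
  {
          return id[48] & (1 << 0);
  }

  /* READ LOG PAGE for Security Send/Receive discovery would then only be
   * issued when id_has_trusted_computing(dev->id) is true. */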
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
In theory, COMPLETION_INITIALIZER_ONSTACK() should never affect the
stack allocation of the caller. However, on some compilers, a temporary
structure was allocated for the return value of
COMPLETION_INITIALIZER_ONSTACK().
For example in write_journal() with LOCKDEP_COMPLETIONS=y (GCC is 7.1.1):
io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
2462: e8 00 00 00 00 callq 2467 <write_journal+0x47>
2467: 48 8d 85 80 fd ff ff lea -0x280(%rbp),%rax
246e: 48 c7 c6 00 00 00 00 mov $0x0,%rsi
2475: 48 c7 c2 00 00 00 00 mov $0x0,%rdx
x->done = 0;
247c: c7 85 90 fd ff ff 00 movl $0x0,-0x270(%rbp)
2483: 00 00 00
init_waitqueue_head(&x->wait);
2486: 48 8d 78 18 lea 0x18(%rax),%rdi
248a: e8 00 00 00 00 callq 248f <write_journal+0x6f>
if (commit_start + commit_sections <= ic->journal_sections) {
248f: 41 8b 87 a8 00 00 00 mov 0xa8(%r15),%eax
io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
2496: 48 8d bd e8 f9 ff ff lea -0x618(%rbp),%rdi
249d: 48 8d b5 90 fd ff ff lea -0x270(%rbp),%rsi
24a4: b9 17 00 00 00 mov $0x17,%ecx
24a9: f3 48 a5 rep movsq %ds:(%rsi),%es:(%rdi)
if (commit_start + commit_sections <= ic->journal_sections) {
24ac: 41 39 c6 cmp %eax,%r14d
io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
24af: 48 8d bd 90 fd ff ff lea -0x270(%rbp),%rdi
24b6: 48 8d b5 e8 f9 ff ff lea -0x618(%rbp),%rsi
24bd: b9 17 00 00 00 mov $0x17,%ecx
24c2: f3 48 a5 rep movsq %ds:(%rsi),%es:(%rdi)
We can clearly see the temporary structure being allocated, and the compiler
also does two meaningless memcpys with "rep movsq".
According to:
https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html#Statement-Exprs
the return value of a statement expression is returned by value, so the
temporary variable is created in COMPLETION_INITIALIZER_ONSTACK(), and
that's why the temporary structures are allocated.
To fix this, make the brace block in COMPLETION_INITIALIZER_ONSTACK()
return a pointer and dereference it outside the block rather than return
the whole structure. In this way, we are able to teach the compiler not
to do the unnecessary stack allocation.
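Expressed as code, the described change amounts to something like the following
(a simplified before/after sketch of the macro shape, using a GCC statement
expression that yields a pointer; not the literal patch):
  /* Before (simplified): the statement expression yields the whole
   * struct by value, which some compilers materialise as a temporary. */
  #define COMPLETION_INITIALIZER_ONSTACK_OLD(work) \
          ({ init_completion(&work); work; })

  /* After (simplified): yield a pointer from the brace block and
   * dereference it outside, so no temporary struct is needed. */
  #define COMPLETION_INITIALIZER_ONSTACK_NEW(work) \
          (*({ init_completion(&work); &work; }))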
This could also reduce the stack size even if !LOCKDEP, for example in
write_journal(), compiled with gcc 7.1.1, the result of command:
objdump -d drivers/md/dm-integrity.o | ./scripts/checkstack.pl x86
before:
0x0000246a write_journal [dm-integrity.o]: 696
after:
0x00002b7a write_journal [dm-integrity.o]: 296
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: walken@google.com
Cc: willy@infradead.org
Link: http://lkml.kernel.org/r/20170823152542.5150-3-boqun.feng@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
struct call_single_data is used in IPIs to transfer information between
CPUs. Its size is bigger than sizeof(unsigned long) and less than
cache line size. Currently it is not allocated with any explicit alignment
requirements. This makes it possible for allocated call_single_data to
cross two cache lines, which doubles the number of cache lines
that need to be transferred among CPUs.
This can be fixed by requiring call_single_data to be aligned with the
size of call_single_data. Currently the size of call_single_data is a
power of 2. If we add new fields to call_single_data, we may need to
add padding to make sure the size of the new definition is a power of 2
as well.
Fortunately, this is enforced by GCC, which will report bad sizes.
To set the alignment requirement of call_single_data to the size of
call_single_data, a struct definition and a typedef are used.
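In code, the struct-plus-typedef trick looks roughly like this (a sketch; the
field list is abridged and the names are assumptions rather than the exact
patch):
  /* The typedef attaches an alignment equal to the struct's own size, so
   * an instance can never straddle two cache lines (as long as the size
   * stays a power of 2). */
  struct __call_single_data {
          struct llist_node llist;
          smp_call_func_t   func;
          void              *info;
          unsigned int      flags;
  };

  typedef struct __call_single_data call_single_data_t
          __aligned(sizeof(struct __call_single_data));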
To test the effect of the patch, I used the vm-scalability multiple
thread swap test case (swap-w-seq-mt). The test will create multiple
threads and each thread will eat memory until all RAM and part of swap
is used, so that a huge number of IPIs is triggered when unmapping
memory. In the test, the throughput of memory writing improves ~5%
compared with misaligned call_single_data, because of faster IPIs.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Huang, Ying <ying.huang@intel.com>
[ Add call_single_data_t and align with size of call_single_data. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/87bmnqd6lz.fsf@yhuang-mobile.sh.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Where XHLOCK_{SOFT,HARD} are save/restore points in the xhlocks[] to
ensure the temporal IRQ events don't interact with task state,
XHLOCK_PROC is a fundamentally different beast that just happens to share
the interface.
The purpose of XHLOCK_PROC is to annotate independent execution inside
one task. For example with workqueues, each work item should appear to run
in its own 'pristine' 'task'.
Remove XHLOCK_PROC in favour of its own interface to avoid confusion.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boqun.feng@gmail.com
Cc: david@fromorbit.com
Cc: johannes@sipsolutions.net
Cc: kernel-team@lge.com
Cc: oleg@redhat.com
Cc: tj@kernel.org
Link: http://lkml.kernel.org/r/20170829085939.ggmb6xiohw67micb@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
For understanding how the workload maps to memory channels and hardware
behavior, it's very important to collect address maps with physical
addresses. For example, 3D XPoint access can only be found by filtering
the physical address.
Add a new sample type for physical address.
perf already has a facility to collect the data virtual address. This patch
introduces a function to convert the virtual address to a physical address.
The function is quite generic and can be extended to any architecture as
long as a virtual address is provided.
- For kernel direct mapping addresses, virt_to_phys is used to convert
the virtual addresses to physical address.
- For user virtual addresses, __get_user_pages_fast is used to walk the
page tables and obtain the user physical address.
- This does not work for vmalloc addresses right now. These are not
resolved, but code to do that could be added.
The new sample type requires collecting the virtual address. The
virtual address will not be output unless SAMPLE_ADDR is applied.
For security, the physical address can only be exposed to root or
privileged user.
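A hedged sketch of the conversion described above (simplified; the function
name and details are assumptions, not the exact patch):
  /* Kernel direct-mapping addresses go through virt_to_phys(); user
   * addresses are resolved by walking the page tables with
   * __get_user_pages_fast(); vmalloc addresses stay unresolved. */
  static u64 sample_virt_to_phys(u64 virt)
  {
          struct page *p = NULL;
          u64 phys = 0;

          if (!virt)
                  return 0;

          if (virt >= TASK_SIZE) {
                  if (virt_addr_valid((void *)(uintptr_t)virt))
                          phys = (u64)virt_to_phys((void *)(uintptr_t)virt);
          } else if (current->mm) {
                  if (__get_user_pages_fast(virt, 1, 0, &p) == 1)
                          phys = page_to_phys(p) + virt % PAGE_SIZE;
          }

          if (p)
                  put_page(p);
          return phys;
  }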
Tested-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Cc: mpe@ellerman.id.au
Link: http://lkml.kernel.org/r/1503967969-48278-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
I just noticed that hw.itrace_started and hw.config are aliased to the
same location. Now, the PT driver happens to use both, which works out
fine by sheer luck:
- STORE(hw.itrace_start) is ordered before STORE(hw.config), in the
program order, although there are no compiler barriers to ensure that,
- to the perf_log_itrace_start() hw.itrace_start looks set at the same
time as when it is intended to be set because both stores happen in the
same path,
- hw.config is never reset to zero in the PT driver.
Now, the use of hw.config by the PT driver makes more sense (it being a
HW PMU) than messing around with itrace_started, which is an awkward API
to begin with.
This patch replaces hw.itrace_started with an attach_state bit and an
API call for the PMU drivers to use to communicate the condition.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20170330153956.25994-1-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Add support for the new XRQ (eXtended shared Receive Queue)
hardware object. It supports SRQ semantics with the addition
of extended receive buffer topologies and offloads.
Currently it supports the tag matching topology and the rendezvous offload.
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Make XRQ capabilities available via ibv_query_device() verb.
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Add tm_list_size parameter to struct ib_uverbs_create_xsrq.
If the SRQ type is tag matching, this field defines the maximum size
of the tag matching list. Otherwise, it is expected to be zero.
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
This patch adds a new SRQ type - IB_SRQT_TM. The new SRQ type supports tag
matching and rendezvous offloads for MPI applications.
When the SRQ receives a message, it searches through the matching list
for the corresponding posted receive buffer. The process of searching
the matching list is called tag matching.
In case the tag matching results in a match, the received message will
be placed in the address specified by the receive buffer. In case no
match is found, the message will be placed in a generic buffer until the
corresponding receive buffer is posted. Such messages are called
unexpected, and their set is called the unexpected list.
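Conceptually, the receive path described above behaves like this (pure
pseudocode in C style; all names are hypothetical, not driver code):
  /* On message arrival: try to match the tag against the posted receive
   * buffers; otherwise park the message on the unexpected list. */
  void on_message(struct msg *m)
  {
          struct recv_buf *rb = tag_list_find(&srq->matching_list, m->tag);

          if (rb) {
                  copy_to_buffer(rb, m);               /* match: deliver in place */
                  tag_list_remove(&srq->matching_list, rb);
          } else {
                  unexpected_list_add(&srq->unexpected, m); /* wait for a matching post */
          }
  }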
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Before this change, the CQ attached to an SRQ was part of the XRC-specific
extension. Moving the CQ handle out makes it available to other SRQ types
extending SRQ functionality.
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
This patch adds the following TM XRQ capabilities (see the struct sketch after the list):
* max_rndv_hdr_size - Max size of rendezvous request message
* max_num_tags - Max number of entries in tag matching list
* max_ops - Max number of outstanding list operations
* max_sge - Max number of SGE in tag matching entry
* flags - the following flags are currently defined:
- IB_TM_CAP_RC - Support tag matching on RC transport
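Collected into a structure, the reported capabilities would look roughly like
this (a sketch mirroring the list above; the struct and field names are
assumptions):
  struct tm_caps_sketch {
          u32 max_rndv_hdr_size;  /* max size of rendezvous request message */
          u32 max_num_tags;       /* max entries in the tag matching list */
          u32 flags;              /* e.g. IB_TM_CAP_RC: tag matching on RC transport */
          u32 max_ops;            /* max outstanding list operations */
          u32 max_sge;            /* max SGEs in a tag matching entry */
  };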
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
* add offload_type field to mlx5_ifc_qpc_bits
* update mlx5_ifc_xrqc_bits layout
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
When running perf on the ftrace:function tracepoint, there is a bug
which can be reproduced by:
perf record -e ftrace:function -a sleep 20 &
perf record -e ftrace:function ls
perf script
ls 10304 [005] 171.853235: ftrace:function: perf_output_begin
ls 10304 [005] 171.853237: ftrace:function: perf_output_begin
ls 10304 [005] 171.853239: ftrace:function: task_tgid_nr_ns
ls 10304 [005] 171.853240: ftrace:function: task_tgid_nr_ns
ls 10304 [005] 171.853242: ftrace:function: __task_pid_nr_ns
ls 10304 [005] 171.853244: ftrace:function: __task_pid_nr_ns
We can see that all the function traces are doubled.
The problem is caused by the inconsistency between the register
function perf_ftrace_event_register() and the probe function
perf_ftrace_function_call(). The former registers one probe
for every perf_event, while the latter handles all perf_events
on the current CPU. So when there are two perf_events on the current CPU,
their traces are doubled.
So this patch adds an extra "event" parameter to perf_tp_event() and
only sends sample data to this event when it is not NULL.
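In code form, the described change boils down to something like the following
inside perf_tp_event() (a hedged sketch, not the literal diff):
  /* If a specific event was passed in, deliver the sample only to it;
   * otherwise fall back to iterating all events hashed on this CPU. */
  if (event) {
          if (perf_tp_event_match(event, &data, regs))
                  perf_swevent_event(event, count, &data, regs);
  } else {
          hlist_for_each_entry_rcu(event, head, hlist_entry) {
                  if (perf_tp_event_match(event, &data, regs))
                          perf_swevent_event(event, count, &data, regs);
          }
  }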
Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: alexander.shishkin@linux.intel.com
Cc: huawei.libin@huawei.com
Link: http://lkml.kernel.org/r/1503668977-12526-1-git-send-email-zhouchengming1@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Allow a client call that failed on network error to be retried, provided
that the Tx queue still holds DATA packet 1. This allows an operation to
be submitted to another server or another address for the same server
without having to repackage and re-encrypt the data so far processed.
Two new functions are provided:
(1) rxrpc_kernel_check_call() - This is used to find out the completion
state of a call to guess whether it can be retried and whether it
should be retried.
(2) rxrpc_kernel_retry_call() - Disconnect the call from its current
connection, reset the state and submit it as a new client call to a
new address. The new address need not match the previous address.
A call may be retried even if all the data hasn't been loaded into it yet;
a partially constructed call will be retained at the same point it was at when
an error condition was detected. msg_data_left() can be used to find out
how much data was packaged before the error occurred.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Add a callback to rxrpc_kernel_send_data() so that a kernel service can get
a notification that the AF_RXRPC call has transitioned out of the Tx phase and
is now waiting for a reply or a final ACK.
This is called from AF_RXRPC with the call state lock held so the
notification is guaranteed to come before any reply is passed back.
Further, modify the AFS filesystem to make use of this so that we don't have
to change the afs_call state before sending the last bit of data.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
The 'expiry' field of 'struct key_preparsed_payload' has been changed
to the 'time64_t' type, which is year 2038 safe on 32-bit systems.
In the net/rxrpc subsystem, we need to convert the 'u32' ticket expiry
time to 'time64_t' when copying it to 'prep->expiry', so this patch
introduces two helper functions to help convert 'u32' to 'time64_t'.
This patch also uses ktime_get_real_seconds() to get the current time
instead of get_seconds(), which is not year 2038 safe on 32-bit systems.
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
git://people.freedesktop.org/~syeh/repos_linux into drm-next
vmwgfx add fence fd support.
* 'drm-vmwgfx-next' of git://people.freedesktop.org/~syeh/repos_linux:
drm/vmwgfx: Bump the version for fence FD support
drm/vmwgfx: Add export fence to file descriptor support
drm/vmwgfx: Add support for imported Fence File Descriptor
drm/vmwgfx: Prepare to support fence fd
drm/vmwgfx: Fix incorrect command header offset at restart
drm/vmwgfx: Support the NOP_ERROR command
drm/vmwgfx: Restart command buffers after errors
drm/vmwgfx: Move irq bottom half processing to threads
drm/vmwgfx: Don't use drm_irq_[un]install
|
|
git://anongit.freedesktop.org/git/drm-misc into drm-next
UAPI Changes:
- Rename u32 to __u32 in struct drm_format_modifier_blob (Lionel)
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
* tag 'drm-misc-next-fixes-2017-08-28' of git://anongit.freedesktop.org/git/drm-misc:
drm: rename u32 in __u32 in uapi
|
|
This IOCTL provides a mechanism for userspace to trigger a sync object
directly. There are other ways that userspace can trigger a syncobj
such as submitting a dummy batch somewhere or hanging on to a triggered
sync_file and doing an import. This just provides an easy way to
manually trigger the sync object without weird hacks.
The motivation for this IOCTL is Vulkan fences. Vulkan lets you create
a fence already in the signaled state so that you can wait on it
immediately without stalling. We could also handle this with a new
create flag to ask the driver to create a syncobj that is already
signaled but the IOCTL seemed a bit cleaner and more generic.
v2:
- Take an array of sync objects (Dave Airlie)
v3:
- Throw -EINVAL if pad != 0
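From userspace, triggering syncobjs with the new IOCTL would look roughly like
this (a hedged sketch; fd, handles and n are placeholders, and the
array-argument layout follows the v2/v3 notes above):
  /* Assumes the uapi <drm/drm.h> definitions and libdrm's drmIoctl(). */
  struct drm_syncobj_array args = {
          .handles = (uintptr_t)handles,  /* array of previously created syncobj handles */
          .count_handles = n,
          .pad = 0,                       /* must be zero per the v3 note */
  };

  if (drmIoctl(fd, DRM_IOCTL_SYNCOBJ_SIGNAL, &args))
          perror("DRM_IOCTL_SYNCOBJ_SIGNAL");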
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
This just resets the dma_fence to NULL so it looks like it's never been
signaled. This will be useful once we add the new wait API for allowing
wait on "submit and signal" behavior.
v2:
- Take an array of sync objects (Dave Airlie)
v3:
- Throw -EINVAL if pad != 0
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Christian König <christian.koenig@amd.com> (v1)
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Make use of the ndo_vlan_rx_{add,kill}_vid callbacks to have the NCSI
stack process new VLAN tags and configure the channel VLAN filter
appropriately.
Several VLAN tags can be set and a "Set VLAN Filter" packet must be sent
for each one, meaning the ncsi_dev_state_config_svf state must be
repeated. An internal list of VLAN tags is maintained, and compared
against the current channel's ncsi_channel_filter in order to keep track
within the state. VLAN filters are removed in a similar manner, with the
introduction of the ncsi_dev_state_config_clear_vids state. The maximum
number of VLAN tag filters is determined by the "Get Capabilities"
response from the channel.
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2017-08-27
This series contains updates to i40e and i40evf only.
Sudheer updates code comments and a state variable so that adminq_subtask
will have accurate information whenever it gets scheduled.
Mariusz stores information about FEC modes, to be used when printing link
state information, so that we do not need to call the admin queue when
reporting link status. He also adds VF support for controlling VLAN tag
stripping via ethtool.
Jake provides the majority of changes in this series, starting with
increasing the size of the prefix buffer so that it can hold enough
characters for every possible input, which prevents snprintf truncation.
Fixed other string truncation errors/warnings produced by GCC 7.x.
Removed an unnecessary workaround for resetting XPS. Fixed an issue
where there is a mismatched affinity mask value, so initialize the value
to cpu_possible_mask and invert the logic for checking incorrect CPU vs
IRQ affinity so that the exceptional case is handled at the check.
Removed ULTRA latency mode due to several issues found and will be
looking at better solution for small packet workloads.
Akeem fixes an issue where the incorrect flag was being used to set
promiscuous mode for unicast, which was enabling promiscuous mode only
for multicast instead of unicast.
Carolyn fixes an issue where an error return value is set, but this
value can be overwritten before we actually exit the function. So
remove the error code assignment and add code comments for better
understanding on why we do not need to set and return the error.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
And finally, move the irda include files into
drivers/staging/irda/include/net/irda. Yes, it's a long path, but it
makes it easy for us to just add a Makefile directory path addition and
all of the net and drivers code "just works".
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ACPI OEM ID / OEM Table ID / Revision can be used to identify
a platform based on ACPI firmware info. acpi_blacklisted(),
intel_pstate_platform_pwr_mgmt_exists(), and some other functions
have been using similar checks to detect a list of platforms
that require special handling.
Move the platform check in acpi_blacklisted() to a new common
utility function, acpi_match_platform_list(), so that other
drivers do not have to implement their own version.
There is no change in functionality.
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
When !NUMA, cpumask_of_node(@node) equals cpu_online_mask regardless of
@node. The assumption seems to be that if !NUMA, there shouldn't be more than
one node and thus reporting cpu_online_mask regardless of @node is
correct. However, that assumption was broken years ago to support
DISCONTIGMEM and whether a system has multiple nodes or not is
separately controlled by NEED_MULTIPLE_NODES.
This means that, on a system with !NUMA && NEED_MULTIPLE_NODES,
cpumask_of_node() will report cpu_online_mask for all possible nodes,
indicating that the CPUs are associated with multiple nodes which is an
impossible configuration.
This bug has been around forever but doesn't look like it has caused any
noticeable symptoms. However, it triggers a WARN recently added to
workqueue to verify NUMA affinity configuration.
Fix it by reporting empty cpumask on non-zero nodes if !NUMA.
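The fix described amounts to something like the following in the generic
topology header (a sketch; the exact macro text may differ):
  /* !NUMA: only node 0 has CPUs; any other node reports an empty mask
   * instead of cpu_online_mask. */
  #ifndef cpumask_of_node
  #define cpumask_of_node(node)  ((node) == 0 ? cpu_online_mask : cpu_none_mask)
  #endif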
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
A destroy of an MR prior to destroying the QP can cause the following
diagnostic if the QP is referencing the MR being de-registered:
hfi1 0000:05:00.0: hfi1_0: rvt_dereg_mr timeout mr ffff880856210800 pd ffff880859b20b00
The solution is, when a non-zero refcount is encountered as the MR is
destroyed, to iterate over the QPs looking for QPs in the same PD as
the MR. If rvt_qp_mr_clean() detects any such QP referencing the
rkey/lkey, the QP needs to be put into an error state via a call to
rvt_qp_error(), which will trigger the clean up of any stuck
references.
This solution is as specified in IBTA 1.3 Volume 1 11.2.10.5.
[This is reproduced with the 0.4.9 version of qperf and the rc_bw test]
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
There are currently 3 spots in the qib and hfi1 driver that have
knowledge of the internal QP hash list that should only be in
scope to rdmavt QP code.
Add an iterator API for processing all QPs to hide the
nature of the RCU hashlist.
The API consists of:
- rvt_qp_iter_init()
* For iterating QPs one at a time for seq_file semantics
- rvt_qp_iter_next()
* For iterating QPs one at a time for seq_file semantics
- rvt_qp_iter()
* For iterating all QPs
The first two are used for things like seq_file prints.
The last is for code that just needs to iterate all QPs
in the system.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Commit c5cff8561d2d adds rcu grace period before freeing fib6_node. This
generates a new sparse warning on rt->rt6i_node related code:
net/ipv6/route.c:1394:30: error: incompatible types in comparison
expression (different address spaces)
./include/net/ip6_fib.h:187:14: error: incompatible types in comparison
expression (different address spaces)
This commit adds the "__rcu" tag to rt6i_node and makes sure the
corresponding RCU API is used for it.
After this fix, sparse no longer generates the above warning.
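The change described is essentially an annotation plus accessor change,
roughly (a sketch, not the full patch):
  /* Field annotation in the route structure: */
  struct fib6_node __rcu  *rt6i_node;

  /* ...and readers dereference it through the RCU API, e.g.: */
  struct fib6_node *fn = rcu_dereference(rt->rt6i_node);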
Fixes: c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node")
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/mzx/devfreq
Pull devfreq changes for v4.14 from MyungJoo Ham.
* 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mzx/devfreq:
PM / devfreq: Fix memory leak when fail to register device
PM / devfreq: Add dependency on PM_OPP
PM / devfreq: Move private devfreq_update_stats() into devfreq
PM / devfreq: Convert to using %pOF instead of full_name
|
|
Similar to gre, vxlan, geneve, and ipip tunnels, allow ERSPAN tunnels to
operate in 'collect metadata' mode. bpf_skb_[gs]et_tunnel_key() helpers
can make use of it right away. OVS can use it as well in the future.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|