Age | Commit message (Collapse) | Author |
|
It was reported that a considerable amount of cycles were spent on the
expensive indirect calls on fib6_rule_lookup. This patch introduces an
inline helper called pol_route_func that uses the indirect_call_wrappers
to avoid the indirect calls.
This patch saves around 50ns per call.
Performance was measured on the receiver by checking the amount of
syncookies that server was able to generate under a synflood load.
Traffic was generated using trafgen[1] which was pushing around 1Mpps on
a single queue. Receiver was using only one rx queue which help to
create a bottle neck and make the experiment rx-bounded.
These are the syncookies generated over 10s from the different runs:
Whithout the patch:
TcpExtSyncookiesSent 3553749 0.0
TcpExtSyncookiesSent 3550895 0.0
TcpExtSyncookiesSent 3553845 0.0
TcpExtSyncookiesSent 3541050 0.0
TcpExtSyncookiesSent 3539921 0.0
TcpExtSyncookiesSent 3557659 0.0
TcpExtSyncookiesSent 3526812 0.0
TcpExtSyncookiesSent 3536121 0.0
TcpExtSyncookiesSent 3529963 0.0
TcpExtSyncookiesSent 3536319 0.0
With the patch:
TcpExtSyncookiesSent 3611786 0.0
TcpExtSyncookiesSent 3596682 0.0
TcpExtSyncookiesSent 3606878 0.0
TcpExtSyncookiesSent 3599564 0.0
TcpExtSyncookiesSent 3601304 0.0
TcpExtSyncookiesSent 3609249 0.0
TcpExtSyncookiesSent 3617437 0.0
TcpExtSyncookiesSent 3608765 0.0
TcpExtSyncookiesSent 3620205 0.0
TcpExtSyncookiesSent 3601895 0.0
Without the patch the average is 354263 pkt/s or 2822 ns/pkt and with
the patch the average is 360738 pkt/s or 2772 ns/pkt which gives an
estimate of 50 ns per packet.
[1] http://netsniff-ng.org/
Changelog since v1:
- Change ordering in the ICW (Paolo Abeni)
Cc: Luigi Rizzo <lrizzo@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are many places where 2 annotations are not enough. This patch
adds INDIRECT_CALL_3 and INDIRECT_CALL_4 to cover such cases.
Signed-off-by: Brian Vazquez <brianvv@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, if the clang-bpf-co-re feature is not available, the build
fails with e.g.
CC prog.o
prog.c:1462:10: fatal error: profiler.skel.h: No such file or directory
1462 | #include "profiler.skel.h"
| ^~~~~~~~~~~~~~~~~
This is due to the fact that the BPFTOOL_WITHOUT_SKELETONS macro is not
defined, despite BUILD_BPF_SKELS not being set. Fix this by correctly
evaluating $(BUILD_BPF_SKELS) when deciding on whether to add
-DBPFTOOL_WITHOUT_SKELETONS to CFLAGS.
Fixes: 05aca6da3b5a ("tools/bpftool: Generalize BPF skeleton support and generate vmlinux.h")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200623103710.10370-1-tklauser@distanz.ch
|
|
Extend original variable-length tests with a case to catch a common
existing pattern of testing for < 0 for errors. Note because
verifier also tracks upper bounds and we know it can not be greater
than MAX_LEN here we can skip upper bound check.
In ALU64 enabled compilation converting from long->int return types
in probe helpers results in extra instruction pattern, <<= 32, s >>= 32.
The trade-off is the non-ALU64 case works. If you really care about
every extra insn (XDP case?) then you probably should be using original
int type.
In addition adding a sext insn to bpf might help the verifier in the
general case to avoid these types of tricks.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200623032224.4020118-3-andriin@fb.com
|
|
Add selftest that validates variable-length data reading and concatentation
with one big shared data array. This is a common pattern in production use for
monitoring and tracing applications, that potentially can read a lot of data,
but overall read much less. Such pattern allows to determine precisely what
amount of data needs to be sent over perfbuf/ringbuf and maximize efficiency.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200623032224.4020118-2-andriin@fb.com
|
|
Switch most of BPF helper definitions from returning int to long. These
definitions are coming from comments in BPF UAPI header and are used to
generate bpf_helper_defs.h (under libbpf) to be later included and used from
BPF programs.
In actual in-kernel implementation, all the helpers are defined as returning
u64, but due to some historical reasons, most of them are actually defined as
returning int in UAPI (usually, to return 0 on success, and negative value on
error).
This actually causes Clang to quite often generate sub-optimal code, because
compiler believes that return value is 32-bit, and in a lot of cases has to be
up-converted (usually with a pair of 32-bit bit shifts) to 64-bit values,
before they can be used further in BPF code.
Besides just "polluting" the code, these 32-bit shifts quite often cause
problems for cases in which return value matters. This is especially the case
for the family of bpf_probe_read_str() functions. There are few other similar
helpers (e.g., bpf_read_branch_records()), in which return value is used by
BPF program logic to record variable-length data and process it. For such
cases, BPF program logic carefully manages offsets within some array or map to
read variable-length data. For such uses, it's crucial for BPF verifier to
track possible range of register values to prove that all the accesses happen
within given memory bounds. Those extraneous zero-extending bit shifts,
inserted by Clang (and quite often interleaved with other code, which makes
the issues even more challenging and sometimes requires employing extra
per-variable compiler barriers), throws off verifier logic and makes it mark
registers as having unknown variable offset. We'll study this pattern a bit
later below.
Another common pattern is to check return of BPF helper for non-zero state to
detect error conditions and attempt alternative actions in such case. Even in
this simple and straightforward case, this 32-bit vs BPF's native 64-bit mode
quite often leads to sub-optimal and unnecessary extra code. We'll look at
this pattern as well.
Clang's BPF target supports two modes of code generation: ALU32, in which it
is capable of using lower 32-bit parts of registers, and no-ALU32, in which
only full 64-bit registers are being used. ALU32 mode somewhat mitigates the
above described problems, but not in all cases.
This patch switches all the cases in which BPF helpers return 0 or negative
error from returning int to returning long. It is shown below that such change
in definition leads to equivalent or better code. No-ALU32 mode benefits more,
but ALU32 mode doesn't degrade or still gets improved code generation.
Another class of cases switched from int to long are bpf_probe_read_str()-like
helpers, which encode successful case as non-negative values, while still
returning negative value for errors.
In all of such cases, correctness is preserved due to two's complement
encoding of negative values and the fact that all helpers return values with
32-bit absolute value. Two's complement ensures that for negative values
higher 32 bits are all ones and when truncated, leave valid negative 32-bit
value with the same value. Non-negative values have upper 32 bits set to zero
and similarly preserve value when high 32 bits are truncated. This means that
just casting to int/u32 is correct and efficient (and in ALU32 mode doesn't
require any extra shifts).
To minimize the chances of regressions, two code patterns were investigated,
as mentioned above. For both patterns, BPF assembly was analyzed in
ALU32/NO-ALU32 compiler modes, both with current 32-bit int return type and
new 64-bit long return type.
Case 1. Variable-length data reading and concatenation. This is quite
ubiquitous pattern in tracing/monitoring applications, reading data like
process's environment variables, file path, etc. In such case, many pieces of
string-like variable-length data are read into a single big buffer, and at the
end of the process, only a part of array containing actual data is sent to
user-space for further processing. This case is tested in test_varlen.c
selftest (in the next patch). Code flow is roughly as follows:
void *payload = &sample->payload;
u64 len;
len = bpf_probe_read_kernel_str(payload, MAX_SZ1, &source_data1);
if (len <= MAX_SZ1) {
payload += len;
sample->len1 = len;
}
len = bpf_probe_read_kernel_str(payload, MAX_SZ2, &source_data2);
if (len <= MAX_SZ2) {
payload += len;
sample->len2 = len;
}
/* and so on */
sample->total_len = payload - &sample->payload;
/* send over, e.g., perf buffer */
There could be two variations with slightly different code generated: when len
is 64-bit integer and when it is 32-bit integer. Both variations were analysed.
BPF assembly instructions between two successive invocations of
bpf_probe_read_kernel_str() were used to check code regressions. Results are
below, followed by short analysis. Left side is using helpers with int return
type, the right one is after the switch to long.
ALU32 + INT ALU32 + LONG
=========== ============
64-BIT (13 insns): 64-BIT (10 insns):
------------------------------------ ------------------------------------
17: call 115 17: call 115
18: if w0 > 256 goto +9 <LBB0_4> 18: if r0 > 256 goto +6 <LBB0_4>
19: w1 = w0 19: r1 = 0 ll
20: r1 <<= 32 21: *(u64 *)(r1 + 0) = r0
21: r1 s>>= 32 22: r6 = 0 ll
22: r2 = 0 ll 24: r6 += r0
24: *(u64 *)(r2 + 0) = r1 00000000000000c8 <LBB0_4>:
25: r6 = 0 ll 25: r1 = r6
27: r6 += r1 26: w2 = 256
00000000000000e0 <LBB0_4>: 27: r3 = 0 ll
28: r1 = r6 29: call 115
29: w2 = 256
30: r3 = 0 ll
32: call 115
32-BIT (11 insns): 32-BIT (12 insns):
------------------------------------ ------------------------------------
17: call 115 17: call 115
18: if w0 > 256 goto +7 <LBB1_4> 18: if w0 > 256 goto +8 <LBB1_4>
19: r1 = 0 ll 19: r1 = 0 ll
21: *(u32 *)(r1 + 0) = r0 21: *(u32 *)(r1 + 0) = r0
22: w1 = w0 22: r0 <<= 32
23: r6 = 0 ll 23: r0 >>= 32
25: r6 += r1 24: r6 = 0 ll
00000000000000d0 <LBB1_4>: 26: r6 += r0
26: r1 = r6 00000000000000d8 <LBB1_4>:
27: w2 = 256 27: r1 = r6
28: r3 = 0 ll 28: w2 = 256
30: call 115 29: r3 = 0 ll
31: call 115
In ALU32 mode, the variant using 64-bit length variable clearly wins and
avoids unnecessary zero-extension bit shifts. In practice, this is even more
important and good, because BPF code won't need to do extra checks to "prove"
that payload/len are within good bounds.
32-bit len is one instruction longer. Clang decided to do 64-to-32 casting
with two bit shifts, instead of equivalent `w1 = w0` assignment. The former
uses extra register. The latter might potentially lose some range information,
but not for 32-bit value. So in this case, verifier infers that r0 is [0, 256]
after check at 18:, and shifting 32 bits left/right keeps that range intact.
We should probably look into Clang's logic and see why it chooses bitshifts
over sub-register assignments for this.
NO-ALU32 + INT NO-ALU32 + LONG
============== ===============
64-BIT (14 insns): 64-BIT (10 insns):
------------------------------------ ------------------------------------
17: call 115 17: call 115
18: r0 <<= 32 18: if r0 > 256 goto +6 <LBB0_4>
19: r1 = r0 19: r1 = 0 ll
20: r1 >>= 32 21: *(u64 *)(r1 + 0) = r0
21: if r1 > 256 goto +7 <LBB0_4> 22: r6 = 0 ll
22: r0 s>>= 32 24: r6 += r0
23: r1 = 0 ll 00000000000000c8 <LBB0_4>:
25: *(u64 *)(r1 + 0) = r0 25: r1 = r6
26: r6 = 0 ll 26: r2 = 256
28: r6 += r0 27: r3 = 0 ll
00000000000000e8 <LBB0_4>: 29: call 115
29: r1 = r6
30: r2 = 256
31: r3 = 0 ll
33: call 115
32-BIT (13 insns): 32-BIT (13 insns):
------------------------------------ ------------------------------------
17: call 115 17: call 115
18: r1 = r0 18: r1 = r0
19: r1 <<= 32 19: r1 <<= 32
20: r1 >>= 32 20: r1 >>= 32
21: if r1 > 256 goto +6 <LBB1_4> 21: if r1 > 256 goto +6 <LBB1_4>
22: r2 = 0 ll 22: r2 = 0 ll
24: *(u32 *)(r2 + 0) = r0 24: *(u32 *)(r2 + 0) = r0
25: r6 = 0 ll 25: r6 = 0 ll
27: r6 += r1 27: r6 += r1
00000000000000e0 <LBB1_4>: 00000000000000e0 <LBB1_4>:
28: r1 = r6 28: r1 = r6
29: r2 = 256 29: r2 = 256
30: r3 = 0 ll 30: r3 = 0 ll
32: call 115 32: call 115
In NO-ALU32 mode, for the case of 64-bit len variable, Clang generates much
superior code, as expected, eliminating unnecessary bit shifts. For 32-bit
len, code is identical.
So overall, only ALU-32 32-bit len case is more-or-less equivalent and the
difference stems from internal Clang decision, rather than compiler lacking
enough information about types.
Case 2. Let's look at the simpler case of checking return result of BPF helper
for errors. The code is very simple:
long bla;
if (bpf_probe_read_kenerl(&bla, sizeof(bla), 0))
return 1;
else
return 0;
ALU32 + CHECK (9 insns) ALU32 + CHECK (9 insns)
==================================== ====================================
0: r1 = r10 0: r1 = r10
1: r1 += -8 1: r1 += -8
2: w2 = 8 2: w2 = 8
3: r3 = 0 3: r3 = 0
4: call 113 4: call 113
5: w1 = w0 5: r1 = r0
6: w0 = 1 6: w0 = 1
7: if w1 != 0 goto +1 <LBB2_2> 7: if r1 != 0 goto +1 <LBB2_2>
8: w0 = 0 8: w0 = 0
0000000000000048 <LBB2_2>: 0000000000000048 <LBB2_2>:
9: exit 9: exit
Almost identical code, the only difference is the use of full register
assignment (r1 = r0) vs half-registers (w1 = w0) in instruction #5. On 32-bit
architectures, new BPF assembly might be slightly less optimal, in theory. But
one can argue that's not a big issue, given that use of full registers is
still prevalent (e.g., for parameter passing).
NO-ALU32 + CHECK (11 insns) NO-ALU32 + CHECK (9 insns)
==================================== ====================================
0: r1 = r10 0: r1 = r10
1: r1 += -8 1: r1 += -8
2: r2 = 8 2: r2 = 8
3: r3 = 0 3: r3 = 0
4: call 113 4: call 113
5: r1 = r0 5: r1 = r0
6: r1 <<= 32 6: r0 = 1
7: r1 >>= 32 7: if r1 != 0 goto +1 <LBB2_2>
8: r0 = 1 8: r0 = 0
9: if r1 != 0 goto +1 <LBB2_2> 0000000000000048 <LBB2_2>:
10: r0 = 0 9: exit
0000000000000058 <LBB2_2>:
11: exit
NO-ALU32 is a clear improvement, getting rid of unnecessary zero-extension bit
shifts.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200623032224.4020118-1-andriin@fb.com
|
|
This patch fixes a spelling typo in spectrum_dcb.c
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Alexander Lobakin says:
====================
net: qed/qede: various stability fixes
This set addresses several near-critical issues that were observed
and reproduced on different test and production configurations.
v2:
- don't split the "Fixes:" tag across several lines in patch 9;
- no functional changes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Variable 'abs_ppfid' in qed_dev.c:qed_llh_add_mac_filter() always gets
printed, but is initialized only under 'ref_cnt == 1' condition. This
results in:
In file included from ./include/linux/kernel.h:15:0,
from ./include/asm-generic/bug.h:19,
from ./arch/x86/include/asm/bug.h:86,
from ./include/linux/bug.h:5,
from ./include/linux/io.h:11,
from drivers/net/ethernet/qlogic/qed/qed_dev.c:35:
drivers/net/ethernet/qlogic/qed/qed_dev.c: In function 'qed_llh_add_mac_filter':
./include/linux/printk.h:358:2: warning: 'abs_ppfid' may be used uninitialized
in this function [-Wmaybe-uninitialized]
printk(KERN_NOTICE pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~
drivers/net/ethernet/qlogic/qed/qed_dev.c:983:17: note: 'abs_ppfid' was declared
here
u8 filter_idx, abs_ppfid;
^~~~~~~~~
...under W=1+.
Fix this by initializing it with zero.
Fixes: 79284adeb99e ("qed: Add llh ppfid interface and 100g support for offload protocols")
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Sizes of all ILT blocks must be reset before ILT recomputing when
disabling clients, or memory allocation may exceed ILT shadow array
and provoke system crashes.
Fixes: 1408cc1fa48c ("qed: Introduce VFs")
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Set edev->cdev pointer to NULL after calling remove() callback to avoid
using of already freed object.
Fixes: ccc67ef50b90 ("qede: Error recovery process")
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently PTP cyclecounter and timecounter are initialized only on
the first probing and are cleaned up during removal. This means that
PTP becomes non-functional after device recovery.
Fix this by unconditional PTP initialization on probing and clearing
Tx pending bit on exiting.
Fixes: ccc67ef50b90 ("qede: Error recovery process")
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is likely a copy'n'paste mistake. The amount of ILT lines to
reserve for a single VF was being multiplied by the total VFs count.
This led to a huge redundancy in reservation and potential lines
drainouts.
Fixes: 1408cc1fa48c ("qed: Introduce VFs")
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
25ms sleep cycles in waiting for PF response are excessive and may lead
to different timeout failures.
Start to wait with short udelays, and in most cases polling will end
here. If the time was not sufficient, switch to msleeps.
usleep_range() may go far beyond 100us depending on platform and tick
configuration, hence atomic udelays for consistency.
Also add explicit DMA barriers since 'done' always comes from a shared
request-response DMA pool, and note that in the comment nearby.
Fixes: 1408cc1fa48c ("qed: Introduce VFs")
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Set rdma_wq pointer to NULL after destroying the workqueue and check
for it when adding new events to fix crashes on driver unload.
Fixes: cee9fbd8e2e9 ("qede: Add qedr framework")
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
qed_spq_unregister_async_cb() should be called before
qed_rdma_info_free() to avoid crash-spawning uses-after-free.
Instead of calling it from each subsystem exit code, do it in one place
on PF down.
Fixes: 291d57f67d24 ("qed: Fix rdma_info structure allocation")
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
qed_chain_get_element_left{,_u32} returned 0 when the difference
between producer and consumer page count was equal to the total
page count.
Fix this by conditional expanding of producer value (vs
unconditional). This allowed to eliminate normalizaton against
total page count, which was the cause of this bug.
Misc: replace open-coded constants with common defines.
Fixes: a91eb52abb50 ("qed: Revisit chain implementation")
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit e585f2363637 ("udp: Changes to udp_offload to support remote
checksum offload") added new GSO type and a corresponding netdev
feature, but missed Ethtool's 'netdev_features_strings' table.
Give it a name so it will be exposed to userspace and become available
for manual configuration.
v3:
- decouple from "netdev_features_strings[] cleanup" series;
- no functional changes.
v2:
- don't split the "Fixes:" tag across lines;
- no functional changes.
Fixes: e585f2363637 ("udp: Changes to udp_offload to support remote checksum offload")
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Jason A. Donenfeld says:
====================
wireguard fixes for 5.8-rc3
This series contains two fixes, one cosmetic and one quite important:
1) Avoid the `if ((x = f()) == y)` pattern, from Frank
Werner-Krippendorf.
2) Mitigate a potential memory leak by creating circular netns
references, while also making the netns semantics a bit more
robust.
Patch (2) has a "Fixes:" line and should be backported to stable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Before, we took a reference to the creating netns if the new netns was
different. This caused issues with circular references, with two
wireguard interfaces swapping namespaces. The solution is to rather not
take any extra references at all, but instead simply invalidate the
creating netns pointer when that netns is deleted.
In order to prevent this from happening again, this commit improves the
rough object leak tracking by allowing it to account for created and
destroyed interfaces, aside from just peers and keys. That then makes it
possible to check for the object leak when having two interfaces take a
reference to each others' namespaces.
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixes an error condition reported by checkpatch.pl which caused by
assigning a variable in an if condition in wg_noise_handshake_consume_
initiation().
Signed-off-by: Frank Werner-Krippendorf <mail@hb9fxq.ch>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Horatiu Vultur says:
====================
bridge: mrp: Update MRP_PORT_ROLE
This patch series does the following:
- fixes the enum br_mrp_port_role_type. It removes the port role none(0x2)
because this is in conflict with the standard. The standard defines the
interconnect port role as value 0x2.
- adds checks regarding current defined port roles: primary(0x0) and
secondary(0x1).
v2:
- add the validation code when setting the port role.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds specific checks for primary(0x0) and secondary(0x1) when
setting the port role. For any other value the function
'br_mrp_set_port_role' will return -EINVAL.
Fixes: 20f6a05ef63594 ("bridge: mrp: Rework the MRP netlink interface")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently the MRP_PORT_ROLE_NONE has the value 0x2 but this is in conflict
with the IEC 62439-2 standard. The standard defines the following port
roles: primary (0x0), secondary(0x1), interconnect(0x2).
Therefore remove the port role none.
Fixes: 4714d13791f831 ("bridge: uapi: mrp: Add mrp attributes.")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Keepalived can set global static ip routes or virtual ip routes dynamically
following VRRP protocol states. Using a dedicated rtm_protocol will help
keeping track of it.
Changes in v2:
- fix tab/space indenting
Signed-off-by: Alexandre Cassen <acassen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull kvm fixes from Paolo Bonzini:
"All bugfixes except for a couple cleanup patches"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: VMX: Remove vcpu_vmx's defunct copy of host_pkru
KVM: x86: allow TSC to differ by NTP correction bounds without TSC scaling
KVM: X86: Fix MSR range of APIC registers in X2APIC mode
KVM: VMX: Stop context switching MSR_IA32_UMWAIT_CONTROL
KVM: nVMX: Plumb L2 GPA through to PML emulation
KVM: x86/mmu: Avoid mixing gpa_t with gfn_t in walk_addr_generic()
KVM: LAPIC: ensure APIC map is up to date on concurrent update requests
kvm: lapic: fix broken vcpu hotplug
Revert "KVM: VMX: Micro-optimize vmexit time when not exposing PMU"
KVM: VMX: Add helpers to identify interrupt type from intr_info
kvm/svm: disable KCSAN for svm_vcpu_run()
KVM: MIPS: Fix a build error for !CPU_LOONGSON64
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A number of fixes, located in two areas, one performance fix and one
fixup for better integration with another patchset.
- bug fixes in nowait aio:
- fix snapshot creation hang after nowait-aio was used
- fix failure to write to prealloc extent past EOF
- don't block when extent range is locked
- block group fixes:
- relocation failure when scrub runs in parallel
- refcount fix when removing fails
- fix race between removal and creation
- space accounting fixes
- reinstante fast path check for log tree at unlink time, fixes
performance drop up to 30% in REAIM
- kzfree/kfree fixup to ease treewide patchset renaming kzfree"
* tag 'for-5.8-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: use kfree() in btrfs_ioctl_get_subvol_info()
btrfs: fix RWF_NOWAIT writes blocking on extent locks and waiting for IO
btrfs: fix RWF_NOWAIT write not failling when we need to cow
btrfs: fix failure of RWF_NOWAIT write into prealloc extent beyond eof
btrfs: fix hang on snapshot creation after RWF_NOWAIT write
btrfs: check if a log root exists before locking the log_mutex on unlink
btrfs: fix bytes_may_use underflow when running balance and scrub in parallel
btrfs: fix data block group relocation failure due to concurrent scrub
btrfs: fix race between block group removal and block group creation
btrfs: fix a block group ref counter leak after failure to remove block group
|
|
Currently the ring buffer makes events that happen in interrupts that preempt
another event have a delta of zero. (Hopefully we can change this soon). But
this is to deal with the races of updating a global counter with lockless
and nesting functions updating deltas.
With the addition of absolute time stamps, the time extend didn't follow
this rule. A time extend can happen if two events happen longer than 2^27
nanoseconds appart, as the delta time field in each event is only 27 bits.
If that happens, then a time extend is injected with 2^59 bits of
nanoseconds to use (18 years). But if the 2^27 nanoseconds happen between
two events, and as it is writing the event, an interrupt triggers, it will
see the 2^27 difference as well and inject a time extend of its own. But a
recent change made the time extend logic not take into account the nesting,
and this can cause two time extend deltas to happen moving the time stamp
much further ahead than the current time. This gets all reset when the ring
buffer moves to the next page, but that can cause time to appear to go
backwards.
This was observed in a trace-cmd recording, and since the data is saved in a
file, with trace-cmd report --debug, it was possible to see that this indeed
did happen!
bash-52501 110d... 81778.908247: sched_switch: bash:52501 [120] S ==> swapper/110:0 [120] [12770284:0x2e8:64]
<idle>-0 110d... 81778.908757: sched_switch: swapper/110:0 [120] R ==> bash:52501 [120] [509947:0x32c:64]
TIME EXTEND: delta:306454770 length:0
bash-52501 110.... 81779.215212: sched_swap_numa: src_pid=52501 src_tgid=52388 src_ngid=52501 src_cpu=110 src_nid=2 dst_pid=52509 dst_tgid=52388 dst_ngid=52501 dst_cpu=49 dst_nid=1 [0:0x378:48]
TIME EXTEND: delta:306458165 length:0
bash-52501 110dNh. 81779.521670: sched_wakeup: migration/110:565 [0] success=1 CPU:110 [0:0x3b4:40]
and at the next page, caused the time to go backwards:
bash-52504 110d... 81779.685411: sched_switch: bash:52504 [120] S ==> swapper/110:0 [120] [8347057:0xfb4:64]
CPU:110 [SUBBUFFER START] [81779379165886:0x1320000]
<idle>-0 110dN.. 81779.379166: sched_wakeup: bash:52504 [120] success=1 CPU:110 [0:0x10:40]
<idle>-0 110d... 81779.379167: sched_switch: swapper/110:0 [120] R ==> bash:52504 [120] [1168:0x3c:64]
Link: https://lkml.kernel.org/r/20200622151815.345d1bf5@oasis.local.home
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: stable@vger.kernel.org
Fixes: dc4e2801d400b ("ring-buffer: Redefine the unimplemented RINGBUF_TYPE_TIME_STAMP")
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Introduce new resource dump segments - PRM_QUERY_QP,
PRM_QUERY_CQ and PRM_QUERY_MKEY. These segments contains the resource
dump in PRM query format.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
|
Export some of the resource dump API. mlx5_ib driver will use
it in downstream patches.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
|
We've found Samsung USBC Headset (AKG) (VID: 0x04e8, PID: 0xa051)
need a tiny delay after each class compliant request.
Otherwise the device might not be able to be recognized each times.
Signed-off-by: Chihhao Chen <chihhao.chen@mediatek.com>
Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/1592910203-24035-1-git-send-email-macpaul.lin@mediatek.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
When specifying insanely large debug buffers a kernel warning is
printed. The debug code does handle the error gracefully, though.
Instead of duplicating the check let us silence the warning to
avoid crashes when panic_on_warn is used.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
Currently if early_pgm_check_handler is called it ends up in pgm check
loop. The problem is that early_pgm_check_handler is instrumented by
KASAN but executed without DAT flag enabled which leads to addressing
exception when KASAN checks try to access shadow memory.
Fix that by executing early handlers with DAT flag on under KASAN as
expected.
Reported-and-tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
When single stepping an svc instruction on s390, the kernel is entered
with a PER program check interruption. The program check handler than
jumps to the system call handler by reloading the PSW. The code didn't
set GPR13 to the thread pointer in struct task_struct. This made the
kernel access invalid memory while trying to fetch the syscall function
address. Fix this by always assigned GPR13 after .Lsysc_per.
Fixes: 0b0ed657fe00 ("s390: remove critical section cleanup from entry.S")
Reported-and-tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
Similar to the Kingston HyperX AMP, the Kingston HyperX Cloud
Alpha S (0951:0x16ea) uses two interfaces, but only the second
interface contains the capture stream. This patch delays the
registration until the second interface appears.
Signed-off-by: Christoffer Nielsen <cn@obviux.dk>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/CAOtG2YHOM3zy+ed9KS-J4HkZo_QGzcUG9MigSp4e4_-13r6B=Q@mail.gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Remove vcpu_vmx.host_pkru, which got left behind when PKRU support was
moved to common x86 code.
No functional change intended.
Fixes: 37486135d3a7b ("KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200617034123.25647-1-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The Linux TSC calibration procedure is subject to small variations
(its common to see +-1 kHz difference between reboots on a given CPU, for example).
So migrating a guest between two hosts with identical processor can fail, in case
of a small variation in calibrated TSC between them.
Without TSC scaling, the current kernel interface will either return an error
(if user_tsc_khz <= tsc_khz) or enable TSC catchup mode.
This change enables the following TSC tolerance check to
accept KVM_SET_TSC_KHZ within tsc_tolerance_ppm (which is 250ppm by default).
/*
* Compute the variation in TSC rate which is acceptable
* within the range of tolerance and decide if the
* rate being applied is within that bounds of the hardware
* rate. If so, no scaling or compensation need be done.
*/
thresh_lo = adjust_tsc_khz(tsc_khz, -tsc_tolerance_ppm);
thresh_hi = adjust_tsc_khz(tsc_khz, tsc_tolerance_ppm);
if (user_tsc_khz < thresh_lo || user_tsc_khz > thresh_hi) {
pr_debug("kvm: requested TSC rate %u falls outside tolerance [%u,%u]\n", user_tsc_khz, thresh_lo, thresh_hi);
use_scaling = 1;
}
NTP daemon in the guest can correct this difference (NTP can correct upto 500ppm).
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Message-Id: <20200616114741.GA298183@fuller.cnet>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Only MSR address range 0x800 through 0x8ff is architecturally reserved
and dedicated for accessing APIC registers in x2APIC mode.
Fixes: 0105d1a52640 ("KVM: x2apic interface to lapic")
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20200616073307.16440-1-xiaoyao.li@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Fix a typo in gue.h
Signed-off-by: Aiden Leong <aiden.leong@aibsd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Igor Russkikh says:
====================
net: atlantic: additional A2 features
This patchset adds more features to A2:
* half duplex rates;
* EEE;
* flow control;
* link partner capabilities reporting;
* phy loopback.
Feature-wise A2 is almost on-par with A1 save for WoL and filtering, which
will be submitted as separate follow-up patchset(s).
====================
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds the phy loopback support on A2.
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds link partner capabilities reporting support on A2.
In particular, the following capabilities are available for reporting:
* link rate;
* EEE;
* flow control.
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds flow control support on A2.
Co-developed-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: Dmitry Bogdanov <dbogdanov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds EEE support on A2.
Signed-off-by: Nikita Danilov <ndanilov@marvell.com>
Co-developed-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch removes 2.5G baseX wrong usage/reporting, since it shouldn't have
been mixed with baseT.
Signed-off-by: Nikita Danilov <ndanilov@marvell.com>
Signed-off-by: Mark Starovoytov <mstarovoitov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds support for 10M/100M/1G half duplex rates, which are
supported by A2 in additional to full duplex rates supported by A1.
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In RFC 8684, we don't need to send sndr_key in SYN package anymore, so drop
it.
Fixes: cc7972ea1932 ("mptcp: parse and emit MP_CAPABLE option according to v1 spec")
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
arg cannot be NULL since its already being dereferenced
before. Remove the redundant NULL check.
Signed-off-by: Gaurav Singh <gaurav1086@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix check in ethtool_rx_flow_rule_create
Fixes: eca4205f9ec3 ("ethtool: add ethtool_rx_flow_spec to flow_rule structure translator")
Signed-off-by: Gaurav Singh <gaurav1086@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|