|
Check whether PTP is enabled at the caller and perform the
locking/unlocking there.
Fixes following sparse warning:
sge.c:1641:26: warning: context imbalance in 'cxgb4_eth_xmit' -
different lock contexts for basic block
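For illustration, a rough sketch of the shape the Tx path converges on (the
names are illustrative, not the exact sge.c code); with the check and the
lock/unlock pair in the same function, sparse sees a balanced lock context
on every path:
if (ptp_enabled) {
    spin_lock(&adap->ptp_lock);
    /* ... build and send the PTP work request ... */
    spin_unlock(&adap->ptp_lock);
}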
Fixes: a456950445a0 ("cxgb4: time stamping interface for PTP")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move code handling L2T ARP failures to the only caller.
Fixes following sparse warning:
skbuff.h:2091:29: warning: context imbalance in
'handle_failed_resolution' - unexpected unlock
Fixes: 749cb5fe48bb ("cxgb4: Replace arpq_head/arpq_tail with SKB double link-list code")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ido Schimmel says:
====================
mlxsw: Bump firmware version to XX.2007.1168
Petr says:
In patch #1, bump the firmware version required by the driver to
XX.2007.1168. This version fixes several issues observed in the
offloaded datapath.
In patch #2, add support for requiring FW version on Spectrum-3 (so far
only Spectrum-1 and Spectrum-2 have had this requirement). Demand the
same version as mentioned above.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In a fashion similar to the other Spectrum systems, enforce a specific
firmware version for Spectrum-3 so that the driver and firmware are
always in sync with regards to new features.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This version comes with fixes to the following problems, among others:
- Wrong shaper configuration on Spectrum-1
- Bogus temperature reading on Spectrum-2
- Problems in setting egress buffer size after MTU change on Spectrum-2
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It was reported that a considerable number of cycles was spent on the
expensive indirect calls in fib6_rule_lookup. This patch introduces an
inline helper called pol_route_func that uses the indirect_call_wrappers
to avoid the indirect calls.
This patch saves around 50ns per call.
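A sketch of what such a helper can look like (assuming the four IPv6
policy-routing lookup functions as the likely targets; the patch itself
defines the exact list):
static inline struct rt6_info *pol_route_func(pol_lookup_t lookup,
                                              struct net *net,
                                              struct fib6_table *table,
                                              struct flowi6 *fl6,
                                              const struct sk_buff *skb,
                                              int flags)
{
    /* compare 'lookup' against the known implementations and call the
     * match directly; fall back to the indirect call only if none match
     */
    return INDIRECT_CALL_4(lookup,
                           ip6_pol_route_output,
                           ip6_pol_route_input,
                           ip6_pol_route_lookup,
                           __ip6_route_redirect,
                           net, table, fl6, skb, flags);
}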
Performance was measured on the receiver by checking the amount of
syncookies that server was able to generate under a synflood load.
Traffic was generated using trafgen[1] which was pushing around 1Mpps on
a single queue. The receiver was using only one rx queue, which helps to
create a bottleneck and makes the experiment rx-bound.
These are the syncookies generated over 10s from the different runs:
Without the patch:
TcpExtSyncookiesSent 3553749 0.0
TcpExtSyncookiesSent 3550895 0.0
TcpExtSyncookiesSent 3553845 0.0
TcpExtSyncookiesSent 3541050 0.0
TcpExtSyncookiesSent 3539921 0.0
TcpExtSyncookiesSent 3557659 0.0
TcpExtSyncookiesSent 3526812 0.0
TcpExtSyncookiesSent 3536121 0.0
TcpExtSyncookiesSent 3529963 0.0
TcpExtSyncookiesSent 3536319 0.0
With the patch:
TcpExtSyncookiesSent 3611786 0.0
TcpExtSyncookiesSent 3596682 0.0
TcpExtSyncookiesSent 3606878 0.0
TcpExtSyncookiesSent 3599564 0.0
TcpExtSyncookiesSent 3601304 0.0
TcpExtSyncookiesSent 3609249 0.0
TcpExtSyncookiesSent 3617437 0.0
TcpExtSyncookiesSent 3608765 0.0
TcpExtSyncookiesSent 3620205 0.0
TcpExtSyncookiesSent 3601895 0.0
Without the patch the average is 354263 pkt/s or 2822 ns/pkt and with
the patch the average is 360738 pkt/s or 2772 ns/pkt which gives an
estimate of 50 ns per packet.
[1] http://netsniff-ng.org/
Changelog since v1:
- Change ordering in the ICW (Paolo Abeni)
Cc: Luigi Rizzo <lrizzo@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are many places where 2 annotations are not enough. This patch
adds INDIRECT_CALL_3 and INDIRECT_CALL_4 to cover such cases.
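Following the existing INDIRECT_CALL_1/INDIRECT_CALL_2 pattern in
include/linux/indirect_call_wrapper.h, the new annotations look roughly
like this (a sketch, not the verbatim patch):
#define INDIRECT_CALL_3(f, f3, f2, f1, ...)                         \
    (likely(f == f3) ? f3(__VA_ARGS__) :                            \
                       INDIRECT_CALL_2(f, f2, f1, __VA_ARGS__))
#define INDIRECT_CALL_4(f, f4, f3, f2, f1, ...)                     \
    (likely(f == f4) ? f4(__VA_ARGS__) :                            \
                       INDIRECT_CALL_3(f, f3, f2, f1, __VA_ARGS__))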
Signed-off-by: Brian Vazquez <brianvv@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Make sure we pass through the ret label to unlock the mutex.
Signed-off-by: John van der Kamp <sjonny@suffe.me.uk>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
Currently, if the clang-bpf-co-re feature is not available, the build
fails with e.g.
CC prog.o
prog.c:1462:10: fatal error: profiler.skel.h: No such file or directory
1462 | #include "profiler.skel.h"
| ^~~~~~~~~~~~~~~~~
This is due to the fact that the BPFTOOL_WITHOUT_SKELETONS macro is not
defined, despite BUILD_BPF_SKELS not being set. Fix this by correctly
evaluating $(BUILD_BPF_SKELS) when deciding on whether to add
-DBPFTOOL_WITHOUT_SKELETONS to CFLAGS.
Fixes: 05aca6da3b5a ("tools/bpftool: Generalize BPF skeleton support and generate vmlinux.h")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200623103710.10370-1-tklauser@distanz.ch
|
|
Extend the original variable-length tests with a case to catch a common
existing pattern of testing for < 0 for errors. Note that because the
verifier also tracks upper bounds and we know the value cannot be greater
than MAX_LEN here, we can skip the upper bound check.
In ALU64 enabled compilation converting from long->int return types
in probe helpers results in extra instruction pattern, <<= 32, s >>= 32.
The trade-off is the non-ALU64 case works. If you really care about
every extra insn (XDP case?) then you probably should be using original
int type.
In addition adding a sext insn to bpf might help the verifier in the
general case to avoid these types of tricks.
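The added case boils down to this shape (a sketch; MAX_LEN, dst and src are
placeholders):
long len;

len = bpf_probe_read_kernel_str(dst, MAX_LEN, src);
if (len < 0)    /* valid now that the helper is declared as returning long */
    return 0;
/* upper bound check skipped: the verifier already knows len <= MAX_LEN */
payload += len;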
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200623032224.4020118-3-andriin@fb.com
|
|
Add a selftest that validates variable-length data reading and concatenation
with one big shared data array. This is a common pattern in production use for
monitoring and tracing applications, which can potentially read a lot of data,
but overall read much less. Such a pattern allows determining precisely how
much data needs to be sent over perfbuf/ringbuf and maximizes efficiency.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200623032224.4020118-2-andriin@fb.com
|
|
Switch most of BPF helper definitions from returning int to long. These
definitions are coming from comments in BPF UAPI header and are used to
generate bpf_helper_defs.h (under libbpf) to be later included and used from
BPF programs.
In actual in-kernel implementation, all the helpers are defined as returning
u64, but due to some historical reasons, most of them are actually defined as
returning int in UAPI (usually, to return 0 on success, and negative value on
error).
This actually causes Clang to quite often generate sub-optimal code, because
compiler believes that return value is 32-bit, and in a lot of cases has to be
up-converted (usually with a pair of 32-bit bit shifts) to 64-bit values,
before they can be used further in BPF code.
Besides just "polluting" the code, these 32-bit shifts quite often cause
problems for cases in which return value matters. This is especially the case
for the family of bpf_probe_read_str() functions. There are a few other similar
helpers (e.g., bpf_read_branch_records()), in which the return value is used by
BPF program logic to record variable-length data and process it. For such
cases, BPF program logic carefully manages offsets within some array or map to
read variable-length data. For such uses, it's crucial for BPF verifier to
track possible range of register values to prove that all the accesses happen
within given memory bounds. Those extraneous zero-extending bit shifts,
inserted by Clang (and quite often interleaved with other code, which makes
the issues even more challenging and sometimes requires employing extra
per-variable compiler barriers), throw off the verifier's logic and make it mark
registers as having an unknown variable offset. We'll study this pattern a bit
later below.
Another common pattern is to check return of BPF helper for non-zero state to
detect error conditions and attempt alternative actions in such case. Even in
this simple and straightforward case, this 32-bit vs BPF's native 64-bit mode
quite often leads to sub-optimal and unnecessary extra code. We'll look at
this pattern as well.
Clang's BPF target supports two modes of code generation: ALU32, in which it
is capable of using lower 32-bit parts of registers, and no-ALU32, in which
only full 64-bit registers are being used. ALU32 mode somewhat mitigates the
above described problems, but not in all cases.
This patch switches all the cases in which BPF helpers return 0 or negative
error from returning int to returning long. It is shown below that such change
in definition leads to equivalent or better code. No-ALU32 mode benefits more,
but ALU32 mode doesn't degrade or still gets improved code generation.
Another class of cases switched from int to long are bpf_probe_read_str()-like
helpers, which encode successful case as non-negative values, while still
returning negative value for errors.
In all of such cases, correctness is preserved due to two's complement
encoding of negative values and the fact that all helpers return values with
32-bit absolute value. Two's complement ensures that for negative values
higher 32 bits are all ones and when truncated, leave valid negative 32-bit
value with the same value. Non-negative values have upper 32 bits set to zero
and similarly preserve value when high 32 bits are truncated. This means that
just casting to int/u32 is correct and efficient (and in ALU32 mode doesn't
require any extra shifts).
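As a concrete illustration of that argument (made-up values):
long err = -2;        /* 0xfffffffffffffffe: upper 32 bits are all ones  */
int  e32 = (int)err;  /* truncates to 0xfffffffe, which is still -2      */
long len = 200;       /* 0x00000000000000c8: upper 32 bits are all zeros */
int  l32 = (int)len;  /* truncates to 0x000000c8, which is still 200     */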
To minimize the chances of regressions, two code patterns were investigated,
as mentioned above. For both patterns, BPF assembly was analyzed in
ALU32/NO-ALU32 compiler modes, both with current 32-bit int return type and
new 64-bit long return type.
Case 1. Variable-length data reading and concatenation. This is a quite
ubiquitous pattern in tracing/monitoring applications, reading data like
process's environment variables, file path, etc. In such case, many pieces of
string-like variable-length data are read into a single big buffer, and at the
end of the process, only a part of array containing actual data is sent to
user-space for further processing. This case is tested in test_varlen.c
selftest (in the next patch). Code flow is roughly as follows:
void *payload = &sample->payload;
u64 len;

len = bpf_probe_read_kernel_str(payload, MAX_SZ1, &source_data1);
if (len <= MAX_SZ1) {
    payload += len;
    sample->len1 = len;
}

len = bpf_probe_read_kernel_str(payload, MAX_SZ2, &source_data2);
if (len <= MAX_SZ2) {
    payload += len;
    sample->len2 = len;
}

/* and so on */

sample->total_len = payload - &sample->payload;
/* send over, e.g., perf buffer */
There could be two variations with slightly different code generated: when len
is 64-bit integer and when it is 32-bit integer. Both variations were analysed.
BPF assembly instructions between two successive invocations of
bpf_probe_read_kernel_str() were used to check code regressions. Results are
below, followed by short analysis. Left side is using helpers with int return
type, the right one is after the switch to long.
ALU32 + INT ALU32 + LONG
=========== ============
64-BIT (13 insns): 64-BIT (10 insns):
------------------------------------ ------------------------------------
17: call 115 17: call 115
18: if w0 > 256 goto +9 <LBB0_4> 18: if r0 > 256 goto +6 <LBB0_4>
19: w1 = w0 19: r1 = 0 ll
20: r1 <<= 32 21: *(u64 *)(r1 + 0) = r0
21: r1 s>>= 32 22: r6 = 0 ll
22: r2 = 0 ll 24: r6 += r0
24: *(u64 *)(r2 + 0) = r1 00000000000000c8 <LBB0_4>:
25: r6 = 0 ll 25: r1 = r6
27: r6 += r1 26: w2 = 256
00000000000000e0 <LBB0_4>: 27: r3 = 0 ll
28: r1 = r6 29: call 115
29: w2 = 256
30: r3 = 0 ll
32: call 115
32-BIT (11 insns): 32-BIT (12 insns):
------------------------------------ ------------------------------------
17: call 115 17: call 115
18: if w0 > 256 goto +7 <LBB1_4> 18: if w0 > 256 goto +8 <LBB1_4>
19: r1 = 0 ll 19: r1 = 0 ll
21: *(u32 *)(r1 + 0) = r0 21: *(u32 *)(r1 + 0) = r0
22: w1 = w0 22: r0 <<= 32
23: r6 = 0 ll 23: r0 >>= 32
25: r6 += r1 24: r6 = 0 ll
00000000000000d0 <LBB1_4>: 26: r6 += r0
26: r1 = r6 00000000000000d8 <LBB1_4>:
27: w2 = 256 27: r1 = r6
28: r3 = 0 ll 28: w2 = 256
30: call 115 29: r3 = 0 ll
31: call 115
In ALU32 mode, the variant using 64-bit length variable clearly wins and
avoids unnecessary zero-extension bit shifts. In practice, this is even more
important and good, because BPF code won't need to do extra checks to "prove"
that payload/len are within good bounds.
The 32-bit len case is one instruction longer. Clang decided to do the 64-to-32
cast with two bit shifts, instead of the equivalent `w1 = w0` assignment. The
former uses an extra register. The latter might potentially lose some range
information, but not for a 32-bit value. So in this case, the verifier infers
that r0 is [0, 256]
after check at 18:, and shifting 32 bits left/right keeps that range intact.
We should probably look into Clang's logic and see why it chooses bitshifts
over sub-register assignments for this.
NO-ALU32 + INT NO-ALU32 + LONG
============== ===============
64-BIT (14 insns): 64-BIT (10 insns):
------------------------------------ ------------------------------------
17: call 115 17: call 115
18: r0 <<= 32 18: if r0 > 256 goto +6 <LBB0_4>
19: r1 = r0 19: r1 = 0 ll
20: r1 >>= 32 21: *(u64 *)(r1 + 0) = r0
21: if r1 > 256 goto +7 <LBB0_4> 22: r6 = 0 ll
22: r0 s>>= 32 24: r6 += r0
23: r1 = 0 ll 00000000000000c8 <LBB0_4>:
25: *(u64 *)(r1 + 0) = r0 25: r1 = r6
26: r6 = 0 ll 26: r2 = 256
28: r6 += r0 27: r3 = 0 ll
00000000000000e8 <LBB0_4>: 29: call 115
29: r1 = r6
30: r2 = 256
31: r3 = 0 ll
33: call 115
32-BIT (13 insns): 32-BIT (13 insns):
------------------------------------ ------------------------------------
17: call 115 17: call 115
18: r1 = r0 18: r1 = r0
19: r1 <<= 32 19: r1 <<= 32
20: r1 >>= 32 20: r1 >>= 32
21: if r1 > 256 goto +6 <LBB1_4> 21: if r1 > 256 goto +6 <LBB1_4>
22: r2 = 0 ll 22: r2 = 0 ll
24: *(u32 *)(r2 + 0) = r0 24: *(u32 *)(r2 + 0) = r0
25: r6 = 0 ll 25: r6 = 0 ll
27: r6 += r1 27: r6 += r1
00000000000000e0 <LBB1_4>: 00000000000000e0 <LBB1_4>:
28: r1 = r6 28: r1 = r6
29: r2 = 256 29: r2 = 256
30: r3 = 0 ll 30: r3 = 0 ll
32: call 115 32: call 115
In NO-ALU32 mode, for the case of 64-bit len variable, Clang generates much
superior code, as expected, eliminating unnecessary bit shifts. For 32-bit
len, code is identical.
So overall, only the ALU32 32-bit len case is more-or-less equivalent, and the
difference stems from an internal Clang decision rather than from the compiler
lacking enough information about the types.
Case 2. Let's look at the simpler case of checking return result of BPF helper
for errors. The code is very simple:
long bla;

if (bpf_probe_read_kernel(&bla, sizeof(bla), 0))
    return 1;
else
    return 0;
ALU32 + CHECK (9 insns) ALU32 + CHECK (9 insns)
==================================== ====================================
0: r1 = r10 0: r1 = r10
1: r1 += -8 1: r1 += -8
2: w2 = 8 2: w2 = 8
3: r3 = 0 3: r3 = 0
4: call 113 4: call 113
5: w1 = w0 5: r1 = r0
6: w0 = 1 6: w0 = 1
7: if w1 != 0 goto +1 <LBB2_2> 7: if r1 != 0 goto +1 <LBB2_2>
8: w0 = 0 8: w0 = 0
0000000000000048 <LBB2_2>: 0000000000000048 <LBB2_2>:
9: exit 9: exit
Almost identical code, the only difference is the use of full register
assignment (r1 = r0) vs half-registers (w1 = w0) in instruction #5. On 32-bit
architectures, new BPF assembly might be slightly less optimal, in theory. But
one can argue that's not a big issue, given that use of full registers is
still prevalent (e.g., for parameter passing).
NO-ALU32 + CHECK (11 insns) NO-ALU32 + CHECK (9 insns)
==================================== ====================================
0: r1 = r10 0: r1 = r10
1: r1 += -8 1: r1 += -8
2: r2 = 8 2: r2 = 8
3: r3 = 0 3: r3 = 0
4: call 113 4: call 113
5: r1 = r0 5: r1 = r0
6: r1 <<= 32 6: r0 = 1
7: r1 >>= 32 7: if r1 != 0 goto +1 <LBB2_2>
8: r0 = 1 8: r0 = 0
9: if r1 != 0 goto +1 <LBB2_2> 0000000000000048 <LBB2_2>:
10: r0 = 0 9: exit
0000000000000058 <LBB2_2>:
11: exit
NO-ALU32 is a clear improvement, getting rid of unnecessary zero-extension bit
shifts.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200623032224.4020118-1-andriin@fb.com
|
|
This patch fixes a spelling typo in spectrum_dcb.c
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Alexander Lobakin says:
====================
net: qed/qede: various stability fixes
This set addresses several near-critical issues that were observed
and reproduced on different test and production configurations.
v2:
- don't split the "Fixes:" tag across several lines in patch 9;
- no functional changes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Variable 'abs_ppfid' in qed_dev.c:qed_llh_add_mac_filter() always gets
printed, but is initialized only under 'ref_cnt == 1' condition. This
results in:
In file included from ./include/linux/kernel.h:15:0,
from ./include/asm-generic/bug.h:19,
from ./arch/x86/include/asm/bug.h:86,
from ./include/linux/bug.h:5,
from ./include/linux/io.h:11,
from drivers/net/ethernet/qlogic/qed/qed_dev.c:35:
drivers/net/ethernet/qlogic/qed/qed_dev.c: In function 'qed_llh_add_mac_filter':
./include/linux/printk.h:358:2: warning: 'abs_ppfid' may be used uninitialized
in this function [-Wmaybe-uninitialized]
printk(KERN_NOTICE pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~
drivers/net/ethernet/qlogic/qed/qed_dev.c:983:17: note: 'abs_ppfid' was declared
here
u8 filter_idx, abs_ppfid;
^~~~~~~~~
...under W=1+.
Fix this by initializing it with zero.
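I.e., the declaration quoted in the warning simply becomes:
u8 filter_idx, abs_ppfid = 0;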
Fixes: 79284adeb99e ("qed: Add llh ppfid interface and 100g support for offload protocols")
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The sizes of all ILT blocks must be reset before ILT recomputing when
disabling clients, or memory allocation may exceed the ILT shadow array
and provoke system crashes.
Fixes: 1408cc1fa48c ("qed: Introduce VFs")
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Set the edev->cdev pointer to NULL after calling the remove() callback to
avoid use of an already freed object.
Fixes: ccc67ef50b90 ("qede: Error recovery process")
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently PTP cyclecounter and timecounter are initialized only on
the first probing and are cleaned up during removal. This means that
PTP becomes non-functional after device recovery.
Fix this by unconditionally initializing PTP on probing and clearing the
Tx pending bit on exiting.
Fixes: ccc67ef50b90 ("qede: Error recovery process")
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is likely a copy'n'paste mistake. The number of ILT lines to
reserve for a single VF was being multiplied by the total VF count.
This led to a huge redundancy in reservation and potential exhaustion
of ILT lines.
Fixes: 1408cc1fa48c ("qed: Introduce VFs")
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
25ms sleep cycles while waiting for the PF response are excessive and may
lead to various timeout failures.
Start waiting with short udelays; in most cases polling will end there.
If that time was not sufficient, switch to msleeps.
usleep_range() may go far beyond 100us depending on the platform and tick
configuration, hence atomic udelays for consistency.
Also add explicit DMA barriers since 'done' always comes from a shared
request-response DMA pool, and note that in the comment nearby.
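The resulting wait loop is roughly of this shape (a sketch, not the exact
qed_vf.c hunk; the iteration limits and delay values are illustrative):
int i;

/* fast path: short atomic delays, most PF replies arrive here */
for (i = 0; i < SHORT_ITERS && !READ_ONCE(*done); i++)
    udelay(10);

/* slow path: the PF needs more time, switch to sleeping waits */
for (; i < LONG_ITERS && !READ_ONCE(*done); i++)
    msleep(25);

/* 'done' lives in the shared request-response DMA pool */
dma_rmb();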
Fixes: 1408cc1fa48c ("qed: Introduce VFs")
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Set rdma_wq pointer to NULL after destroying the workqueue and check
for it when adding new events to fix crashes on driver unload.
Fixes: cee9fbd8e2e9 ("qede: Add qedr framework")
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
qed_spq_unregister_async_cb() should be called before
qed_rdma_info_free() to avoid crash-spawning uses-after-free.
Instead of calling it from each subsystem exit code, do it in one place
on PF down.
Fixes: 291d57f67d24 ("qed: Fix rdma_info structure allocation")
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
qed_chain_get_element_left{,_u32} returned 0 when the difference
between producer and consumer page count was equal to the total
page count.
Fix this by conditionally (instead of unconditionally) expanding the
producer value. This allows eliminating the normalization against the
total page count, which was the cause of this bug.
Misc: replace open-coded constants with common defines.
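The idea of the fix, sketched on a generic 16-bit producer/consumer pair
(not the driver's exact code):
u32 prod = chain->prod_idx;
u32 cons = chain->cons_idx;
u32 used;

/* expand the producer only when it has actually wrapped around,
 * instead of always adding 0x10000 and normalizing afterwards
 */
if (prod < cons)
    prod += (u32)U16_MAX + 1;

used = prod - cons;
return chain->capacity - used;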
Fixes: a91eb52abb50 ("qed: Revisit chain implementation")
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit e585f2363637 ("udp: Changes to udp_offload to support remote
checksum offload") added a new GSO type and a corresponding netdev
feature, but missed Ethtool's 'netdev_features_strings' table.
Give it a name so it will be exposed to userspace and become available
for manual configuration.
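The missing piece is a single entry in Ethtool's string table, along these
lines (the string shown here follows the existing naming convention; the
patch defines the authoritative one):
[NETIF_F_GSO_TUNNEL_REMCSUM_BIT] = "tx-tunnel-remcsum-segmentation",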
v3:
- decouple from "netdev_features_strings[] cleanup" series;
- no functional changes.
v2:
- don't split the "Fixes:" tag across lines;
- no functional changes.
Fixes: e585f2363637 ("udp: Changes to udp_offload to support remote checksum offload")
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Jason A. Donenfeld says:
====================
wireguard fixes for 5.8-rc3
This series contains two fixes, one cosmetic and one quite important:
1) Avoid the `if ((x = f()) == y)` pattern, from Frank
Werner-Krippendorf.
2) Mitigate a potential memory leak by creating circular netns
references, while also making the netns semantics a bit more
robust.
Patch (2) has a "Fixes:" line and should be backported to stable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Before, we took a reference to the creating netns if the new netns was
different. This caused issues with circular references, with two
wireguard interfaces swapping namespaces. The solution is to rather not
take any extra references at all, but instead simply invalidate the
creating netns pointer when that netns is deleted.
In order to prevent this from happening again, this commit improves the
rough object leak tracking by allowing it to account for created and
destroyed interfaces, aside from just peers and keys. That then makes it
possible to check for the object leak when having two interfaces take a
reference to each other's namespaces.
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixes an error condition reported by checkpatch.pl which was caused by
assigning a variable in an if condition in
wg_noise_handshake_consume_initiation().
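The transformation is the usual one for this checkpatch complaint (generic
illustration, using the shape quoted in the cover letter):
/* before: assignment inside the if condition */
if ((x = f()) == y)
    do_something();

/* after: assign first, then test */
x = f();
if (x == y)
    do_something();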
Signed-off-by: Frank Werner-Krippendorf <mail@hb9fxq.ch>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Horatiu Vultur says:
====================
bridge: mrp: Update MRP_PORT_ROLE
This patch series does the following:
- fixes the enum br_mrp_port_role_type. It removes the port role none(0x2)
because this is in conflict with the standard. The standard defines the
interconnect port role as value 0x2.
- adds checks regarding current defined port roles: primary(0x0) and
secondary(0x1).
v2:
- add the validation code when setting the port role.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds specific checks for primary(0x0) and secondary(0x1) when
setting the port role. For any other value the function
'br_mrp_set_port_role' will return -EINVAL.
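The added validation boils down to something like this (a sketch):
switch (role) {
case BR_MRP_PORT_ROLE_PRIMARY:
case BR_MRP_PORT_ROLE_SECONDARY:
    break;
default:
    return -EINVAL;
}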
Fixes: 20f6a05ef63594 ("bridge: mrp: Rework the MRP netlink interface")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently the MRP_PORT_ROLE_NONE has the value 0x2 but this is in conflict
with the IEC 62439-2 standard. The standard defines the following port
roles: primary (0x0), secondary(0x1), interconnect(0x2).
Therefore remove the port role none.
Fixes: 4714d13791f831 ("bridge: uapi: mrp: Add mrp attributes.")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Keepalived can set global static ip routes or virtual ip routes dynamically
following VRRP protocol states. Using a dedicated rtm_protocol will help
keep track of them.
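In practice this amounts to one new routing protocol identifier in the
rtnetlink UAPI header, roughly (the exact value is the one assigned by the
patch):
#define RTPROT_KEEPALIVED    18    /* Keepalived daemon */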
Changes in v2:
- fix tab/space indenting
Signed-off-by: Alexandre Cassen <acassen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
John Stultz reported that commit f9288fcc5c615 ("i2c: designware: Move
ACPI parts into common module") caused a regression on the HiKey board
where the adv7511 HDMI bridge driver wasn't probing anymore because the
I2C bus failed to start.
It seems the change caused the bus speed to be zero when CONFIG_ACPI is
not set and neither the speed based on the "clock-frequency" device
property nor the default fast mode is set.
Fix this by splitting i2c_dw_acpi_adjust_bus_speed() to
i2c_dw_acpi_round_bus_speed() and i2c_dw_adjust_bus_speed(), where
the latter one has the code that runs independently of ACPI.
Fixes: f9288fcc5c615 ("i2c: designware: Move ACPI parts into common module")
Reported-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Tested-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
Pull kvm fixes from Paolo Bonzini:
"All bugfixes except for a couple cleanup patches"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: VMX: Remove vcpu_vmx's defunct copy of host_pkru
KVM: x86: allow TSC to differ by NTP correction bounds without TSC scaling
KVM: X86: Fix MSR range of APIC registers in X2APIC mode
KVM: VMX: Stop context switching MSR_IA32_UMWAIT_CONTROL
KVM: nVMX: Plumb L2 GPA through to PML emulation
KVM: x86/mmu: Avoid mixing gpa_t with gfn_t in walk_addr_generic()
KVM: LAPIC: ensure APIC map is up to date on concurrent update requests
kvm: lapic: fix broken vcpu hotplug
Revert "KVM: VMX: Micro-optimize vmexit time when not exposing PMU"
KVM: VMX: Add helpers to identify interrupt type from intr_info
kvm/svm: disable KCSAN for svm_vcpu_run()
KVM: MIPS: Fix a build error for !CPU_LOONGSON64
|
|
When the user consumes and generates sqes at a fast rate,
io_sqring_entries() can always get sqes, and ret will never be equal to
-EBUSY, so io_sq_thread will never call cond_resched() or schedule(), and
we then get the following system error messages:
rcu: INFO: rcu_sched self-detected stall on CPU
or
watchdog: BUG: soft lockup-CPU#23 stuck for 112s! [io_uring-sq:1863]
This patch checks whether cond_resched() needs to be called by calling
need_resched() every cycle.
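Inside the sq_thread loop, this amounts to roughly (a sketch, not the exact
io_uring.c hunk):
/* even when sqes keep arriving, give the scheduler a chance to run */
if (need_resched())
    cond_resched();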
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The main DRM device is actually a virtual device so it doesn't have the
iommus property, which is instead on the DMA masters, in this case the
mixers.
Add a call to of_dma_configure() with the mixers' DT node but on the DRM
virtual device, to configure it in the same way as the mixers.
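A sketch of the added call (variable names are illustrative):
/* make the virtual DRM device inherit the mixer's DMA/IOMMU setup */
ret = of_dma_configure(drm->dev, mixer_node, true);
if (ret)
    return ret;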
Reviewed-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/9a4daf438dd3f2fe07afb23688bfb793a0613d7d.1589378833.git-series.maxime@cerno.tech
(cherry picked from commit b718102dbdfd0285ad559687a30e27cc9124e592)
[Maxime: Applied to -fixes since it missed the merge window and display is
broken without it]
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
|
|
When running xfstests generic/451, there is a BUG at mm/memcontrol.c:
page:ffffea000560f2c0 refcount:2 mapcount:0 mapping:000000008544e0ea
index:0xf
mapping->aops:cifs_addr_ops dentry name:"tst-aio-dio-cycle-write.451"
flags: 0x2fffff80000001(locked)
raw: 002fffff80000001 ffffc90002023c50 ffffea0005280088 ffff88815cda0210
raw: 000000000000000f 0000000000000000 00000002ffffffff ffff88817287d000
page dumped because: VM_BUG_ON_PAGE(page->mem_cgroup)
page->mem_cgroup:ffff88817287d000
------------[ cut here ]------------
kernel BUG at mm/memcontrol.c:2659!
invalid opcode: 0000 [#1] SMP
CPU: 2 PID: 2038 Comm: xfs_io Not tainted 5.8.0-rc1 #44
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_
073836-buildvm-ppc64le-16.ppc.4
RIP: 0010:commit_charge+0x35/0x50
Code: 0d 48 83 05 54 b2 02 05 01 48 89 77 38 c3 48 c7
c6 78 4a ea ba 48 83 05 38 b2 02 05 01 e8 63 0d9
RSP: 0018:ffffc90002023a50 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff88817287d000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88817ac97ea0 RDI: ffff88817ac97ea0
RBP: ffffea000560f2c0 R08: 0000000000000203 R09: 0000000000000005
R10: 0000000000000030 R11: ffffc900020237a8 R12: 0000000000000000
R13: 0000000000000001 R14: 0000000000000001 R15: ffff88815a1272c0
FS: 00007f5071ab0800(0000) GS:ffff88817ac80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055efcd5ca000 CR3: 000000015d312000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
mem_cgroup_charge+0x166/0x4f0
__add_to_page_cache_locked+0x4a9/0x710
add_to_page_cache_locked+0x15/0x20
cifs_readpages+0x217/0x1270
read_pages+0x29a/0x670
page_cache_readahead_unbounded+0x24f/0x390
__do_page_cache_readahead+0x3f/0x60
ondemand_readahead+0x1f1/0x470
page_cache_async_readahead+0x14c/0x170
generic_file_buffered_read+0x5df/0x1100
generic_file_read_iter+0x10c/0x1d0
cifs_strict_readv+0x139/0x170
new_sync_read+0x164/0x250
__vfs_read+0x39/0x60
vfs_read+0xb5/0x1e0
ksys_pread64+0x85/0xf0
__x64_sys_pread64+0x22/0x30
do_syscall_64+0x69/0x150
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f5071fcb1af
Code: Bad RIP value.
RSP: 002b:00007ffde2cdb8e0 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
RAX: ffffffffffffffda RBX: 00007ffde2cdb990 RCX: 00007f5071fcb1af
RDX: 0000000000001000 RSI: 000055efcd5ca000 RDI: 0000000000000003
RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000001000 R11: 0000000000000293 R12: 0000000000000001
R13: 000000000009f000 R14: 0000000000000000 R15: 0000000000001000
Modules linked in:
---[ end trace 725fa14a3e1af65c ]---
Since commit 3fea5a499d57 ("mm: memcontrol: convert page cache to a new
mem_cgroup_charge() API") does not cancel the page charge, pages may be
added to the page cache twice:
thread1                                   | thread2
cifs_readpages                            |
  readpages_get_pages                     |
    add_to_page_cache_locked(head,index=n)=0
                                          | readpages_get_pages
                                          |   add_to_page_cache_locked(head,index=n+1)=0
    add_to_page_cache_locked(head, index=n+1)=-EEXIST
    then, will next loop with list head page's
    index=n+1 and the page->mapping not NULL
  readpages_get_pages
    add_to_page_cache_locked(head, index=n+1)
      commit_charge
        VM_BUG_ON_PAGE
So, we should not continue the loop when adding any page to the page
cache has failed.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A number of fixes, located in two areas, one performance fix and one
fixup for better integration with another patchset.
- bug fixes in nowait aio:
- fix snapshot creation hang after nowait-aio was used
- fix failure to write to prealloc extent past EOF
- don't block when extent range is locked
- block group fixes:
- relocation failure when scrub runs in parallel
- refcount fix when removing fails
- fix race between removal and creation
- space accounting fixes
- reinstate fast path check for log tree at unlink time, fixes
performance drop of up to 30% in REAIM
- kzfree/kfree fixup to ease treewide patchset renaming kzfree"
* tag 'for-5.8-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: use kfree() in btrfs_ioctl_get_subvol_info()
btrfs: fix RWF_NOWAIT writes blocking on extent locks and waiting for IO
btrfs: fix RWF_NOWAIT write not failling when we need to cow
btrfs: fix failure of RWF_NOWAIT write into prealloc extent beyond eof
btrfs: fix hang on snapshot creation after RWF_NOWAIT write
btrfs: check if a log root exists before locking the log_mutex on unlink
btrfs: fix bytes_may_use underflow when running balance and scrub in parallel
btrfs: fix data block group relocation failure due to concurrent scrub
btrfs: fix race between block group removal and block group creation
btrfs: fix a block group ref counter leak after failure to remove block group
|
|
'early_resume_init()' and 'late_resume_init()' are only called respectively
via 'early_resume_init' and 'late_resume_init'.
They can be marked as __init to save a few bytes of memory.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
[ rjw: Subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Add one more bit for OOB (Out Of Band) enabling of P-states.
If OOB handling of P-states is enabled, intel_pstate shouldn't load.
Currently, only "BIT(8) == 1" in the MSR_MISC_PWR_MGMT MSR is
considered as OOB, but "BIT(18) == 1" needs to be taken into
consideration as an OOB condition too.
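Conceptually, the check becomes something like this (a sketch; the mask
macro name is made up):
#define OOB_MODE_BITS    (BIT(8) | BIT(18))

rdmsrl(MSR_MISC_PWR_MGMT, misc_pwr);
if (misc_pwr & OOB_MODE_BITS)
    return true;    /* P-states handled out of band, intel_pstate should not load */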
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Add an empty code line, edit subject and changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Currently the ring buffer makes events that happen in interrupts that preempt
another event have a delta of zero. (Hopefully we can change this soon). But
this is to deal with the races of updating a global counter with lockless
and nesting functions updating deltas.
With the addition of absolute time stamps, the time extend didn't follow
this rule. A time extend can happen if two events happen more than 2^27
nanoseconds apart, as the delta time field in each event is only 27 bits.
If that happens, then a time extend is injected, which has 59 bits of
nanoseconds to use (about 18 years). But if the 2^27 nanoseconds happen between
two events, and as it is writing the event, an interrupt triggers, it will
see the 2^27 difference as well and inject a time extend of its own. But a
recent change made the time extend logic not take into account the nesting,
and this can cause two time extend deltas to happen moving the time stamp
much further ahead than the current time. This gets all reset when the ring
buffer moves to the next page, but that can cause time to appear to go
backwards.
This was observed in a trace-cmd recording, and since the data is saved in a
file, with trace-cmd report --debug, it was possible to see that this indeed
did happen!
bash-52501 110d... 81778.908247: sched_switch: bash:52501 [120] S ==> swapper/110:0 [120] [12770284:0x2e8:64]
<idle>-0 110d... 81778.908757: sched_switch: swapper/110:0 [120] R ==> bash:52501 [120] [509947:0x32c:64]
TIME EXTEND: delta:306454770 length:0
bash-52501 110.... 81779.215212: sched_swap_numa: src_pid=52501 src_tgid=52388 src_ngid=52501 src_cpu=110 src_nid=2 dst_pid=52509 dst_tgid=52388 dst_ngid=52501 dst_cpu=49 dst_nid=1 [0:0x378:48]
TIME EXTEND: delta:306458165 length:0
bash-52501 110dNh. 81779.521670: sched_wakeup: migration/110:565 [0] success=1 CPU:110 [0:0x3b4:40]
and at the next page, caused the time to go backwards:
bash-52504 110d... 81779.685411: sched_switch: bash:52504 [120] S ==> swapper/110:0 [120] [8347057:0xfb4:64]
CPU:110 [SUBBUFFER START] [81779379165886:0x1320000]
<idle>-0 110dN.. 81779.379166: sched_wakeup: bash:52504 [120] success=1 CPU:110 [0:0x10:40]
<idle>-0 110d... 81779.379167: sched_switch: swapper/110:0 [120] R ==> bash:52504 [120] [1168:0x3c:64]
Link: https://lkml.kernel.org/r/20200622151815.345d1bf5@oasis.local.home
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: stable@vger.kernel.org
Fixes: dc4e2801d400b ("ring-buffer: Redefine the unimplemented RINGBUF_TYPE_TIME_STAMP")
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Versions of binutils prior to 2.33.1 don't understand the ELF notes that
are added by modern compilers to indicate the PAC and BTI options used
to build the code. This causes them to emit large numbers of warnings in
the form:
aarch64-linux-gnu-nm: warning: .tmp_vmlinux.kallsyms2: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0000000
during the kernel build which is currently causing quite a bit of
disruption for automated build testing using clang.
In commit 15cd0e675f3f76b (arm64: Kconfig: ptrauth: Add binutils version
check to fix mismatch) we added a dependency on binutils to avoid this
issue when building with versions of GCC that emit the notes but did not
do so for clang as it was believed that the existing check for
.cfi_negate_ra_state was already requiring a new enough binutils. This
does not appear to be the case for some versions of binutils (eg, the
binutils in Debian 10) so instead refactor so we require a new enough
GNU binutils in all cases other than when we are using an old GCC
version that does not emit notes.
Other, more exotic, combinations of tools, such as using clang, lld and
gas together, are possible and may have further problems
but rather than adding further version checks it looks like the most
robust thing will be to just test that we can build cleanly with the
configured tools but that will require more review and discussion so do
this for now to address the immediate problem disrupting build testing.
Reported-by: KernelCI <bot@kernelci.org>
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1054
Link: https://lore.kernel.org/r/20200619123550.48098-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Suspend to idle was recently found to not work on Goldmont CPUs.
The issue happens due to:
1. On Goldmont the CPU in idle can only be woken up via IPIs,
not POLLING mode, due to commit 08e237fa56a1 ("x86/cpu: Add
workaround for MONITOR instruction erratum on Goldmont based
CPUs")
2. When the CPU is entering suspend to idle process, the
_TIF_POLLING_NRFLAG remains on, because cpuidle_enter_s2idle()
doesn't match call_cpuidle() exactly.
3. Commit b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
makes use of _TIF_POLLING_NRFLAG to avoid sending IPIs to idle
CPUs.
4. As a result, some IPIs related functions might not work
well during suspend to idle on Goldmont. For example, one
suspected victim:
tick_unfreeze() -> timekeeping_resume() -> hrtimers_resume()
-> clock_was_set() -> on_each_cpu() might wait forever,
because the IPIs will not be sent to the CPUs which are
sleeping with _TIF_POLLING_NRFLAG set, and Goldmont CPU
could not be woken up by only setting _TIF_NEED_RESCHED
on the monitor address.
To avoid that, clear the _TIF_POLLING_NRFLAG flag before invoking
enter_s2idle_proper() in cpuidle_enter_s2idle() in analogy with the
call_cpuidle() code flow.
Fixes: b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Suggested-by: Rafael J. Wysocki <rafael@kernel.org>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
[ rjw: Subject / changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Introduce new resource dump segments - PRM_QUERY_QP,
PRM_QUERY_CQ and PRM_QUERY_MKEY. These segments contain the resource
dump in PRM query format.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
|
Export some of the resource dump API. mlx5_ib driver will use
it in downstream patches.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
|
We've found that the Samsung USBC Headset (AKG) (VID: 0x04e8, PID: 0xa051)
needs a tiny delay after each class compliant request.
Otherwise the device might not be recognized every time.
Signed-off-by: Chihhao Chen <chihhao.chen@mediatek.com>
Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/1592910203-24035-1-git-send-email-macpaul.lin@mediatek.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
The sigreturn code in the compat vDSO is unused. Remove it.
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The 32-bit sigreturn trampoline in the compat sigpage matches the binary
representation of the arch/arm/ sigpage exactly. This is important for
debuggers (e.g. GDB) and unwinders (e.g. libunwind) since they rely
on matching the instruction sequence in order to identify that they are
unwinding through a signal. The same cannot be said for the sigreturn
trampoline in the compat vDSO, which defeats the unwinder heuristics and
instead attempts to use unwind directives for the unwinding. This is in
contrast to arch/arm/, which never uses the vDSO for sigreturn.
Ensure compatibility with arch/arm/ and existing unwinders by always
using the sigpage for the sigreturn trampoline, regardless of the
presence of the compat vDSO.
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
In preparation for removing the signal trampoline from the compat vDSO,
allow the sigpage and the compat vDSO to co-exist.
For the moment the vDSO signal trampoline will still be used when built.
Subsequent patches will move to the sigpage consistently.
Acked-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Commit 7e9f5e6629f6 ("arm64: vdso: Add --eh-frame-hdr to ldflags") results
in a .eh_frame_hdr section for the vDSO, which in turn causes the libgcc
unwinder to unwind out of signal handlers using the .eh_frame information
populated by our .cfi directives. In conjunction with a4eb355a3fda
("arm64: vdso: Fix CFI directives in sigreturn trampoline"), this has
been shown to cause segmentation faults originating from within the
unwinder during thread cancellation:
| Thread 14 "virtio-net-rx" received signal SIGSEGV, Segmentation fault.
| 0x0000000000435e24 in uw_frame_state_for ()
| (gdb) bt
| #0 0x0000000000435e24 in uw_frame_state_for ()
| #1 0x0000000000436e88 in _Unwind_ForcedUnwind_Phase2 ()
| #2 0x00000000004374d8 in _Unwind_ForcedUnwind ()
| #3 0x0000000000428400 in __pthread_unwind (buf=<optimized out>) at unwind.c:121
| #4 0x0000000000429808 in __do_cancel () at ./pthreadP.h:304
| #5 sigcancel_handler (sig=32, si=0xffff33c743f0, ctx=<optimized out>) at nptl-init.c:200
| #6 sigcancel_handler (sig=<optimized out>, si=0xffff33c743f0, ctx=<optimized out>) at nptl-init.c:165
| #7 <signal handler called>
| #8 futex_wait_cancelable (private=0, expected=0, futex_word=0x3890b708) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
After considerable bashing of heads, it appears that our CFI directives
for unwinding out of the sigreturn trampoline are only processed by libgcc
when both a .eh_frame_hdr section is present *and* the mysterious NOP is
covered by an entry in .eh_frame. With both of these now in place, it has
highlighted that our CFI directives are not comprehensive enough to
restore the stack pointer of the interrupted context. This results in libgcc
falling back to an arm64-specific unwinder after computing a bogus PC value
from the unwind tables. The unwinder promptly dereferences this bogus address
in an attempt to see if the pointed-to instruction sequence looks like
the sigreturn trampoline.
Restore the old unwind behaviour, which relied solely on heuristics in
the unwinder, by removing the .eh_frame_hdr section from the vDSO and
commenting out the insufficient CFI directives for now. Add comments to
explain the current, miserable state of affairs.
Cc: Tamas Zsoldos <tamas.zsoldos@arm.com>
Cc: Szabolcs Nagy <szabolcs.nagy@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Kiss <daniel.kiss@arm.com>
Acked-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reported-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Commit 16358542f32f ("hwmon: (pmbus) Implement multi-phase support")
added support for multi-phase pmbus devices. However, when calling
pmbus_add_sensor() for fans, the patch swapped the `page` and `reg`
attributes. As a result, the fan speeds were reported as 0 RPM on my device.
Signed-off-by: Jan Kundrát <jan.kundrat@cesnet.cz>
Fixes: 16358542f32f ("hwmon: (pmbus) Implement multi-phase support")
Cc: stable@vger.kernel.org # v5.7+
Link: https://lore.kernel.org/r/449bc9e6c0e4305581e45905ce9d043b356a9932.1592904387.git.jan.kundrat@cesnet.cz
[groeck: Fixed references to offending commit]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|