Age | Commit message | Author |
|
For IPv4 packets sent on the wire, the rxe driver calls ip_local_out(),
which immediately calls __ip_local_out(), which sets iph->tot_len and calls
ip_send_check(). This code is duplicated in prepare4(). On the loopback
path the IP header checksum and tot_len fields are not used so they do not
need to be set.
Remove this redundant code.
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20210618045742.204195-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
In send_atomic_ack() in rxe_resp.c there is code copying ack_pkt into the
skb->cb[]. This doesn't do anything useful because the cb[] is not used in
the transmit path by the rxe driver.
Remove this code.
Fixes: 4c93496f18ce ("IB/rxe: do not copy extra stack memory to skb")
Link: https://lore.kernel.org/r/20210618045742.204195-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
ULPs can get more error information about a CQ through verbs instead of prints.
Link: https://lore.kernel.org/r/1624362836-11631-1-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Xin Long says:
====================
sctp: implement RFC8899: Packetization Layer Path MTU Discovery for SCTP transport
Overview (from RFC8899):
In contrast to PMTUD, Packetization Layer Path MTU Discovery
(PLPMTUD) [RFC4821] introduces a method that does not rely upon
reception and validation of PTB messages. It is therefore more
robust than Classical PMTUD. This has become the recommended
approach for implementing discovery of the PMTU [BCP145].
It uses a general strategy in which the PL sends probe packets to
search for the largest size of unfragmented datagram that can be sent
over a network path. Probe packets are sent to explore using a
larger packet size. If a probe packet is successfully delivered (as
determined by the PL), then the PLPMTU is raised to the size of the
successful probe. If a black hole is detected (e.g., where packets
of size PLPMTU are consistently not received), the method reduces the
PLPMTU.
SCTP Probe Packets:
As the RFC suggested, the probe packets consist of an SCTP common header
followed by a HEARTBEAT chunk and a PAD chunk. The PAD chunk is used to
control the length of the probe packet. The HEARTBEAT chunk is used to
trigger the sending of a HEARTBEAT ACK chunk to confirm this probe on
the HEARTBEAT sender.
The HEARTBEAT chunk also carries a Heartbeat Information parameter that
includes the probe size to help an implementation associate a HEARTBEAT
ACK with the size of probe that was sent. The sender uses the nonce and
the probe size to verify the information returned.
Detailed Implementation on SCTP:
                     +------+
            +------->| Base |-----------------+ Connectivity
            |        +------+                 | or BASE_PLPMTU
            |           |                     | confirmation failed
            |           |                     v
            |           | Connectivity    +-------+
            |           | and BASE_PLPMTU | Error |
            |           | confirmed       +-------+
            |           |                     | Consistent
            |           v                     | connectivity
 Black Hole |       +--------+                | and BASE_PLPMTU
  detected  |       | Search |<---------------+ confirmed
            |       +--------+
            |          ^  |
            |          |  |
            |    Raise |  | Search
            |    timer |  | algorithm
            |  expired |  | completed
            |          |  |
            |          |  v
            |   +-----------------+
            +---| Search Complete |
                +-----------------+
When PLPMTUD is enabled, it starts in Base state and probes with
BASE_PLPMTU (1200). If this probe succeeds, it goes to Search state.
If this probe fails, it goes to Error state, in which pl.pmtu goes
down to MIN_PLPMTU (512) and it keeps probing with BASE_PLPMTU until a
probe succeeds and it goes to Search state.
In Search state, the probe size initially grows by a Big step (32)
each time the last probe succeeds. Once a probe (such as 1420) fails
after MAX_PROBES (3) attempts, probe_size goes back to the last
successful one (1420 - 32 = 1388), 'probe_high' is set to 1420, and the
growing step becomes a Small one (4). Probing then continues, growing by
a Small step each round. Once the optimal size (such as 1400) is found,
that is, when the probe with the next size (1404) fails, this size is
synced to pathmtu and the state goes to Complete.
In Complete state, it only does a probe check for the pathmtu just
set. If that fails, a Black Hole is detected and it goes back to Base
state. If it succeeds, it goes back to Search state, and probing
continues by growing a Small step (1400 + 4). If this probe fails,
probe_high is set and it goes back to 1388 and then Complete state,
which is normally a kind of loop. However, if the env's pathmtu somehow
changes to a bigger size, this probe will succeed and probing continues,
growing by a Big step (1400 + 32) each round until another probe fails.
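The Search walk described above can be sketched as a small simulation. This is only a simplified model of the description, not the kernel code: the function name, its arguments and 'path_limit' (the largest size the simulated path delivers) are hypothetical, and a failed probe stands for one that already failed MAX_PROBES times.

```python
BIG_STEP = 32    # "Big step" from the text
SMALL_STEP = 4   # "Small step" from the text


def search(pmtu, probe_size, path_limit):
    """Walk the Search state until Complete, following the description.

    pmtu       -- last confirmed size (pl.pmtu)
    probe_size -- size of the next probe
    path_limit -- largest size the simulated path delivers (hypothetical)
    Returns (pathmtu, sizes_probed).
    """
    probe_high = 0
    probed = []
    while True:
        probed.append(probe_size)
        if probe_size <= path_limit:
            # Probe confirmed: remember it and grow by the current step.
            pmtu = probe_size
            probe_size += SMALL_STEP if probe_high else BIG_STEP
        elif not probe_high:
            # First failure: mark the ceiling, fall back, use small steps.
            probe_high = probe_size
            probe_size = pmtu
        else:
            # Failure during the small-step phase: pmtu is the optimum.
            return pmtu, probed


pathmtu, probed = search(pmtu=1388, probe_size=1420, path_limit=1400)
print(pathmtu, probed)
```

With the numbers from the text (pl.pmtu = 1388, a failing 1420 probe, env pathmtu = 1400), the walk ends in Complete with 1400 after the 1404 probe fails.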
PTB Messages Process:
PLPMTUD doesn't rely on these packets to find the pmtu, and shouldn't
trust them either. When processing them, it only changes the probe_size
to PL_PTB_SIZE(info - hlen) if 'pl.pmtu < PL_PTB_SIZE < the current
probe_size' during Search state, as this can help probe_size get
to the optimal size faster. For example:
pl.pmtu = 1388, probe_size = 1420, while the env's pathmtu = 1400.
When probe_size is 1420, a Toobig packet with 1400 comes back. If
probe_size changes to use 1400, it saves quite a few rounds to get there.
But of course after having this value, PLPMTUD will still verify it on
its own before using it.
Patches:
- Patch 1-6: introduce some new constants/variables from the RFC, sysctl
and members in transport, APIs for the following patches, chunks and
a timer for the probe sending and some code for the probe receiving.
- Patch 7-9: implement the state transition on the tx path, rx path and
toobig ICMP packet processing. This is the main algorithm part.
- Patch 10: activate this feature
- Patch 11-14: improve the process for ICMP packets for SCTP over UDP,
so that it can also be covered by this feature.
Tests:
- do sysctl and setsockopt tests for this feature's enabling and disabling.
- get these pr_debug points for this feature by
# cat /sys/kernel/debug/dynamic_debug/control | grep PLP
and enable them on kernel dynamic debug, then play with the pathmtu and
check if the state transition and plpmtu change match the RFC.
- do the above tests for SCTP over IPv4/IPv6 and SCTP over UDP.
v1->v2:
- See Patch 06/14.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Previously, sctp over udp was using udp tunnel's icmp err process, which
only does sk lookup on sctp side. However for sctp's icmp error process,
there are more things to do, like syncing assoc pmtu/retransmit packets
for toobig type err, and starting proto_unreach_timer for unreach type
err etc.
Now that PLPMTUD has been added, toobig type errors also need to be
processed on the sctp side. This patch processes icmp errors on the sctp
side by parsing the type/code/info in .encap_err_lookup and calling sctp's
icmp processing functions. Note that as the 'redirect' err process needs
to know the outer ip(v6) header, we have to leave it to udp(v6)_err.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch is to extract sctp_v4_err_handle() from sctp_v4_err() to
only handle the icmp err after the sock lookup, and it also makes
the code clearer.
sctp_v4_err_handle() will be used in sctp over udp's err handling
in the following patch.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch is to extract sctp_v6_err_handle() from sctp_v6_err() to
only handle the icmp err after the sock lookup, and it also makes
the code clearer.
sctp_v6_err_handle() will be used in sctp over udp's err handling
in the following patch.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Same as in tcp_v6_err() and __udp6_lib_err(), there's no need to
hold idev in sctp_v6_err(), so just call __in6_dev_get() instead.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
sctp_transport_pl_reset() is called whenever any of these 3 members in
transport is changed:
- probe_interval
- param_flags & SPP_PMTUD_ENABLE
- state == ACTIVE
If all are true, start the PLPMTUD when it's not yet started. If any of
these is false, stop the PLPMTUD when it's already running.
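The start/stop rule above can be sketched as follows. This is a minimal illustration only; the function name, the argument names and the string results are hypothetical and are not the kernel API.

```python
def pl_reset_action(probe_interval, pmtud_enabled, transport_active, running):
    """Decide what a reset should do, given the three conditions.

    running -- whether PLPMTUD is currently started (hypothetical flag)
    """
    should_run = bool(probe_interval) and pmtud_enabled and transport_active
    if should_run and not running:
        return "start"   # all three hold and it is not yet started
    if not should_run and running:
        return "stop"    # at least one is false and it is running
    return "none"        # already in the wanted condition
```

For example, a transport that goes inactive while probing gets "stop", and one that becomes active with a non-zero probe_interval and SPP_PMTUD_ENABLE set gets "start".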
sctp_transport_pl_update() is called when the transport dst has changed.
It will restart the PLPMTUD probe. Again, the pathmtu won't change but
use the dst's mtu until the Search phase is done.
Note that after using PLPMTUD, the pathmtu is only initialized with the
dst mtu when the transport dst changes. At other times it is updated by
pl.pmtu. So sctp_transport_pmtu_check() will be called only when PLPMTUD
is disabled in sctp_packet_config().
After this patch, the PLPMTUD feature from RFC8899 will be activated
and can be used by users.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
PLPMTUD will short-circuit the old process for icmp TOOBIG packets.
This part is described in rfc8899#section-4.6.2 (PL_PTB_SIZE =
PTB_SIZE - other_headers_len). Note that from rfc8899#section-5.2
State Machine, each case below is for some specific states only:
a) PL_PTB_SIZE < MIN_PLPMTU || PL_PTB_SIZE >= PROBED_SIZE,
discard it, for any state
b) MIN_PLPMTU < PL_PTB_SIZE < BASE_PLPMTU,
Base -> Error, for Base state
c) BASE_PLPMTU <= PL_PTB_SIZE < PLPMTU,
Search -> Base or Complete -> Base, for Search and Complete states.
d) PLPMTU < PL_PTB_SIZE < PROBED_SIZE,
set pl.probe_size to PL_PTB_SIZE then verify it, for Search state.
The most important one is case d), which helps find the optimal size
quickly during searching. For example, when pathmtu = 1392 for SCTP over
IPv4, the search will be (20 is iphdr_len):
1. probe with 1200 - 20
2. probe with 1232 - 20
3. probe with 1264 - 20
...
7. probe with 1388 - 20
8. probe with 1420 - 20
When sending the probe with 1420 - 20, TOOBIG may come with PL_PTB_SIZE =
1392 - 20. Then it matches case d), and saves some rounds to try with the
1392 - 20 probe. But of course, PLPMTUD doesn't trust TOOBIG packets, and
it will go back to the common searching once the probe with the new size
can't be verified.
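Cases a) to d) can be sketched as a classifier. This is an illustration of the rules as written above, not the kernel implementation: the function name, the state strings and the returned labels are hypothetical, and boundary values that no case mentions fall through to "ignore".

```python
MIN_PLPMTU = 512     # from the cover letter
BASE_PLPMTU = 1200   # from the cover letter


def classify_ptb(state, ptb_size, pmtu, probe_size):
    """Map a PL_PTB_SIZE to the action for the current state."""
    if ptb_size < MIN_PLPMTU or ptb_size >= probe_size:
        return "discard"                  # case a, any state
    if state == "base" and MIN_PLPMTU < ptb_size < BASE_PLPMTU:
        return "base->error"              # case b
    if state in ("search", "complete") and BASE_PLPMTU <= ptb_size < pmtu:
        return "->base"                   # case c
    if state == "search" and pmtu < ptb_size < probe_size:
        return "probe_with_ptb_size"      # case d, still to be verified
    return "ignore"                       # not covered by a) to d)


# The example above: probing 1420 - 20 while TOOBIG reports 1392 - 20.
action = classify_ptb("search", 1392 - 20, 1388 - 20, 1420 - 20)
```

In this example 'action' is the case d) label, so the next probe would use 1392 - 20 instead of walking there in small steps.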
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As described in rfc8899#section-5.2, when a probe succeeds, there might
be the following state transitions:
- Base -> Search, occurs when probe succeeds with BASE_PLPMTU,
pl.pmtu is not changing,
pl.probe_size increases by SCTP_PL_BIG_STEP,
- Error -> Search, occurs when probe succeeds with BASE_PLPMTU,
pl.pmtu is changed from SCTP_MIN_PLPMTU to SCTP_BASE_PLPMTU,
pl.probe_size increases by SCTP_PL_BIG_STEP.
- Search -> Search Complete, occurs when probe succeeds with the probe
size SCTP_MAX_PLPMTU less than pl.probe_high,
pl.pmtu is not changing, but update *pathmtu* with it,
pl.probe_size is set back to pl.pmtu to double check it.
- Search Complete -> Search, occurs when probe succeeds with the probe
size equal to pl.pmtu,
pl.pmtu is not changing,
pl.probe_size increases by SCTP_PL_MIN_STEP.
So search process can be described as:
1. When it just enters 'Search' state, *pathmtu* is not updated with
pl.pmtu, and probe_size increases by a big step (SCTP_PL_BIG_STEP)
each round.
2. This continues until a probe fails, at which point pl.probe_high is
set and probe_size decreases back to pl.pmtu, as described in the last
patch.
3. When the probe with the new size succeeds, probe_size changes to
increase by a small step (SCTP_PL_MIN_STEP) because pl.probe_high
is set.
4. Once probe_size is next to pl.probe_high, the searching finishes,
it goes to 'Complete' state and updates *pathmtu* with pl.pmtu, and
then probe_size is set to pl.pmtu to confirm it with one more probe.
5. This probe occurs after "30 * probe_interval", a much longer time than
that in Search state. Once it is done, it goes to 'Search' state again
with probe_size increased by SCTP_PL_MIN_STEP.
As we can see above, during the searching, pl.pmtu changes while *pathmtu*
doesn't. *pathmtu* is only updated when the search finishes by which it
gets an optimal value for it. A big step is used at the beginning until
it gets close to the optimal value, then it changes to a small step until
it has this optimal value.
The small step is also used in 'Complete' until it goes to 'Search' state
again and the probe with 'pmtu + the small step' succeeds, which means a
higher size could be used. Then probe_size changes to increase by a big
step again until it gets close to the next optimal value.
Note that anytime when black hole is detected, it goes directly to 'Base'
state with pl.pmtu set to SCTP_BASE_PLPMTU, as described in the last patch.
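The success transitions listed above can be sketched as one handler. This is a model of the description, not the kernel function: the class, field and state names are hypothetical, the step values 32 and 4 come from the cover letter, and clearing probe_high on entering 'Complete' is an assumption made so that a later successful probe can return to big steps.

```python
from dataclasses import dataclass

BASE_PLPMTU = 1200
BIG_STEP = 32    # stands for SCTP_PL_BIG_STEP
MIN_STEP = 4     # stands for SCTP_PL_MIN_STEP


@dataclass
class PL:
    state: str
    pmtu: int
    probe_size: int
    probe_high: int = 0
    pathmtu: int = 0   # value synced out when the search finishes


def on_probe_success(pl):
    if pl.state in ("base", "error"):
        pl.pmtu = BASE_PLPMTU        # unchanged for base, raised for error
        pl.state = "search"
        pl.probe_size += BIG_STEP
    elif pl.state == "search":
        pl.pmtu = pl.probe_size      # pl.pmtu moves, pathmtu does not
        if not pl.probe_high:
            pl.probe_size += BIG_STEP
        else:
            pl.probe_size += MIN_STEP
            if pl.probe_size >= pl.probe_high:
                pl.state = "complete"
                pl.probe_high = 0    # assumption, see lead-in
                pl.pathmtu = pl.pmtu
                pl.probe_size = pl.pmtu
    elif pl.state == "complete":
        pl.state = "search"
        pl.probe_size += MIN_STEP
    return pl
```

Starting from PL("search", 1396, 1400, probe_high=1404), one success moves to 'Complete' with pathmtu 1400, and the next success returns to 'Search' probing 1404.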
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The state transition is described in rfc8899#section-5.2,
PROBE_COUNT == MAX_PROBES means the probe fails for MAX times, and the
state transition includes:
- Base -> Error, occurs when BASE_PLPMTU Confirmation Fails,
pl.pmtu is set to SCTP_MIN_PLPMTU,
probe_size is still SCTP_BASE_PLPMTU;
- Search -> Base, occurs when Black Hole Detected,
pl.pmtu is set to SCTP_BASE_PLPMTU,
probe_size is set back to SCTP_BASE_PLPMTU;
- Search Complete -> Base, occurs when Black Hole Detected
pl.pmtu is set to SCTP_BASE_PLPMTU,
probe_size is set back to SCTP_BASE_PLPMTU;
Note a black hole is encountered when a sender is unaware that packets
are not being delivered to the destination endpoint. So it includes
probe failures where probe_size equals pl.pmtu, and definitely does not
include those where probe_size is greater than pl.pmtu. The latter is the
normal probe failure, where probe_size should decrease back to pl.pmtu
and pl.probe_high is set. pl.probe_high will be used on the HB ACK recv
path in the next patch.
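The failure handling above can be sketched as a pure function that is called once PROBE_COUNT has reached MAX_PROBES. This is a model of the description only: the function name, the state strings and the tuple layout are hypothetical, the values 512 and 1200 come from the cover letter, and resetting probe_high to 0 on a black hole is an assumption.

```python
MIN_PLPMTU = 512     # stands for SCTP_MIN_PLPMTU
BASE_PLPMTU = 1200   # stands for SCTP_BASE_PLPMTU


def on_probe_failure(state, pmtu, probe_size, probe_high=0):
    """Return the new (state, pmtu, probe_size, probe_high)."""
    if state == "base":
        # BASE_PLPMTU confirmation failed: Base -> Error.
        return ("error", MIN_PLPMTU, BASE_PLPMTU, probe_high)
    if state in ("search", "complete"):
        if probe_size == pmtu:
            # The confirmed size no longer gets through: black hole.
            return ("base", BASE_PLPMTU, BASE_PLPMTU, 0)
        # Normal failure: fall back to pl.pmtu and record the ceiling.
        return (state, pmtu, pmtu, probe_size)
    # Error state: keep probing with the same size.
    return (state, pmtu, probe_size, probe_high)
```

For example, a failed 1420 probe in Search with pl.pmtu = 1388 gives ("search", 1388, 1388, 1420), while a failed 1400 probe in Complete with pl.pmtu = 1400 returns to Base.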
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch does exactly what rfc8899#section-6.2.1.2 says:
The SCTP sender needs to be able to determine the total size of a
probe packet. The HEARTBEAT chunk could carry a Heartbeat
Information parameter that includes, besides the information
suggested in [RFC4960], the probe size to help an implementation
associate a HEARTBEAT ACK with the size of probe that was sent. The
sender could also use other methods, such as sending a nonce and
verifying the information returned also contains the corresponding
nonce. The length of the PAD chunk is computed by reducing the
probing size by the size of the SCTP common header and the HEARTBEAT
chunk.
Note that the HB ACK chunk carries back whatever the HB chunk carried,
including the probe_size we put in it. We also check hbinfo->probe_size in
the HB ACK against link->pl.probe_size to validate this HB ACK chunk.
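The PAD length rule quoted above can be sketched as simple arithmetic. This is an illustration under assumptions: the function name is hypothetical, probe_size is taken to be the size of the SCTP packet without lower-layer headers, and the HEARTBEAT chunk length is passed in by the caller because it depends on the Heartbeat Information carried. The 12-byte SCTP common header and the 4-byte chunk header are fixed by the SCTP specification.

```python
SCTP_COMMON_HEADER_LEN = 12   # source port, dest port, vtag, checksum
CHUNK_HEADER_LEN = 4          # type, flags, length


def pad_chunk_len(probe_size, heartbeat_chunk_len):
    """Length of the PAD chunk (its own header included) for one probe."""
    pad = probe_size - SCTP_COMMON_HEADER_LEN - heartbeat_chunk_len
    if pad < CHUNK_HEADER_LEN:
        raise ValueError("probe too small to carry a PAD chunk")
    return pad


# Hypothetical 52-byte HEARTBEAT chunk in a 1200-byte probe.
pad = pad_chunk_len(1200, 52)
```

Here 'pad' is 1200 - 12 - 52 = 1136 bytes.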
v1->v2:
- Remove the unused 'sp' and add static for sctp_packet_bundle_pad().
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are 3 timers described in rfc8899#section-5.1.1:
PROBE_TIMER, PMTU_RAISE_TIMER, CONFIRMATION_TIMER
This patch adds a 'probe_timer' in transport, and it works as either
PROBE_TIMER or PMTU_RAISE_TIMER. Most of the time, it works as PROBE_TIMER
and expires every 'probe_interval' to send the HB probe packet.
When transport pl enters COMPLETE state, it works as PMTU_RAISE_TIMER
and expires after 'probe_interval * 30' to go back to SEARCH state
and do searching again.
As an SCTP HB is an acknowledged packet, CONFIRMATION_TIMER is not needed.
The timer will start when transport pl enters BASE state and stop
when it enters DISABLED state.
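The dual role of the timer can be sketched as a timeout selector. This is only an illustration of the rule above; the function name and the state strings are hypothetical.

```python
RAISE_FACTOR = 30   # "probe_interval * 30" from the text


def probe_timer_timeout(state, probe_interval):
    """Timeout to arm for the given pl state, or None when stopped."""
    if state == "disabled":
        return None                            # timer is stopped
    if state == "complete":
        return probe_interval * RAISE_FACTOR   # acts as PMTU_RAISE_TIMER
    return probe_interval                      # acts as PROBE_TIMER
```

With a probe_interval of 5000, the timer would fire every 5000 while searching and after 150000 once the search is complete.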
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These are 4 constants described in rfc8899#section-5.1.2:
MAX_PROBES, MIN_PLPMTU, MAX_PLPMTU, BASE_PLPMTU;
And 2 variables described in rfc8899#section-5.1.3:
PROBED_SIZE, PROBE_COUNT;
And 5 states described in rfc8899#section-5.2:
DISABLED, BASE, SEARCH, SEARCH_COMPLETE, ERROR;
And these 4 APIs are used to reset/update PLPMTUD, check if PLPMTUD is
enabled, and calculate the additional headers length for a transport.
Note the member 'probe_high' in transport will be set to the probe
size when a probe fails with this probe size in the next patches.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With this socket option, users can change probe_interval for
a transport, asoc or sock after it's created.
Note that if the change is for an asoc, also apply the change
to each transport in this asoc.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
PLPMTUD can be enabled by doing 'sysctl -w net.sctp.probe_interval=n'.
'n' is the interval for PLPMTUD probe timer in milliseconds, and it
can't be less than 5000 if it's not 0.
All asoc/transport's PLPMTUD in a new socket will be enabled by default.
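The accepted range for 'n' can be sketched as a check. This is a minimal illustration of the rule stated above, with a hypothetical function name; any upper bound the kernel may apply is not modelled, and 0 is presumed to leave PLPMTUD disabled.

```python
PROBE_INTERVAL_MIN = 5000   # milliseconds, from the text


def valid_probe_interval(n):
    """True if n is 0 or at least the minimum interval."""
    return n == 0 or n >= PROBE_INTERVAL_MIN
```

So 0 and 5000 pass, while 4999 is rejected.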
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This chunk is defined in rfc4820#section-3, and used to pad an
SCTP packet. The receiver must discard this chunk and continue
processing the rest of the chunks in the packet.
Add it now, as it will be bundled with a heartbeat chunk to probe
pmtu in the following patches.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
iwmr->page_size stores the return value of ib_umem_find_best_pgsz and may
be zero when used in ib_umem_num_dma_blocks, thus causing a divide by zero
error.
Fix this by erroring out of irdma_reg_user when 0 is returned from
ib_umem_find_best_pgsz.
Link: https://lore.kernel.org/r/20210622175232.439-3-tatyana.e.nikolova@intel.com
Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
Addresses-Coverity-ID: 1505149 ("Integer handling issues")
Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs")
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Share the addr_location of 'addr' so that it need not be resolved more than
once.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210621150514.32159-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To make it possible to use filtering with scripts, move filtering before
scripting.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210621150514.32159-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
'orignal' should be 'original'.
Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
Link: https://lore.kernel.org/r/1624011020-16992-11-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The QP type has been checked in check_send_valid(), if it's not RC, it
will process the UD/GSI branch.
Link: https://lore.kernel.org/r/1624011020-16992-10-git-send-email-liweihang@huawei.com
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The process of flushing CQE can be encapsulated into a function, which
reduces duplicate code.
Link: https://lore.kernel.org/r/1624011020-16992-9-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
hns_roce_init_qp_table() only ever returns 0, so it does not need to
return a value. Change it to void type.
Link: https://lore.kernel.org/r/1624011020-16992-8-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Remove unused members in EQ context structure.
Fixes: 782832f25404 ("RDMA/hns: Simplify the function config_eqc()")
Link: https://lore.kernel.org/r/1624011020-16992-7-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
When query_qp is called by userspace, max_send_wr and max_send_sge are set
to 0 by the kernel driver. However, the userspace does not use these two
return values from the kernel driver, but uses its own calculated values.
So there is no need for special treatment.
Fixes: 926a01dc000d ("RDMA/hns: Add QP operations support for hip08 SoC")
Link: https://lore.kernel.org/r/1624011020-16992-6-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Some kernel ULPs need to use the return value of qp_init_attr, so add
member assignments for qp_init_attr.
Fixes: 926a01dc000d ("RDMA/hns: Add QP operations support for hip08 SoC")
Link: https://lore.kernel.org/r/1624011020-16992-5-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Remove redundant print and fix a character type mismatch.
Fixes: 0e0ab04b5bbe ("RDMA/hns: Refactor the MTR creation flow")
Link: https://lore.kernel.org/r/1624011020-16992-4-git-send-email-liweihang@huawei.com
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
A random value will be returned if the condition below is not met, so it
needs to be initialized.
Fixes: 9ea9a53ea93b ("RDMA/hns: Add mapped page count checking for MTR")
Link: https://lore.kernel.org/r/1624011020-16992-3-git-send-email-liweihang@huawei.com
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
When a non-inline WR reuses a WQE that was used for inline last time, the
remaining inline flag should be cleared.
Fixes: 62490fd5a865 ("RDMA/hns: Avoid unnecessary memset on WQEs in post_send")
Link: https://lore.kernel.org/r/1624011020-16992-2-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Generally, it should be more efficient if filter_cpu() comes before
machine__resolve() because filter_cpu() is much less code than
machine__resolve().
Example:
$ perf record --sample-cpu -- make -C tools/perf >/dev/null
Before:
$ perf stat -- perf script -C 0 >/dev/null
Performance counter stats for 'perf script -C 0':
116.94 msec task-clock # 0.992 CPUs utilized
2 context-switches # 17.103 /sec
0 cpu-migrations # 0.000 /sec
8,187 page-faults # 70.011 K/sec
478,351,812 cycles # 4.091 GHz
564,785,464 instructions # 1.18 insn per cycle
114,341,105 branches # 977.789 M/sec
2,615,495 branch-misses # 2.29% of all branches
0.117840576 seconds time elapsed
0.085040000 seconds user
0.032396000 seconds sys
After:
$ perf stat -- perf script -C 0 >/dev/null
Performance counter stats for 'perf script -C 0':
107.45 msec task-clock # 0.992 CPUs utilized
3 context-switches # 27.919 /sec
0 cpu-migrations # 0.000 /sec
7,964 page-faults # 74.117 K/sec
438,417,260 cycles # 4.080 GHz
522,571,855 instructions # 1.19 insn per cycle
105,187,488 branches # 978.921 M/sec
2,356,261 branch-misses # 2.24% of all branches
0.108282546 seconds time elapsed
0.095935000 seconds user
0.011991000 seconds sys
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210621150514.32159-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Irrespective of whether CONFIG_MODULE_SIG is configured, specifying
"module.sig_enforce=1" on the boot command line sets "sig_enforce".
Only allow "sig_enforce" to be set when CONFIG_MODULE_SIG is configured.
This patch makes the presence of /sys/module/module/parameters/sig_enforce
dependent on CONFIG_MODULE_SIG=y.
Fixes: fda784e50aac ("module: export module signature enforcement status")
Reported-by: Nayna Jain <nayna@linux.ibm.com>
Tested-by: Mimi Zohar <zohar@linux.ibm.com>
Tested-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Aharon Landau says:
====================
If a device supports only real-time timestamps, the kernel fails to
create a QP even though rdma-core requested that timestamp type.
This is because the device returns a free-running timestamp, and the
conversion from free-running to real-time is performed in user space.
This series fixes it by returning a real-time timestamp.
====================
* mlx5_realtime_ts:
RDMA/mlx5: Support real-time timestamp directly from the device
RDMA/mlx5: Refactor get_ts_format functions to simplify code
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
We don't want compiler instrumentation to touch noinstr functions,
which are annotated with the no_profile_instrument_function function
attribute. Add a Kconfig test for this and make GCOV depend on it, and
in the future, PGO.
If an architecture is using noinstr, it should denote that via this
Kconfig value. That makes Kconfigs that depend on noinstr able to express
dependencies in an architecturally agnostic way.
Cc: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/lkml/YMTn9yjuemKFLbws@hirez.programming.kicks-ass.net/
Link: https://lore.kernel.org/lkml/YMcssV%2Fn5IBGv4f0@hirez.programming.kicks-ass.net/
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210621231822.2848305-4-ndesaulniers@google.com
|
|
Since
commit 6ec4476ac825 ("Raise gcc version requirement to 4.9")
we no longer support building the kernel with GCC 4.8; drop the
preprocessor checks for __GNUC_MINOR__ version. It's implied that if
__GNUC_MAJOR__ is 4, then the only supported version of __GNUC_MINOR__
left is 9.
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210621231822.2848305-3-ndesaulniers@google.com
|
|
noinstr implies that we would like the compiler to avoid instrumenting a
function. Add support for the compiler attribute
no_profile_instrument_function to compiler_attributes.h, then add
__no_profile to the definition of noinstr.
Link: https://lore.kernel.org/lkml/20210614162018.GD68749@worktop.programming.kicks-ass.net/
Link: https://reviews.llvm.org/D104257
Link: https://reviews.llvm.org/D104475
Link: https://reviews.llvm.org/D104658
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223
Reviewed-by: Fangrui Song <maskray@google.com>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210621231822.2848305-2-ndesaulniers@google.com
|
|
This got added 14 years ago in 324ae4df00fd ("Btrfs: Add block group
pinned accounting back") but it was not ever used. Subsequently its
usage got gradually removed in 8790d502e440 ("Btrfs: Add support for
mirroring across drives") and 11833d66be94 ("Btrfs: improve async block
group caching"). Let's remove it for good!
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
This is a preliminary patch to add hw csum hint support to
mvneta/mvpp2 xdp implementation
Tested-by: Matteo Croce <mcroce@linux.microsoft.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Marcelo Ricardo Leitner says:
====================
tc-testing: add test for ct DNAT tuple collision
That was fixed in 13c62f5371e3 ("net/sched: act_ct: handle DNAT tuple
collision").
For that, it requires that tdc is able to send diverse packets with
scapy, which is then done on the 2nd patch of this series.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When this test fails, /proc/net/nf_conntrack gets only 1 entry:
ipv4 2 tcp 6 119 SYN_SENT src=10.0.0.10 dst=10.0.0.10 sport=5000 dport=10 [UNREPLIED] src=20.0.0.1 dst=10.0.0.10 sport=10 dport=5000 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=2
When it works, it gets 2 entries:
ipv4 2 tcp 6 119 SYN_SENT src=10.0.0.10 dst=10.0.0.20 sport=5000 dport=10 [UNREPLIED] src=20.0.0.1 dst=10.0.0.10 sport=10 dport=58203 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=2
ipv4 2 tcp 6 119 SYN_SENT src=10.0.0.10 dst=10.0.0.10 sport=5000 dport=10 [UNREPLIED] src=20.0.0.1 dst=10.0.0.10 sport=10 dport=5000 mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=0 use=2
The missing entry is because the 2nd packet hits a tuple collision and the
conntrack entry doesn't get allocated.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It can be worth sending different scapy packets on a given test, as in the
last patch of this series. For that, let's listify the scapy attribute and
simply iterate over it.
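A minimal sketch of the "listify and iterate" idea, not the actual plugin
code. The helper names as_list and packet_specs and the dictionary keys are
hypothetical and used for illustration only. A single attribute is wrapped
in a list so that old and new test definitions go through the same loop.

```python
def as_list(value):
    """Return value unchanged if it is already a list, else wrap it in one."""
    return value if isinstance(value, list) else [value]


# Hypothetical test definitions: one with a single spec, one with a list.
single = {"scapy": {"iface": "veth0", "count": 1}}
multi = {"scapy": [{"iface": "veth0", "count": 1},
                   {"iface": "veth0", "count": 2}]}


def packet_specs(test):
    """Collect every packet spec of a test, whichever form was used."""
    specs = []
    for spec in as_list(test["scapy"]):
        specs.append(spec)
    return specs
```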
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Python lists don't have an 'add' method; the method is 'append'.
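For illustration, a short standalone snippet (not taken from the plugin;
the variable names are made up) showing the failure mode and the working
call:

```python
packets = []

try:
    # list has no 'add' method (that is the set API), so this raises.
    packets.add("pkt")
except AttributeError as exc:
    error = str(exc)

# 'append' is the list method that adds one element at the end.
packets.append("pkt")
```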
Fixes: 14e5175e9e04 ("tc-testing: introduce scapyPlugin for basic traffic")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Having a verbose option will allow shell tests to provide extra failure
details when they fail or skip.
Committer notes:
Keep the 'script' variable at PATH_MAX, as it's just something we'll pass
to system(), not really a "path", so being arbitrary, reduce the patch
size by not adding the three extra bytes to the 'script' variable.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: bpf@vger.kernel.org
Link: http://lore.kernel.org/lkml/20210621215648.2991319-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
When changing number of TX queues using ethtool:
# ethtool -L eth0 tx 1
[ 135.301047] Unable to handle kernel paging request at virtual address 00000000af5d0000
[...]
[ 135.525128] Call trace:
[ 135.525142] dma_release_from_dev_coherent+0x2c/0xb0
[ 135.525148] dma_free_attrs+0x54/0xe0
[ 135.525156] k3_cppi_desc_pool_destroy+0x50/0xa0
[ 135.525164] am65_cpsw_nuss_remove_tx_chns+0x88/0xdc
[ 135.525171] am65_cpsw_set_channels+0x3c/0x70
[...]
This is because k3_cppi_desc_pool_destroy(), which is called after
k3_udma_glue_release_tx_chn() in am65_cpsw_nuss_remove_tx_chns(),
references a struct device that is unregistered at the end of
k3_udma_glue_release_tx_chn().
Therefore the right order is to call k3_cppi_desc_pool_destroy() and
destroy desc pool before calling k3_udma_glue_release_tx_chn().
Fix this throughout the driver.
Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver")
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds maintainer info for drivers/net/wwan subdir, including
WWAN core and drivers. Adding Sergey and myself as maintainers and
Johannes as reviewer.
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Acked-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This makes the openvswitch module use the event tracing framework
to log the upcall interface and action execution pipeline. When
using openvswitch as the packet forwarding engine, some types of
debugging are made possible simply by using the ovs-vswitchd's
ofproto/trace command. However, such a command has some
limitations:
1. When trying to trace packets that go through the CT action,
the state of the packet can't be determined, and the result
may well be wrong.
2. Deducing problem packets can sometimes be difficult as well,
even if many of the flows are known.
3. It's possible to use the openvswitch module even without
ovs-vswitchd (although this is not a common use).
Introduce the event tracing points here to make it possible to
work through these problems in kernel space. The style is
copied from the mac80211 driver-trace / trace code for
consistency - this creates some checkpatch splats, but the
official 'guide' for adding tracepoints, as well as the existing
examples all add the same splats so it seems acceptable.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The "mdio" variable is never set to false. Also it should be a bool
type instead of int.
Fixes: 30bba69d7db4 ("stmmac: pci: Add dwmac support for Loongson")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Linux 5.13-rc7
Needed for dependencies in following patches. Merge conflict in rxe_cmop.c
resolved by combining both patches.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Ido Schimmel says:
====================
ethtool: Module EEPROM API improvements
This patchset contains various improvements to the recently introduced
module EEPROM netlink API. Noticed these while adding module EEPROM
write support.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|