Age | Commit message (Collapse) | Author |
|
The calls up from the napi poll reading the receive ring had many
places where an argument was being recreated. I.e the caller already
had the value and wasn't passing it, then the callee would use
known relationship to determine the same value. Simpler and faster
to just pass arguments needed.
Also, add const in a couple places where message is being only read.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Marcelo Ricardo Leitner says:
====================
sctp: refactor MTU handling
Currently MTU handling is spread over SCTP stack. There are multiple
places doing same/similar calculations and updating them is error prone
as one spot can easily be left out.
This patchset converges it into a more concise and consistent code. In
general, it moves MTU handling from functions with bigger objectives,
such as sctp_assoc_add_peer(), to specific functions.
It's also a preparation for the next patchset, which removes the
duplication between sctp_make_op_error_space and
sctp_make_op_error_fixed and relies on sctp_mtu_payload introduced here.
More details on each patch.
====================
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a CQ is shared by multiple QPs, c4iw_flush_hw_cq() needs to acquire
corresponding QP lock before moving the CQEs into its corresponding SW
queue and accessing the SQ contents for completing a WR.
Ignore CQEs if corresponding QP is already flushed.
Cc: stable@vger.kernel.org
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
RFC 6458 Section 8.1.16 says that setting MAXSEG as 0 means that the user
is not limiting it, and not that it should set to the *current* maximum,
as we are doing.
This patch thus allow setting it as 0, effectively removing the user
limit.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When setting SCTP_MAXSEG sock option, it should consider which kind of
data chunk is being used if the asoc is already available, so that the
limit better reflect reality.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
sctp_sendmsg() could trigger PMTU updates even when PMTU_DISABLED was
set, as pmtu_pending could be set unconditionally during icmp handling
if the socket was in use by the application.
This patch fixes it by checking for PMTU_DISABLED when handling such
deferred updates.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
sctp_transport_route currently is very similar to sctp_transport_pmtu plus
a few other bits.
This patch reuses sctp_transport_pmtu in sctp_transport_route and removes
the duplicated code.
Also, as all calls to sctp_transport_route were forcing the dst release
before calling it, let's just include such release too.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We are now keeping the MTU information synced between asoc, transport
and dst, which makes the check at sctp_packet_config() not needed
anymore. As it was the sole caller to this function, lets remove it.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Which makes sure that the MTU respects the minimum value of
SCTP_DEFAULT_MINSEGMENT and that it is correctly aligned.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
No need for this helper.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
and avoid the open-coded versions of it.
Now sctp_datamsg_from_user can just re-use asoc->frag_point as it will
always be updated.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When given a MTU, this function calculates how much payload we can carry
on it. Without a MTU, it calculates the amount of header overhead we
have.
So that when we have extra overhead, like the one added for IP options
on SELinux patches, it is easier to handle it.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
All changes to asoc PMTU should now go through this wrapper, making it
easier to track them and to do other actions upon it.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As noticed by Xin Long, the if() here is always true as PMTU can never
be 0.
Reported-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There was only one case that sctp_assoc_add_peer couldn't handle, which
is when SPP_PMTUD_DISABLE is set and pathmtu not initialized.
So add this situation to sctp_transport_route and reuse what was
already in there.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This value is not used anywhere in the code. In essence it is a
duplicate of SCTP_DEFAULT_MINSEGMENT, which is used by the stack.
SCTP_MIN_PMTU value makes more sense, but we should not change to it now
as it would risk breaking applications.
So this patch removes SCTP_MIN_PMTU and adjust the comment above it.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A vti6 interface can carry IPv4 packets too.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2018-04-26
This pull request includes fixes for mlx5 core and netdev driver.
Please pull and let me know if there's any problems.
For -stable v4.12
net/mlx5e: TX, Use correct counter in dma_map error flow
For -stable v4.13
net/mlx5: Avoid cleaning flow steering table twice during error flow
For -stable v4.14
net/mlx5e: Allow offloading ipv4 header re-write for icmp
For -stable v4.15
net/mlx5e: DCBNL fix min inline header size for dscp
For -stable v4.16
net/mlx5: Fix mlx5_get_vector_affinity function
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch fixes a crash that happens due to access to an
uninitialized DM pointer within the MR object.
The change makes sure the DM pointer in the MR object is set to
NULL during a non-DM MR creation to prevent a false indication
that this MR is related to a DM in the dereg flow.
Fixes: be934cca9e98 ("IB/uverbs: Add device memory registration ioctl support")
Reported-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
This patch adds a check in the ib_uverbs_rereg_mr flow to make
sure there's no attempt to rereg a device memory MR to regular MR.
In such case the command will fail with -EINVAL status.
fixes: be934cca9e98 ("IB/uverbs: Add device memory registration ioctl support")
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Despite being advertised to user space application, the RSS inner
header flag was filtered by checks at the beginning of QP creation
routine.
Cc: <stable@vger.kernel.org> # 4.15
Fixes: 4d02ebd9bbbd ("IB/mlx4: Fix RSS hash fields restrictions")
Fixes: 07d84f7b6adf ("IB/mlx4: Add support to RSS hash for inner headers")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
This patch fixes two spelling errors.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
When generated bad work reqeust, it needs to
report to user. This patch mainly fixes it.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
When posting a work reqeust, it need to update the owner bit of send
wqe. This patch mainly fix the bug when posting multiply work
request.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
This patch update the order of cleaning hem table for trrl_table and irrl_table
as well as mtt_cqe_table and mtt_table.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Only when the IB_QP_PATH_DEST_QPN flag of attr_mask is set
is it valid to assign the dqpn field of qp context
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
This patch deletes some unnecessary attr_mask if condition
in hip08 according to the IB protocol.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Only when the IB_QP_PATH_MTU flag of attr_mask is set
it is valid to assign the mtu field of qp context when
qp type is not GSI and UD.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
According to RoCE protocol, it is possible to
transition from error to error state for modifying
qp in hip08. This patch fix it.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
RDMA read operation is not supported inline data. If user cofigures
issue a RDMA read and use inline data, it will happen a hardware
error.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
During init hem table, type should be used instead of
table->type which is finally initializaed with type.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
When skb is sent, it will pass the following functions in soft roce.
rxe_send [rdma_rxe]
ip_local_out
__ip_local_out
ip_output
ip_finish_output
ip_finish_output2
dev_queue_xmit
__dev_queue_xmit
dev_hard_start_xmit
In the above functions, if error occurs in the above functions or
iptables rules drop skb after ip_local_out, kfree_skb will be called.
So it is not necessary to call kfree_skb in soft roce module again.
Or else crash will occur.
The steps to reproduce:
server client
--------- ---------
|1.1.1.1|<----rxe-channel--->|1.1.1.2|
--------- ---------
On server: rping -s -a 1.1.1.1 -v -C 10000 -S 512
On client: rping -c -a 1.1.1.1 -v -C 10000 -S 512
The kernel configs CONFIG_DEBUG_KMEMLEAK and
CONFIG_DEBUG_OBJECTS are enabled on both server and client.
When rping runs, run the following command in server:
iptables -I OUTPUT -p udp --dport 4791 -j DROP
Without this patch, crash will occur.
CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
w/o RXE_START_MASK, the last_psn of IB_OPCODE_RC_SEND_ONLY_INV
will not be updated in update_wqe_psn, and the corresponding
wqe will not be acked in rxe_completer due to its last_psn is
zero. Finally, the other wqe will also not be able to be acked,
because the wqe of IB_OPCODE_RC_SEND_ONLY_INV with last_psn 0
is still there. This causes large amount of io timeout when
nvmeof is over rxe.
Add RXE_START_MASK for IB_OPCODE_RC_SEND_ONLY_INV to fix this.
Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
In the cases where iwpm_hash_bucket is NULL and where function
get_mapinfo_hash_bucket returns NULL then the map_info is never added
to hash_bucket_head and hence there is a leak of map_info. Fix this
by nullifying hash_bucket_head and if that is null we know that
that map_info was not added to hash_bucket_head and hence map_info
should be free'd.
Detected by CoverityScan, CID#1222481 ("Resource Leak")
Fixes: 30dc5e63d6a5 ("RDMA/core: Add support for iWARP Port Mapper user space service")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Nothing too bad, but the spectre updates to smatch identified a few
places that may need sanitising so we've got those covered.
Details:
- Close some potential spectre-v1 vulnerabilities found by smatch
- Add missing list sentinel for CPUs that don't require KPTI
- Removal of unused 'addr' parameter for I/D cache coherency
- Removal of redundant set_fs(KERNEL_DS) calls in ptrace
- Fix single-stepping state machine handling in response to kernel
traps
- Clang support for 128-bit integers
- Avoid instrumenting our out-of-line atomics in preparation for
enabling LSE atomics by default in 4.18"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: avoid instrumenting atomic_ll_sc.o
KVM: arm/arm64: vgic: fix possible spectre-v1 in vgic_mmio_read_apr()
KVM: arm/arm64: vgic: fix possible spectre-v1 in vgic_get_irq()
arm64: fix possible spectre-v1 in ptrace_hbp_get_event()
arm64: support __int128 with clang
arm64: only advance singlestep for user instruction traps
arm64/kernel: rename module_emit_adrp_veneer->module_emit_veneer_for_adrp
arm64: ptrace: remove addr_limit manipulation
arm64: mm: drop addr parameter from sync icache and dcache
arm64: add sentinel to kpti_safe_list
|
|
The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
which is a typedef for an enum type, but the implementation in this
driver returns an 'int'.
Fix this by returning 'netdev_tx_t' in this driver too.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
which is a typedef for an enum type, but the implementation in this
driver returns an 'int'.
Fix this by returning 'netdev_tx_t' in this driver too.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
In tcp_select_initial_window(), we only set rcv_wnd to
tcp_default_init_rwnd() if current mss > (1 << wscale). Otherwise,
rcv_wnd is kept at the full receive space of the socket which is a
value way larger than tcp_default_init_rwnd().
With larger initial rcv_wnd value, receive buffer autotuning logic
takes longer to kick in and increase the receive buffer.
In a TCP throughput test where receiver has rmem[2] set to 125MB
(wscale is 11), we see the connection gets recvbuf limited at the
beginning of the connection and gets less throughput overall.
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ursula Braun says:
====================
smc fixes from 2018-04-17 - v3
in the mean time we challenged the benefit of these CLC handshake
optimizations for the sockopts TCP_NODELAY and TCP_CORK.
We decided to give up on them for now, since SMC still works
properly without.
There is now version 3 of the patch series with patches 2-4 implementing
sockopts that require special handling in SMC.
Version 3 changes
* no deferring of setsockopts TCP_NODELAY and TCP_CORK anymore
* allow fallback for some sockopts eliminating SMC usage
* when setting TCP_NODELAY always enforce data transmission
(not only together with corked data)
Version 2 changes of Patch 2/4 (and 3/4):
* return error -EOPNOTSUPP for TCP_FASTOPEN sockopts
* fix a kernel_setsockopt() usage bug by switching parameter
variable from type "u8" to "int"
* add return code validation when calling kernel_setsockopt()
* propagate a setsockopt error on the internal CLC socket
to the SMC socket.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If sockopt TCP_DEFER_ACCEPT is set, the accept is delayed till
data is available.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Setting sockopt TCP_NODELAY or resetting sockopt TCP_CORK
triggers data transfer.
For a corked SMC socket RDMA writes are deferred, if there is
still sufficient send buffer space available.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Several TCP sockopts do not work for SMC. One example are the
TCP_FASTOPEN sockopts, since SMC-connection setup is based on the TCP
three-way-handshake.
If the SMC socket is still in state SMC_INIT, such sockopts trigger
fallback to TCP. Otherwise an error is returned.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The struct smc_cdc_msg must be defined as packed so the
size is 44 bytes.
And change the structure size check so sizeof is checked.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull modules fix from Jessica Yu:
"Fix display of module section addresses in sysfs, which were getting
hashed with %pK and breaking tools like perf"
* tag 'modules-for-v4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
module: Fix display of wrong module .text address
|
|
After many years of having a ~30 line copyright and license header to our
source files, we are finally able to reduce that to one line with the
advent of the SPDX identifier.
Also caught a few files missing the SPDX license identifier, so fixed
them up.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are few issues with validation of netdevice and listen id lookup
for IB (IPoIB) while processing incoming CM request as below.
1. While performing lookup of bind_list in cma_ps_find(), net namespace
of the netdevice can get deleted in cma_exit_net(), resulting in use
after free access of idr and/or net namespace structures.
This lookup occurs from the workqueue context (and not userspace
context where net namespace is always valid).
CPU0 CPU1
==== ====
bind_list = cma_ps_find();
move netdevice to new namespace
delete net namespace
cma_exit_net()
idr_destroy(idr);
[..]
cma_find_listener(bind_list, ..);
2. While netdevice is validated for IP address in given net namespace,
netdevice's net namespace and/or ifindex can change in
cma_get_net_dev() and cma_match_net_dev().
Above issues are overcome by using rcu lock along with netdevice
UP/DOWN state as described below.
When a net namespace is getting deleted, netdevice is closed and
shutdown before moving it back to init_net namespace.
change_net_namespace() synchronizes with any existing use of netdevice
before changing the netdev properties such as net or ifindex.
Once netdevice IFF_UP flags is cleared, such fields are not guaranteed
to be valid.
Therefore, rcu lock along with netdevice state check ensures that,
while route lookup and cm_id lookup is in progress, netdevice of
interest won't migrate to any other net namespace.
This ensures that associated net namespace of netdevice won't get
deleted while rcu lock is held for netdevice which is in IFF_UP state.
Fixes: fa20105e09e9 ("IB/cma: Add support for network namespaces")
Fixes: 4be74b42a6d0 ("IB/cma: Separate port allocation to network namespaces")
Fixes: f887f2ac87c2 ("IB/cma: Validate routing of incoming requests")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Pull ceph fixes from Ilya Dryomov:
"A CephFS quota follow-up and fixes for two older issues in the
messenger layer, marked for stable"
* tag 'ceph-for-4.17-rc3' of git://github.com/ceph/ceph-client:
libceph: validate con->state at the top of try_write()
libceph: reschedule a tick in finish_hunting()
libceph: un-backoff on tick when we have a authenticated session
ceph: check if mds create snaprealm when setting quota
|
|
Previously, if a method contained mandatory attributes in a namespace
that wasn't given by the user, these attributes weren't validated.
Fixing this by iterating over all specification namespaces.
Fixes: fac9658cabb9 ("IB/core: Add new ioctl interface")
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
kbuild test robot says:
>coccinelle warnings: (new ones prefixed by >>)
>>> net/core/dev.c:1588:2-3: Unneeded semicolon
So, let's remove it.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The c4iw_rdev_close() logic was not releasing all the hw
resources (PBL and RQT memory) during the device removal
event (driver unload / system reboot). This can cause panic
in gen_pool_destroy().
The module remove function will wait for all the hw
resources to be released during the device removal event.
Fixes c12a67fe(iw_cxgb4: free EQ queue memory on last deref)
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Cc: stable@vger.kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
|