Age | Commit message (Collapse) | Author |
|
Maciej W. Rozycki says:
====================
FDDI: DEC FDDIcontroller 700 TURBOchannel adapter support
This is an update to <http://patchwork.ozlabs.org/patch/342737/>. I
believe I have addressed all the requests made in the previous review
round.
There is still one `checkpatch.pl' warning remaining:
WARNING: quoted string split across lines
+ pr_info("%s: ROM rev. %.4s, firmware rev. %.4s, RMC rev. %.4s, "
+ "SMT ver. %u\n", fp->name, rom_rev, fw_rev, rmc_rev, smt_ver);
total: 0 errors, 1 warnings, 2458 lines checked
however I think the value of staying within 80 columns is higher than the
value of having the string on a single line. This is because with all the
formatting specifiers there it is not directly greppable based on the
final output produced to the kernel log on one hand, e.g.:
tc2: ROM rev. 1.0, firmware rev. 1.2, RMC rev. A, SMT ver. 1
while it can be easily tracked down by grepping for an obvious substring
such as "RMC rev" on the other.
The issue with MMIO barriers I discussed in the course of the original
review turned out mostly irrelevant to this driver, because as I have
learnt in a recent Alpha/Linux discussion starting here:
<https://marc.info/?i=alpine.LRH.2.02.1808161556450.13597%20()%20file01%20!%20intranet%20!%20prod%20!%20int%20!%20rdu2%20!%20redhat%20!%20com>
our MMIO API mandates the `readX' and `writeX' accessors to be strongly
ordered with respect to each other, even if that is not implicitly
enforced by hardware.
Consequently I have removed all the explicit ordering barriers and
instead submitted a fix for MIPS MMIO implementation, which currently does
not guarantee strong ordering (the MIPS architecture does not define bus
ordering rules except in terms of SYNC barriers), as recorded here:
<https://patchwork.linux-mips.org/project/linux-mips/list/?series=1538>.
Enforcing strong MMIO ordering can be costly however and is often
unnecessary, e.g. when using PIO to access network frame data in onboard
packet memory. I have therefore retained the information that would be
lost by the removal of barriers, by defining accessor wrappers suffixed by
`_o' and `_u', for accesses that have to be ordered and can be unordered
respectively.
If we ever have an API defined for weakly-ordered MMIO accesses, then
these wrappers can be redefined accordingly. Right now they all expand to
the respective `_relaxed' accessors, because, again, enforcing the
ordering WRT DMA transfers can be costly and we don't need it here except
in one place, where I chose to use explicit `dma_rmb' instead.
Similarly I have replaced the completion barriers with a read back from
the respective MMIO location (all adapter MMIO registers can be read with
no side effects incurred), which will serve its purpose on the basis of
MMIO being strongly ordered (although a read from TURBOchannel is going to
be slower than `iob', making the delay incurred unnecessarily longer).
And last but not least, I have split off the SMT Tx network tap support
to a separate change, 2/2 in this series, so that it does not block the
driver proper and can be discussed separately.
I think it has value in that it makes the view of the outgoing network
traffic complete, as if one actually physically tapped into the outgoing
line of the ring, between the station being examined and its downstream
neighbour. Without this part only traffic passed from applications
through the whole protocol stack can be captured and this is only a part
of the view.
With the `dev_queue_xmit_nit' interface now exported it's only
`ptype_all' that remains private, and to define a properly abstracted API
I propose to provide am exported `dev_nit_active' predicate that tells
whether any taps are active. This predicate is then used accordingly.
NB if there is a long-term maintenance concern about the `dev_nit_active'
predicate, then well, corresponding inline code currently present in
`xmit_one' has to be maintained anyway, and if the resulting changes
require `defza' to be updated accordingly, then I am going to handle it;
after some 20 years with Linux it's not that I am going to disappear
anywhere anytime. And once I am dead, which is inevitably going to happen
sooner or later, then the driver can simply be ripped from the kernel.
Though I suspect that at that point no DECstation Linux users may survive
anymore, even though hardware, being as sturdy as it is, likely will.
I have a patch for `tcpdump' to actually decode SMT frames, which I plan
to upstream sometime. Here's a sample of SMT traffic captured through the
`defza' driver in a small network of 4 stations and no concentrators,
printed in the most verbose mode:
01:16:59.138381 4f 00:60:b0:58:41:e7 00:60:b0:58:41:e7 73: SMT NIF ann vid:1 tid:00000270 sid:00-00-00-60-b0-58-41-e7 len:40: UNA: 00 00 00 06 0d 1a 02 ae StationDescr: 00 01 02 00 StationState: 00 00 30 00 MACFrameStatusFunctions.3: 00 00 00 01
01:17:00.332750 4f 08:00:2b:a3:a3:29 08:00:2b:a3:a3:29 73: SMT NIF ann vid:1 tid:0000013b sid:00-00-08-00-2b-a3-a3-29 len:40: UNA: 00 00 00 06 0d 1a 82 e7 StationDescr: 00 01 02 00 StationState: 00 00 30 00 MACFrameStatusFunctions.3: 00 00 00 01
01:17:00.354479 4f 00:60:b0:58:40:75 00:60:b0:58:40:75 73: SMT NIF ann vid:1 tid:0000029c sid:00-00-00-60-b0-58-40-75 len:40: UNA: 00 00 10 00 d4 74 b6 ae StationDescr: 00 01 02 00 StationState: 00 00 31 00 MACFrameStatusFunctions.3: 00 00 00 01
01:17:00.442175 4f 00:60:b0:58:41:e7 Broadcast 73: SMT NIF req vid:1 tid:00000271 sid:00-00-00-60-b0-58-41-e7 len:40: UNA: 00 00 00 06 0d 1a 02 ae StationDescr: 00 01 02 00 StationState: 00 00 30 00 MACFrameStatusFunctions.3: 00 00 00 01
01:17:00.448657 41 08:00:2b:a3:a3:29 00:60:b0:58:41:e7 73: SMT NIF rsp vid:1 tid:00000271 sid:00-00-08-00-2b-a3-a3-29 len:40: UNA: 00 00 00 06 0d 1a 82 e7 StationDescr: 00 01 02 00 StationState: 00 00 30 00 MACFrameStatusFunctions.3: 00 00 00 01
01:17:01.015152 4f 08:00:2b:a3:a3:29 Broadcast 73: SMT NIF req vid:1 tid:0000013c sid:00-00-08-00-2b-a3-a3-29 len:40: UNA: 00 00 00 06 0d 1a 82 e7 StationDescr: 00 01 02 00 StationState: 00 00 30 00 MACFrameStatusFunctions.3: 00 00 00 01
01:17:01.111644 41 08:00:2b:2e:6d:75 08:00:2b:a3:a3:29 73: SMT NIF rsp vid:1 tid:0000013c sid:00-00-08-00-2b-2e-6d-75 len:40: UNA: 00 00 10 00 d4 c5 c5 94 StationDescr: 00 01 01 00 StationState: 00 00 11 00 MACFrameStatusFunctions.2: 00 00 00 01
01:17:04.814603 4f 08:00:2b:2e:6d:75 Broadcast 73: SMT NIF req vid:1 tid:0000013c sid:00-00-08-00-2b-2e-6d-75 len:40: UNA: 00 00 10 00 d4 c5 c5 94 StationDescr: 00 01 01 00 StationState: 00 00 11 00 MACFrameStatusFunctions.2: 00 00 00 01
01:17:04.814939 4f 08:00:2b:2e:6d:75 Broadcast 73: SMT NIF req vid:1 tid:0000013c sid:00-00-08-00-2b-2e-6d-75 len:40: UNA: 00 00 10 00 d4 c5 c5 94 StationDescr: 00 01 01 00 StationState: 00 00 11 00 MACFrameStatusFunctions.2: 00 00 00 01
01:17:04.820960 4f 08:00:2b:2e:6d:75 08:00:2b:2e:6d:75 73: SMT NIF ann vid:1 tid:0000013b sid:00-00-08-00-2b-2e-6d-75 len:40: UNA: 00 00 10 00 d4 c5 c5 94 StationDescr: 00 01 01 00 StationState: 00 00 11 00 MACFrameStatusFunctions.2: 00 00 00 01
Questions, comments? Otherwise, please apply.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
DEC FDDIcontroller 700 (DEFZA) uses a Tx/Rx queue pair to communicate
SMT frames with adapter's firmware. Any SMT frame received from the RMC
via the Rx queue is queued back by the driver to the SMT Rx queue for
the firmware to process. Similarly the firmware uses the SMT Tx queue
to supply the driver with SMT frames which are queued back to the Tx
queue for the RMC to send to the ring.
When a network tap is attached to an FDDI interface handled by `defza'
any incoming SMT frames captured are queued to our usual processing of
network data received, which in turn delivers them to any listening
taps.
However the outgoing SMT frames produced by the firmware bypass our
network protocol stack and are therefore not delivered to taps. This in
turn means that taps are missing a part of network traffic sent by the
adapter, which may make it more difficult to track down network problems
or do general traffic analysis.
Call `dev_queue_xmit_nit' then in the SMT Tx path, having checked that
a network tap is attached, with a newly-created `dev_nit_active' helper
wrapping the usual condition used in the transmit path.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for the DEC FDDIcontroller 700 (DEFZA), Digital Equipment
Corporation's first-generation FDDI network interface adapter, made for
TURBOchannel and based on a discrete version of what eventually became
Motorola's widely used CAMEL chipset.
The CAMEL chipset is present for example in the DEC FDDIcontroller
TURBOchannel, EISA and PCI adapters (DEFTA/DEFEA/DEFPA) that we support
with the `defxx' driver, however the host bus interface logic and the
firmware API are different in the DEFZA and hence a separate driver is
required.
There isn't much to say about the driver except that it works, but there
is one peculiarity to mention. The adapter implements two Tx/Rx queue
pairs.
Of these one pair is the usual network Tx/Rx queue pair, in this case
used by the adapter to exchange frames with the ring, via the RMC (Ring
Memory Controller) chip. The Tx queue is handled directly by the RMC
chip and resides in onboard packet memory. The Rx queue is maintained
via DMA in host memory by adapter's firmware copying received data
stored by the RMC in onboard packet memory.
The other pair is used to communicate SMT frames with adapter's
firmware. Any SMT frame received from the RMC via the Rx queue must be
queued back by the driver to the SMT Rx queue for the firmware to
process. Similarly the firmware uses the SMT Tx queue to supply the
driver with SMT frames that must be queued back to the Tx queue for the
RMC to send to the ring.
This solution was chosen because the designers ran out of PCB space and
could not squeeze in more logic onto the board that would be required to
handle this SMT frame traffic without the need to involve the driver, as
with the later DEFTA/DEFEA/DEFPA adapters.
Finally the driver does some Frame Control byte decoding, so to avoid
magic numbers some macros are added to <linux/if_fddi.h>.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Configuring generic network device parameters on tun will fail in
presence of IFLA_INFO_KIND attribute in IFLA_LINKINFO nested attribute
since tun_validate() always return failure.
This can be visualized with following ip-link(8) command sequences:
# ip link set dev tun0 group 100
# ip link set dev tun0 group 100 type tun
RTNETLINK answers: Invalid argument
with contrast to dummy and veth drivers:
# ip link set dev dummy0 group 100
# ip link set dev dummy0 type dummy
# ip link set dev veth0 group 100
# ip link set dev veth0 group 100 type veth
Fix by returning zero in tun_validate() when @data is NULL that is
always in case since rtnl_link_ops->maxtype is zero in tun driver.
Fixes: f019a7a594d9 ("tun: Implement ip link del tunXXX")
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In dev_ethtool(), the eth command 'ethcmd' is firstly copied from the
use-space buffer 'useraddr' and checked to see whether it is
ETHTOOL_PERQUEUE. If yes, the sub-command 'sub_cmd' is further copied from
the user space. Otherwise, 'sub_cmd' is the same as 'ethcmd'. Next,
according to 'sub_cmd', a permission check is enforced through the function
ns_capable(). For example, the permission check is required if 'sub_cmd' is
ETHTOOL_SCOALESCE, but it is not necessary if 'sub_cmd' is
ETHTOOL_GCOALESCE, as suggested in the comment "Allow some commands to be
done by anyone". The following execution invokes different handlers
according to 'ethcmd'. Specifically, if 'ethcmd' is ETHTOOL_PERQUEUE,
ethtool_set_per_queue() is called. In ethtool_set_per_queue(), the kernel
object 'per_queue_opt' is copied again from the user-space buffer
'useraddr' and 'per_queue_opt.sub_command' is used to determine which
operation should be performed. Given that the buffer 'useraddr' is in the
user space, a malicious user can race to change the sub-command between the
two copies. In particular, the attacker can supply ETHTOOL_PERQUEUE and
ETHTOOL_GCOALESCE to bypass the permission check in dev_ethtool(). Then
before ethtool_set_per_queue() is called, the attacker changes
ETHTOOL_GCOALESCE to ETHTOOL_SCOALESCE. In this way, the attacker can
bypass the permission check and execute ETHTOOL_SCOALESCE.
This patch enforces a check in ethtool_set_per_queue() after the second
copy from 'useraddr'. If the sub-command is different from the one obtained
in the first copy in dev_ethtool(), an error code EINVAL will be returned.
Fixes: f38d138a7da6 ("net/ethtool: support set coalesce per queue")
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In ethtool_get_rxnfc(), the eth command 'cmd' is compared against
'ETHTOOL_GRXFH' to see whether it is necessary to adjust the variable
'info_size'. Then the whole structure of 'info' is copied from the
user-space buffer 'useraddr' with 'info_size' bytes. In the following
execution, 'info' may be copied again from the buffer 'useraddr' depending
on the 'cmd' and the 'info.flow_type'. However, after these two copies,
there is no check between 'cmd' and 'info.cmd'. In fact, 'cmd' is also
copied from the buffer 'useraddr' in dev_ethtool(), which is the caller
function of ethtool_get_rxnfc(). Given that 'useraddr' is in the user
space, a malicious user can race to change the eth command in the buffer
between these copies. By doing so, the attacker can supply inconsistent
data and cause undefined behavior because in the following execution 'info'
will be passed to ops->get_rxnfc().
This patch adds a necessary check on 'info.cmd' and 'cmd' to confirm that
they are still same after the two copies in ethtool_get_rxnfc(). Otherwise,
an error code EINVAL will be returned.
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Originally, we have an issue where r8169 MSI-X interrupt is broken after
S3 suspend/resume on RTL8106e of ASUS X441UAR.
02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd.
RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller [10ec:8136]
(rev 07)
Subsystem: ASUSTeK Computer Inc. RTL810xE PCI Express Fast
Ethernet controller [1043:200f]
Flags: bus master, fast devsel, latency 0, IRQ 16
I/O ports at e000 [size=256]
Memory at ef100000 (64-bit, non-prefetchable) [size=4K]
Memory at e0000000 (64-bit, prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Endpoint, MSI 01
Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
Capabilities: [d0] Vital Product Data
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Virtual Channel
Capabilities: [160] Device Serial Number 01-00-00-00-36-4c-e0-00
Capabilities: [170] Latency Tolerance Reporting
Kernel driver in use: r8169
Kernel modules: r8169
We found the all of the values in PCI BAR=4 of the ethernet adapter
become 0xFF after system resumes. That breaks the MSI-X interrupt.
Therefore, we can only fall back to MSI interrupt to fix the issue at
that time.
However, there is a commit which resolves the drivers getting nothing in
PCI BAR=4 after system resumes. It is 04cb3ae895d7 "PCI: Reprogram
bridge prefetch registers on resume" by Daniel Drake.
After apply the patch, the ethernet adapter works fine before suspend
and after resume. So, we can revert the workaround after the commit
"PCI: Reprogram bridge prefetch registers on resume" is merged into main
tree.
This patch reverts commit 7bb05b85bc2d1a1b647b91424b2ed4a18e6ecd81
"r8169: don't use MSI-X on RTL8106e".
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=201181
Fixes: 7bb05b85bc2d ("r8169: don't use MSI-X on RTL8106e")
Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit 0b9871a3a8cc7234c285b5d9bf66cc6712cfee7c.
Causes crashes with qemu, interacts badly with commit commit
6d0a70a284be ("vsprintf: print OF node name using full_name")
etc.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a way of creating maps from user space. The command takes
as parameters most of the attributes of the map creation system
call command. After map is created its pinned to bpffs. This makes
it possible to easily and dynamically (without rebuilding programs)
test various corner cases related to map creation.
Map type names are taken from bpftool's array used for printing.
In general these days we try to make use of libbpf type names, but
there are no map type names in libbpf as of today.
As with most features I add the motivation is testing (offloads) :)
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
John Fastabend says:
====================
The first patch adds support for attaching programs to maps. This is
needed to support sock{map|hash} use from bpftool. Currently, I carry
around custom code to do this so doing it using standard bpftool will
be great.
The second patch adds a compat mode to ignore non-zero entries in
the map def. This allows using bpftool with maps that have a extra
fields that the user knows can be ignored. This is needed to work
correctly with maps being loaded by other tools or directly via
syscalls.
v3: add bash completion and doc updates for --mapcompat
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Multiple map definition structures exist and user may have non-zero
fields in their definition that are not recognized by bpftool and
libbpf. The normal behavior is to then fail loading the map. Although
this is a good default behavior users may still want to load the map
for debugging or other reasons. This patch adds a --mapcompat flag
that can be used to override the default behavior and allow loading
the map even when it has additional non-zero fields.
For now the only user is 'bpftool prog' we can switch over other
subcommands as needed. The library exposes an API that consumes
a flags field now but I kept the original API around also in case
users of the API don't want to expose this. The flags field is an
int in case we need more control over how the API call handles
errors/features/etc in the future.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Sock map/hash introduce support for attaching programs to maps. To
date I have been doing this with custom tooling but this is less than
ideal as we shift to using bpftool as the single CLI for our BPF uses.
This patch adds new sub commands 'attach' and 'detach' to the 'prog'
command to attach programs to maps and then detach them.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Joe Stringer says:
====================
This series includes a couple of fixups for the IPv6 socket lookup
helper, to make the API more consistent (always supply all arguments in
network byte-order) and to allow its use when IPv6 is compiled as a
module.
====================
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Commit 6acc9b432e67 ("bpf: Add helper to retrieve socket in BPF")
mistakenly passed the destination port in network byte-order to the IPv6
TCP/UDP socket lookup functions, which meant that BPF writers would need
to either manually swap the byte-order of this field or otherwise IPv6
sockets could not be located via this helper.
Fix the issue by swapping the byte-order appropriately in the helper.
This also makes the API more consistent with the IPv4 version.
Fixes: 6acc9b432e67 ("bpf: Add helper to retrieve socket in BPF")
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This is a more complete fix than d71019b54bff ("net: core: Fix build
with CONFIG_IPV6=m"), so that IPv6 sockets may be looked up if the IPv6
module is loaded (not just if it's compiled in).
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This documentation was inadvertently released under the CC-BY-SA-4.0
license. It was intended to be released under GPL-2.0 or later.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
|
|
The IDA was declared on the stack instead of statically, so lockdep
triggered a warning that it was improperly initialised.
Reported-by: 0day bot
Tested-by: Rong Chen <rong.a.chen@intel.com>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
|
|
Daniel Borkmann says:
====================
This work adds a generic sk_msg layer and converts both sockmap
and later ktls over to make use of it as a common data structure
for application data (similarly as sk_buff for network packets).
With that in place the sk_msg framework spans accross ULP layer
in the kernel and allows for introspection or filtering of L7
data with the help of BPF programs operating on a common input
context.
In a second step, we enable the latter for ktls which was previously
not possible, meaning, ktls and sk_msg verdict programs were
mutually exclusive in the ULP layer which created challenges for
the orchestrator when trying to apply TCP based policy, for
example. Leveraging the prior consolidation we can finally overcome
this limitation.
Note, there's no change in behavior when ktls is not used in
combination with BPF, and also no change in behavior for stand
alone sockmap. The kselftest suites for ktls, sockmap and ktls
with sockmap combined also runs through successfully. For further
details please see individual patches.
Thanks!
v1 -> v2:
- Removed leftover comment spotted by Alexei
- Improved commit messages, rebase
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add a MAINTAINERS entry to the skmsg and related files such that
patches, features, bug reports land with the right Cc.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This adds a --ktls option to test_sockmap in order to enable the
combination of ktls and sockmap to run, which makes for another
batch of 648 test cases for both in combination.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This work adds BPF sk_msg verdict program support to kTLS
allowing BPF and kTLS to be combined together. Previously kTLS
and sk_msg verdict programs were mutually exclusive in the
ULP layer which created challenges for the orchestrator when
trying to apply TCP based policy, for example. To resolve this,
leveraging the work from previous patches that consolidates
the use of sk_msg, we can finally enable BPF sk_msg verdict
programs so they continue to run after the kTLS socket is
created. No change in behavior when kTLS is not used in
combination with BPF, the kselftest suite for kTLS also runs
successfully.
Joint work with Daniel.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Instead of re-implementing poll routine use the poll callback to
trigger read from kTLS, we reuse the stream_memory_read callback
which is simpler and achieves the same. This helps to align sockmap
and kTLS so we can more easily embed BPF in kTLS.
Joint work with Daniel.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Convert kTLS over to make use of sk_msg interface for plaintext and
encrypted scattergather data, so it reuses all the sk_msg helpers
and data structure which later on in a second step enables to glue
this to BPF.
This also allows to remove quite a bit of open coded helpers which
are covered by the sk_msg API. Recent changes in kTLs 80ece6a03aaf
("tls: Remove redundant vars from tls record structure") and
4e6d47206c32 ("tls: Add support for inplace records encryption")
changed the data path handling a bit; while we've kept the latter
optimization intact, we had to undo the former change to better
fit the sk_msg model, hence the sg_aead_in and sg_aead_out have
been brought back and are linked into the sk_msg sgs. Now the kTLS
record contains a msg_plaintext and msg_encrypted sk_msg each.
In the original code, the zerocopy_from_iter() has been used out
of TX but also RX path. For the strparser skb-based RX path,
we've left the zerocopy_from_iter() in decrypt_internal() mostly
untouched, meaning it has been moved into tls_setup_from_iter()
with charging logic removed (as not used from RX). Given RX path
is not based on sk_msg objects, we haven't pursued setting up a
dummy sk_msg to call into sk_msg_zerocopy_from_iter(), but it
could be an option to prusue in a later step.
Joint work with John.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add a generic sk_msg layer, and convert current sockmap and later
kTLS over to make use of it. While sk_buff handles network packet
representation from netdevice up to socket, sk_msg handles data
representation from application to socket layer.
This means that sk_msg framework spans across ULP users in the
kernel, and enables features such as introspection or filtering
of data with the help of BPF programs that operate on this data
structure.
Latter becomes in particular useful for kTLS where data encryption
is deferred into the kernel, and as such enabling the kernel to
perform L7 introspection and policy based on BPF for TLS connections
where the record is being encrypted after BPF has run and came to
a verdict. In order to get there, first step is to transform open
coding of scatter-gather list handling into a common core framework
that subsystems can use.
The code itself has been split and refactored into three bigger
pieces: i) the generic sk_msg API which deals with managing the
scatter gather ring, providing helpers for walking and mangling,
transferring application data from user space into it, and preparing
it for BPF pre/post-processing, ii) the plain sock map itself
where sockets can be attached to or detached from; these bits
are independent of i) which can now be used also without sock
map, and iii) the integration with plain TCP as one protocol
to be used for processing L7 application data (later this could
e.g. also be extended to other protocols like UDP). The semantics
are the same with the old sock map code and therefore no change
of user facing behavior or APIs. While pursuing this work it
also helped finding a number of bugs in the old sockmap code
that we've fixed already in earlier commits. The test_sockmap
kselftest suite passes through fine as well.
Joint work with John.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
In order to prepare sockmap logic to be used in combination with kTLS
we need to detangle it from ULP, and further split it in later commits
into a generic API.
Joint work with John.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Whenever the ULP data on the socket is mangled, enforce that the
caller has the socket lock held as otherwise things may race with
initialization and cleanup callbacks from ulp ops as both would
mangle internal socket state.
Joint work with John.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The recent patch to fix the afs_server struct leak didn't actually fix the
bug, but rather fixed some of the symptoms. The problem is that an
asynchronous call that holds a resource pointed to by call->reply[0] will
find the pointer cleared in the call destructor, thereby preventing the
resource from being cleaned up.
In the case of the server record leak, the afs_fs_get_capabilities()
function in devel code sets up a call with reply[0] pointing at the server
record that should be altered when the result is obtained, but this was
being cleared before the destructor was called, so the put in the
destructor does nothing and the record is leaked.
Commit f014ffb025c1 removed the additional ref obtained by
afs_install_server(), but the removal of this ref is actually used by the
garbage collector to mark a server record as being defunct after the record
has expired through lack of use.
The offending clearance of call->reply[0] upon completion in
afs_process_async_call() has been there from the origin of the code, but
none of the asynchronous calls actually use that pointer currently, so it
should be safe to remove (note that synchronous calls don't involve this
function).
Fix this by the following means:
(1) Revert commit f014ffb025c1.
(2) Remove the clearance of reply[0] from afs_process_async_call().
Without this, afs_manage_servers() will suffer an assertion failure if it
sees a server record that didn't get used because the usage count is not 1.
Fixes: f014ffb025c1 ("afs: Fix afs_server struct leak")
Fixes: 08e0e7c82eea ("[AF_RXRPC]: Make the in-kernel AFS filesystem use AF_RXRPC.")
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
In some environments it is common that many hosts share the same lower half
of their IPv6 addresses (in particular ::1). As __xfrm6_addr_hash() and
__xfrm6_daddr_saddr_hash() calculate the hash only from the lower halves,
as much as 1/3 of the hosts ends up in one hashtable chain which harms the
performance.
Use complete IPv6 addresses when calculating the hashes. Rather than just
adding two more words to the xor, use jhash2() for consistency with
__xfrm6_pref_hash() and __xfrm6_dpref_spref_hash().
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
|
|
If we did some signal processing, we have to reload the pt_regs
tstate register because it's value may have changed.
In doing so we also have to extract the %pil value contained in there
anre load that into %l4.
This value is at bit 20 and thus needs to be shifted down before we
later write it into the %pil register.
Most of the time this is harmless as we are returning to userspace
and the %pil is zero for that case.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
So that when it is unset, ie. '-1', userspace can see it
properly.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
wireless-drivers-next patches for 4.20
Third set of patches for 4.20. Most notable is finalising ath10k
wcn3990 support, all components should be implemented now.
Major changes:
ath10k
* support NET_DETECT WoWLAN feature
* wcn3990 basic functionality now working after we got QMI support
mt76
* mt76x0e improvements (should be usable now)
* more mt76x0/mt76x2 unification work
brcmsmac
* fix a problem on AP mode with clients using power save mode
iwlwifi
* support for a new scan type: fast balance
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2018-10-14
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix xsk map update and delete operation to not call synchronize_net()
but to piggy back on SOCK_RCU_FREE for sockets instead as we are not
allowed to sleep under RCU, from Björn.
2) Do not change RLIMIT_MEMLOCK in reuseport_bpf selftest if the process
already has unlimited RLIMIT_MEMLOCK, from Eric.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ath.git patches for 4.20. Major changes:
ath10k
* support NET_DETECT WoWLAN feature
* wcn3990 basic functionality now working after we got QMI support
|
|
mt76 patches for 4.20
* mt76x0 fixes
* mt76x0e improvements (should be usable now)
* usb support improvements
* more mt76x0/mt76x2 unification work
* minor fix for aggregation + powersave clients
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Dan writes:
"libnvdimm/dax 4.19-rc8
* Fix a livelock in dax_layout_busy_page() present since v4.18. The
lockup triggers when truncating an actively mapped huge page out of
a mapping pinned for direct-I/O.
* Fix mprotect() clobbers of _PAGE_DEVMAP. Broken since v4.5
mprotect() clears this flag that is needed to communicate the
liveness of device pages to the get_user_pages() path."
* tag 'libnvdimm-fixes-4.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
mm: Preserve _PAGE_DEVMAP across mprotect() calls
filesystem-dax: Fix dax_layout_busy_page() livelock
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Wolfram writes:
"i2c fix for 4.19:
I2C has one documentation bugfix for something we changed during the
v4.19 cycle"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: Fix kerneldoc for renamed i2c dma put function
|
|
Dan Carpenter reports:
The patch 6acc9b432e67: "bpf: Add helper to retrieve socket in BPF"
from Oct 2, 2018, leads to the following Smatch complaint:
net/core/filter.c:4893 bpf_sk_lookup()
error: we previously assumed 'skb->dev' could be null (see line 4885)
Fix this issue by checking skb->dev before using it.
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add WCN3990 QMI client handshakes for Q6 integrated WLAN connectivity
subsystem. This layer is responsible for communicating qmi control
messages to wifi fw QMI service using QMI messaging protocol.
Qualcomm MSM Interface(QMI) is a messaging format used to communicate
between components running between remote processors with underlying
transport layer based on integrated chipset(shared memory) or
discrete chipset(PCI/USB/SDIO/UART).
Signed-off-by: Govind Singh <govinds@codeaurora.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Niklas Cassel <niklas.cassel@linaro.org>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
Add debug mask to control debug info of ath10k qmi
messaging layer.
Signed-off-by: Govind Singh <govinds@codeaurora.org>
Acked-by: Niklas Cassel <niklas.cassel@linaro.org>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
Add WLAN related VMID's to support wlan driver to set up
the remote's permissions call via TrustZone.
Signed-off-by: Govind Singh <govinds@codeaurora.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Niklas Cassel <niklas.cassel@linaro.org>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
Add device tree binding documentation details of msa
memory region for ath10k qmi client for SDM845/APQ8098
SoC into "qcom,ath10k.txt".
Signed-off-by: Govind Singh <govinds@codeaurora.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Niklas Cassel <niklas.cassel@linaro.org>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
Add support to create the boardname for non-bmi targets
like WCN3990, which uses qmi for bdf download. This
boardname is used to parse the board data from board-2.bin.
Signed-off-by: Rakesh Pillai <pillair@codeaurora.org>
Signed-off-by: Govind Singh <govinds@codeaurora.org>
Acked-by: Niklas Cassel <niklas.cassel@linaro.org>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
WLAN qmi server running in Q6 exposes host to target
cold boot qmi handshakes. Add WLAN QMI service helpers
for ath10k wcn3990 qmi client.
Signed-off-by: Govind Singh <govinds@codeaurora.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Niklas Cassel <niklas.cassel@linaro.org>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
Use DEFINE_DEBUGFS_ATTRIBUTE rather than DEFINE_SIMPLE_ATTRIBUTE
for debugfs files.
Semantic patch information:
Rationale: DEFINE_SIMPLE_ATTRIBUTE + debugfs_create_file()
imposes some significant overhead as compared to
DEFINE_DEBUGFS_ATTRIBUTE + debugfs_create_file_unsafe().
Generated by: scripts/coccinelle/api/debugfs/debugfs_simple_attr.cocci
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
A couple of macros that deal with statistics in ath9k rely on the
declaration of the 'sc' variable, which they dereference.
However, when the statistics are disabled, the new instance in
ath_cmn_process_fft() causes a warning for an unused variable:
drivers/net/wireless/ath/ath9k/common-spectral.c: In function 'ath_cmn_process_fft':
drivers/net/wireless/ath/ath9k/common-spectral.c:474:20: error: unused variable 'sc' [-Werror=unused-variable]
It's better if those macros only operate on their arguments instead of
known variable names, and adding a cast to (void) kills off that warning.
Fixes: 03224678c013 ("ath9k: add counters for good and errorneous FFT/spectral frames")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
We added an unnecessary condition here in commit a904417fc876 ("ath10k:
add extended per sta tx statistics support"). "legacy_rate_idx" is a u8
so it can't be negative. The caller doesn't pass negatives either. I
have deleted this code.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
ath10k_pci_diag_write_mem may allocate big size of the dma memory
based on the parameter nbytes. Take firmware diag download as
example, the biggest size is about 500K. In some systems, the
allocation is likely to fail because it can't acquire such a large
contiguous dma memory.
The fix is to allocate a small size dma memory. In the loop,
driver copies the data to the allocated dma memory and writes to
the destination until all the data is written.
Tested with QCA6174 PCI with
firmware-6.bin_WLAN.RM.4.4.1-00119-QCARMSWP-1, this also affects
QCA9377 PCI.
Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
Reviewed-by: Brian Norris <briannorris@chomium.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
In the noisy environment, if there are packets in the queue and can't
send out, the suspend timing will be more than 5 seconds due to the wait,
flush the queue to optimize the suspend timing, and let the upper layer to
retry the packets after resume.
Tested with QCA6174 PCI with firmware
WLAN.RM.4.4.1-00109-QCARMSWPZ-1, but this will also affect QCA9377 PCI.
It's not a regression with new firmware releases.
Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
There is no need to compare *ps_state_enable* with < 0 because
such variable is of type u8 (8 bits, unsigned), making it
impossible to hold a negative value.
Fix this by removing such comparison.
Addresses-Coverity-ID: 1473921 ("Unsigned compared against 0")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|