Age | Commit message (Collapse) | Author |
|
The cmac template is setting its alignmask to that of its underlying
'cipher'. Yet, it doesn't care itself about how its inputs and outputs
are aligned, which is ostensibly the point of the alignmask. Instead,
cmac actually just uses its alignmask itself to runtime-align certain
fields in its tfm and desc contexts appropriately for its underlying
cipher. That is almost entirely pointless too, though, since cmac is
already using the cipher API functions that handle alignment themselves,
and few ciphers set a nonzero alignmask anyway. Also, even without
runtime alignment, an alignment of at least 4 bytes can be guaranteed.
Thus, at best this code is optimizing for the rare case of ciphers that
set an alignmask >= 7, at the cost of hurting the common cases.
Therefore, this patch removes the manual alignment code from cmac and
makes it stop setting an alignmask.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The cbcmac template is aligning a field in its desc context to the
alignmask of its underlying 'cipher', at runtime. This is almost
entirely pointless, since cbcmac is already using the cipher API
functions that handle alignment themselves, and few ciphers set a
nonzero alignmask anyway. Also, even without runtime alignment, an
alignment of at least 4 bytes can be guaranteed.
Thus, at best this code is optimizing for the rare case of ciphers that
set an alignmask >= 7, at the cost of hurting the common cases.
Therefore, remove the manual alignment code from cbcmac.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
This unnecessary explicit setting of cra_alignmask to 0 shows up when
grepping for shash algorithms that set an alignmask. Remove it. No
change in behavior.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
This unnecessary explicit setting of cra_alignmask to 0 shows up when
grepping for shash algorithms that set an alignmask. Remove it. No
change in behavior.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The zynqmp-sha3-384 algorithm sets a nonzero alignmask, but it doesn't
appear to actually need it. Therefore, stop setting it. This will
allow this algorithm to keep being registered after alignmask support is
removed from shash.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The stm32 crc32 algorithms set a nonzero alignmask, but they don't seem
to actually need it. Their ->update function already has code that
handles aligning the data to the same alignment that the alignmask
specifies, their ->setkey function already uses get_unaligned_le32(),
and their ->final function already uses put_unaligned_le32().
Therefore, stop setting the alignmask. This will allow these algorithms
to keep being registered after alignmask support is removed from shash.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
As far as I can tell, "crc32c-sparc64" is the only "shash" algorithm in
the kernel that sets a nonzero alignmask and actually relies on it to
get the crypto API to align the inputs and outputs. This capability is
not really useful, though. To unblock removing the support for
alignmask from shash_alg, this patch updates crc32c-sparc64 to no longer
use the alignmask. This means doing 8-byte alignment of the data when
doing an update, using get_unaligned_le32() when setting a non-default
initial CRC, and using put_unaligned_le32() to output the final CRC.
Partially tested with:
export ARCH=sparc64 CROSS_COMPILE=sparc64-linux-gnu-
make sparc64_defconfig
echo CONFIG_CRYPTO_CRC32C_SPARC64=y >> .config
echo '# CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set' >> .config
echo CONFIG_DEBUG_KERNEL=y >> .config
echo CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y >> .config
make olddefconfig
make -j$(getconf _NPROCESSORS_ONLN)
qemu-system-sparc64 -kernel arch/sparc/boot/image -nographic
However, qemu doesn't actually support the sparc CRC32C instructions, so
for the test I temporarily replaced crc32c_sparc64() with __crc32c_le()
and made sparc64_has_crc32c_opcode() always return true. So essentially
I tested the glue code, not the actual SPARC part which is unchanged.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Most shash algorithms don't have custom ->import and ->export functions,
resulting in the memcpy() based default being used. Yet,
crypto_shash_import() and crypto_shash_export() still make an indirect
call, which is expensive. Therefore, change how the default import and
export are called to make it so that crypto_shash_import() and
crypto_shash_export() don't do an indirect call in this case.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Document SA8775P and SC7280 compatible for the True Random Number
Generator.
Signed-off-by: Om Prakash Singh <quic_omprsing@quicinc.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Add a module alias for pkcs1pas so that it can be auto-loaded by
modprobe.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The modular build fails because the self-test code depends on pkcs7
which in turn depends on x509 which contains the self-test.
Split the self-test out into its own module to break the cycle.
Fixes: 3cde3174eb91 ("certs: Add FIPS selftests")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
In a high-load arm64 environment, the pcrypt_aead01 test in LTP can lead
to system UAF (Use-After-Free) issues. Due to the lengthy analysis of
the pcrypt_aead01 function call, I'll describe the problem scenario
using a simplified model:
Suppose there's a user of padata named `user_function` that adheres to
the padata requirement of calling `padata_free_shell` after `serial()`
has been invoked, as demonstrated in the following code:
```c
struct request {
struct padata_priv padata;
struct completion *done;
};
void parallel(struct padata_priv *padata) {
do_something();
}
void serial(struct padata_priv *padata) {
struct request *request = container_of(padata,
struct request,
padata);
complete(request->done);
}
void user_function() {
DECLARE_COMPLETION(done)
padata->parallel = parallel;
padata->serial = serial;
padata_do_parallel();
wait_for_completion(&done);
padata_free_shell();
}
```
In the corresponding padata.c file, there's the following code:
```c
static void padata_serial_worker(struct work_struct *serial_work) {
...
cnt = 0;
while (!list_empty(&local_list)) {
...
padata->serial(padata);
cnt++;
}
local_bh_enable();
if (refcount_sub_and_test(cnt, &pd->refcnt))
padata_free_pd(pd);
}
```
Because of the high system load and the accumulation of unexecuted
softirq at this moment, `local_bh_enable()` in padata takes longer
to execute than usual. Subsequently, when accessing `pd->refcnt`,
`pd` has already been released by `padata_free_shell()`, resulting
in a UAF issue with `pd->refcnt`.
The fix is straightforward: add `refcount_dec_and_test` before calling
`padata_free_pd` in `padata_free_shell`.
Fixes: 07928d9bfc81 ("padata: Remove broken queue flushing")
Signed-off-by: WangJinchao <wangjinchao@xfusion.com>
Acked-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Acked-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Merge the mmc fixes for v6.6-rc[n] into the next branch, to allow them to
get tested together with the new mmc changes that are targeted for v6.7.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Document the SDHCI Controller on the SM8650 Platform.
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20231025-topic-sm8650-upstream-bindings-sdhci-v2-1-0406fca99033@linaro.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
For the t7 and older SoC families, the CMD_CFG_ERROR has no effect.
Starting from SoC family C3, setting this bit without SG LINK data
address will cause the controller to generate an IRQ and stop working.
To fix it, don't set the bit CMD_CFG_ERROR anymore.
Fixes: 18f92bc02f17 ("mmc: meson-gx: make sure the descriptor is stopped on errors")
Signed-off-by: Rong Chen <rong.chen@amlogic.com>
Reviewed-by: Jerome Brunet <jbrunet@baylibre.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20231026073156.2868310-1-rong.chen@amlogic.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
syzbot reported a data-race in virtnet_poll / virtnet_stats [1]
u64_stats_t infra has very nice accessors that must be used
to avoid potential load-store tearing.
[1]
BUG: KCSAN: data-race in virtnet_poll / virtnet_stats
read-write to 0xffff88810271b1a0 of 8 bytes by interrupt on cpu 0:
virtnet_receive drivers/net/virtio_net.c:2102 [inline]
virtnet_poll+0x6c8/0xb40 drivers/net/virtio_net.c:2148
__napi_poll+0x60/0x3b0 net/core/dev.c:6527
napi_poll net/core/dev.c:6594 [inline]
net_rx_action+0x32b/0x750 net/core/dev.c:6727
__do_softirq+0xc1/0x265 kernel/softirq.c:553
invoke_softirq kernel/softirq.c:427 [inline]
__irq_exit_rcu kernel/softirq.c:632 [inline]
irq_exit_rcu+0x3b/0x90 kernel/softirq.c:644
common_interrupt+0x7f/0x90 arch/x86/kernel/irq.c:247
asm_common_interrupt+0x26/0x40 arch/x86/include/asm/idtentry.h:636
__sanitizer_cov_trace_const_cmp8+0x0/0x80 kernel/kcov.c:306
jbd2_write_access_granted fs/jbd2/transaction.c:1174 [inline]
jbd2_journal_get_write_access+0x94/0x1c0 fs/jbd2/transaction.c:1239
__ext4_journal_get_write_access+0x154/0x3f0 fs/ext4/ext4_jbd2.c:241
ext4_reserve_inode_write+0x14e/0x200 fs/ext4/inode.c:5745
__ext4_mark_inode_dirty+0x8e/0x440 fs/ext4/inode.c:5919
ext4_evict_inode+0xaf0/0xdc0 fs/ext4/inode.c:299
evict+0x1aa/0x410 fs/inode.c:664
iput_final fs/inode.c:1775 [inline]
iput+0x42c/0x5b0 fs/inode.c:1801
do_unlinkat+0x2b9/0x4f0 fs/namei.c:4405
__do_sys_unlink fs/namei.c:4446 [inline]
__se_sys_unlink fs/namei.c:4444 [inline]
__x64_sys_unlink+0x30/0x40 fs/namei.c:4444
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffff88810271b1a0 of 8 bytes by task 2814 on cpu 1:
virtnet_stats+0x1b3/0x340 drivers/net/virtio_net.c:2564
dev_get_stats+0x6d/0x860 net/core/dev.c:10511
rtnl_fill_stats+0x45/0x320 net/core/rtnetlink.c:1261
rtnl_fill_ifinfo+0xd0e/0x1120 net/core/rtnetlink.c:1867
rtnl_dump_ifinfo+0x7f9/0xc20 net/core/rtnetlink.c:2240
netlink_dump+0x390/0x720 net/netlink/af_netlink.c:2266
netlink_recvmsg+0x425/0x780 net/netlink/af_netlink.c:1992
sock_recvmsg_nosec net/socket.c:1027 [inline]
sock_recvmsg net/socket.c:1049 [inline]
____sys_recvmsg+0x156/0x310 net/socket.c:2760
___sys_recvmsg net/socket.c:2802 [inline]
__sys_recvmsg+0x1ea/0x270 net/socket.c:2832
__do_sys_recvmsg net/socket.c:2842 [inline]
__se_sys_recvmsg net/socket.c:2839 [inline]
__x64_sys_recvmsg+0x46/0x50 net/socket.c:2839
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x000000000045c334 -> 0x000000000045c376
Fixes: 3fa2a1df9094 ("virtio-net: per cpu 64 bit stats (v2)")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On no-MMU, all futexes are treated as private because there is no need
to map a virtual address to physical to match the futex across
processes. This doesn't quite work though, because private futexes
include the current process's mm_struct as part of their key. This makes
it impossible for one process to wake up a shared futex being waited on
in another process.
Fix this bug by excluding the mm_struct from the key. With
a single address space, the futex address is already a unique key.
Fixes: 784bdf3bb694 ("futex: Assume all mappings are private on !MMU systems")
Signed-off-by: Ben Wolsieffer <ben.wolsieffer@hefring.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: André Almeida <andrealmeid@igalia.com>
Link: https://lore.kernel.org/r/20231019204548.1236437-2-ben.wolsieffer@hefring.com
|
|
Ido Schimmel says:
====================
Add MDB get support
This patchset adds MDB get support, allowing user space to request a
single MDB entry to be retrieved instead of dumping the entire MDB.
Support is added in both the bridge and VXLAN drivers.
Patches #1-#6 are small preparations in both drivers.
Patches #7-#8 add the required uAPI attributes for the new functionality
and the MDB get net device operation (NDO), respectively.
Patches #9-#10 implement the MDB get NDO in both drivers.
Patch #11 registers a handler for RTM_GETMDB messages in rtnetlink core.
The handler derives the net device from the ifindex specified in the
ancillary header and invokes its MDB get NDO.
Patches #12-#13 add selftests by converting tests that use MDB dump with
grep to the new MDB get functionality.
iproute2 changes can be found here [1].
v2:
* Patch #7: Add a comment to describe attributes structure.
* Patch #9: Add a comment above spin_lock_bh().
[1] https://github.com/idosch/iproute2/tree/submit/mdb_get_v1
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Test the new MDB get functionality by converting dump and grep to MDB
get.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Test the new MDB get functionality by converting dump and grep to MDB
get.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that both the bridge and VXLAN drivers implement the MDB get net
device operation, expose the functionality to user space by registering
a handler for RTM_GETMDB messages. Derive the net device from the
ifindex specified in the ancillary header and invoke its MDB get NDO.
Note that unlike other get handlers, the allocation of the skb
containing the response is not performed in the common rtnetlink code as
the size is variable and needs to be determined by the respective
driver.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Implement support for MDB get operation by looking up a matching MDB
entry, allocating the skb according to the entry's size and then filling
in the response.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Implement support for MDB get operation by looking up a matching MDB
entry, allocating the skb according to the entry's size and then filling
in the response. The operation is performed under the bridge multicast
lock to ensure that the entry does not change between the time the reply
size is determined and when the reply is filled in.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add MDB net device operation that will be invoked by rtnetlink code in
response to received RTM_GETMDB messages. Subsequent patches will
implement the operation in the bridge and VXLAN drivers.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add MDB get attributes that correspond to the MDB set attributes used in
RTM_NEWMDB messages. Specifically, add 'MDBA_GET_ENTRY' which will hold
a 'struct br_mdb_entry' and 'MDBA_GET_ENTRY_ATTRS' which will hold
'MDBE_ATTR_*' attributes that are used as indexes (source IP and source
VNI).
An example request will look as follows:
[ struct nlmsghdr ]
[ struct br_port_msg ]
[ MDBA_GET_ENTRY ]
struct br_mdb_entry
[ MDBA_GET_ENTRY_ATTRS ]
[ MDBE_ATTR_SOURCE ]
struct in_addr / struct in6_addr
[ MDBE_ATTR_SRC_VNI ]
u32
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, netlink notifications are sent for individual remote entries
and not for the entire MDB entry itself.
Subsequent patches are going to add MDB get support which will require
the VXLAN driver to reply with an entire MDB entry.
Therefore, as a preparation, factor out a helper to calculate the size
of an individual remote entry. When determining the size of the reply
this helper will be invoked for each remote entry in the MDB entry.
No functional changes intended.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Adjust the function's arguments and rename it to allow it to be reused
by future call sites that only have access to 'struct
vxlan_mdb_entry_key', but not to 'struct vxlan_mdb_config'.
No functional changes intended.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current name is going to conflict with the upcoming net device
operation for the MDB get operation.
Rename the function to br_mdb_entry_skb_get(). No functional changes
intended.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, netlink notifications are sent for individual port group
entries and not for the entire MDB entry itself.
Subsequent patches are going to add MDB get support which will require
the bridge driver to reply with an entire MDB entry.
Therefore, as a preparation, factor out an helper to calculate the size
of an individual port group entry. When determining the size of the
reply this helper will be invoked for each port group entry in the MDB
entry.
No functional changes intended.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The 'MDBA_MDB' and 'MDBA_MDB_ENTRY' nest attributes are not accounted
for when calculating the size of MDB notifications. Add them along with
comments for existing attributes.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, the bridge driver does not dump MDB entries when multicast
snooping is disabled although the entries are present in the kernel:
# bridge mdb add dev br0 port swp1 grp 239.1.1.1 permanent
# bridge mdb show dev br0
dev br0 port swp1 grp 239.1.1.1 permanent
dev br0 port br0 grp ff02::6a temp
dev br0 port br0 grp ff02::1:ff9d:e61b temp
# ip link set dev br0 type bridge mcast_snooping 0
# bridge mdb show dev br0
# ip link set dev br0 type bridge mcast_snooping 1
# bridge mdb show dev br0
dev br0 port swp1 grp 239.1.1.1 permanent
dev br0 port br0 grp ff02::6a temp
dev br0 port br0 grp ff02::1:ff9d:e61b temp
This behavior differs from other netlink dump interfaces that dump
entries regardless if they are used or not. For example, VLANs are
dumped even when VLAN filtering is disabled:
# ip link set dev br0 type bridge vlan_filtering 0
# bridge vlan show dev swp1
port vlan-id
swp1 1 PVID Egress Untagged
Remove the check and always dump MDB entries:
# bridge mdb add dev br0 port swp1 grp 239.1.1.1 permanent
# bridge mdb show dev br0
dev br0 port swp1 grp 239.1.1.1 permanent
dev br0 port br0 grp ff02::6a temp
dev br0 port br0 grp ff02::1:ffeb:1a4d temp
# ip link set dev br0 type bridge mcast_snooping 0
# bridge mdb show dev br0
dev br0 port swp1 grp 239.1.1.1 permanent
dev br0 port br0 grp ff02::6a temp
dev br0 port br0 grp ff02::1:ffeb:1a4d temp
# ip link set dev br0 type bridge mcast_snooping 1
# bridge mdb show dev br0
dev br0 port swp1 grp 239.1.1.1 permanent
dev br0 port br0 grp ff02::6a temp
dev br0 port br0 grp ff02::1:ffeb:1a4d temp
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Dmitry Safonov says:
====================
net/tcp: Add TCP-AO support
This is version 16 of TCP-AO support. It addresses the build warning
in the middle of patch set, reported by kernel test robot.
There's one Sparse warning introduced by tcp_sigpool_start():
__cond_acquires() seems to currently being broken. I've described
the reasoning for it on v9 cover letter. Also, checkpatch.pl warnings
were addressed, but yet I've left the ones that are more personal
preferences (i.e. 80 columns limit). Please, ping me if you have
a strong feeling about one of them.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It has Frequently Asked Questions (FAQ) on RFC 5925 - I found it very
useful answering those before writing the actual code. It provides answers
to common questions that arise on a quick read of the RFC, as well as how
they were answered. There's also comparison to TCP-MD5 option,
evaluation of per-socket vs in-kernel-DB approaches and description of
uAPI provided.
Hopefully, it will be as useful for reviewing the code as it was for writing.
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add TCP_AO_REPAIR setsockopt(), getsockopt(). They let a user to repair
TCP-AO ISNs/SNEs. Also let the user hack around when (tp->repair) is on
and add ao_info on a socket in any supported state.
As SNEs now can be read/written at any moment, use
WRITE_ONCE()/READ_ONCE() to set/read them.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Similarly how TCP_MD5SIG_FLAG_IFINDEX works for TCP-MD5,
TCP_AO_KEYF_IFINDEX is an AO-key flag that binds that MKT to a specified
by L3 ifinndex. Similarly, without this flag the key will work in
the default VRF l3index = 0 for connections.
To prevent AO-keys from overlapping, it's restricted to add key B for a
socket that has key A, which have the same sndid/rcvid and one of
the following is true:
- !(A.keyflags & TCP_AO_KEYF_IFINDEX) or !(B.keyflags & TCP_AO_KEYF_IFINDEX)
so that any key is non-bound to a VRF
- A.l3index == B.l3index
both want to work for the same VRF
Additionally, it's restricted to match TCP-MD5 keys for the same peer
the following way:
|--------------|--------------------|----------------|---------------|
| | MD5 key without | MD5 key | MD5 key |
| | l3index | l3index=0 | l3index=N |
|--------------|--------------------|----------------|---------------|
| TCP-AO key | | | |
| without | reject | reject | reject |
| l3index | | | |
|--------------|--------------------|----------------|---------------|
| TCP-AO key | | | |
| l3index=0 | reject | reject | allow |
|--------------|--------------------|----------------|---------------|
| TCP-AO key | | | |
| l3index=N | reject | allow | reject |
|--------------|--------------------|----------------|---------------|
This is done with the help of tcp_md5_do_lookup_any_l3index() to reject
adding AO key without TCP_AO_KEYF_IFINDEX if there's TCP-MD5 in any VRF.
This is important for case where sysctl_tcp_l3mdev_accept = 1
Similarly, for TCP-AO lookups tcp_ao_do_lookup() may be used with
l3index < 0, so that __tcp_ao_key_cmp() will match TCP-AO key in any VRF.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Similarly to TCP-MD5, add a static key to TCP-AO that is patched out
when there are no keys on a machine and dynamically enabled with the
first setsockopt(TCP_AO) adds a key on any socket. The static key is as
well dynamically disabled later when the socket is destructed.
The lifetime of enabled static key here is the same as ao_info: it is
enabled on allocation, passed over from full socket to twsk and
destructed when ao_info is scheduled for destruction.
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Delete becomes very, very fast - almost free, but after setsockopt()
syscall returns, the key is still alive until next RCU grace period.
Which is fine for listen sockets as userspace needs to be aware of
setsockopt(TCP_AO) and accept() race and resolve it with verification
by getsockopt() after TCP connection was accepted.
The benchmark results (on non-loaded box, worse with more RCU work pending):
> ok 33 Worst case delete 16384 keys: min=5ms max=10ms mean=6.93904ms stddev=0.263421
> ok 34 Add a new key 16384 keys: min=1ms max=4ms mean=2.17751ms stddev=0.147564
> ok 35 Remove random-search 16384 keys: min=5ms max=10ms mean=6.50243ms stddev=0.254999
> ok 36 Remove async 16384 keys: min=0ms max=0ms mean=0.0296107ms stddev=0.0172078
Co-developed-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Co-developed-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introduce getsockopt(TCP_AO_GET_KEYS) that lets a user get TCP-AO keys
and their properties from a socket. The user can provide a filter
to match the specific key to be dumped or ::get_all = 1 may be
used to dump all keys in one syscall.
Add another getsockopt(TCP_AO_INFO) for providing per-socket/per-ao_info
stats: packet counters, Current_key/RNext_key and flags like
::ao_required and ::accept_icmps.
Co-developed-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Co-developed-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Provide setsockopt() key flag that makes TCP-AO exclude hashing TCP
header for peers that match the key. This is needed for interraction
with middleboxes that may change TCP options, see RFC5925 (9.2).
Co-developed-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Co-developed-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Similarly to IPsec, RFC5925 prescribes:
">> A TCP-AO implementation MUST default to ignore incoming ICMPv4
messages of Type 3 (destination unreachable), Codes 2-4 (protocol
unreachable, port unreachable, and fragmentation needed -- ’hard
errors’), and ICMPv6 Type 1 (destination unreachable), Code 1
(administratively prohibited) and Code 4 (port unreachable) intended
for connections in synchronized states (ESTABLISHED, FIN-WAIT-1, FIN-
WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT) that match MKTs."
A selftest (later in patch series) verifies that this attack is not
possible in this TCP-AO implementation.
Co-developed-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Co-developed-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a helper for logging connection-detailed messages for failed TCP
hash verification (both MD5 and AO).
Co-developed-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Co-developed-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add Sequence Number Extension (SNE) for TCP-AO.
This is needed to protect long-living TCP-AO connections from replaying
attacks after sequence number roll-over, see RFC5925 (6.2).
Co-developed-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Co-developed-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introduce segment counters that are useful for troubleshooting/debugging
as well as for writing tests.
Now there are global snmp counters as well as per-socket and per-key.
Co-developed-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Co-developed-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now there is a common function to verify signature on TCP segments:
tcp_inbound_hash(). It has checks for all possible cross-interactions
with MD5 signs as well as with unsigned segments.
The rules from RFC5925 are:
(1) Any TCP segment can have at max only one signature.
(2) TCP connections can't switch between using TCP-MD5 and TCP-AO.
(3) TCP-AO connections can't stop using AO, as well as unsigned
connections can't suddenly start using AO.
Co-developed-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Co-developed-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Similarly to RST segments, wire SYN-ACKs to TCP-AO.
tcp_rsk_used_ao() is handy here to check if the request socket used AO
and needs a signature on the outgoing segments.
Co-developed-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Co-developed-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now when the new request socket is created from the listening socket,
it's recorded what MKT was used by the peer. tcp_rsk_used_ao() is
a new helper for checking if TCP-AO option was used to create the
request socket.
tcp_ao_copy_all_matching() will copy all keys that match the peer on the
request socket, as well as preparing them for the usage (creating
traffic keys).
Co-developed-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Co-developed-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for sockets in time-wait state.
ao_info as well as all keys are inherited on transition to time-wait
socket. The lifetime of ao_info is now protected by ref counter, so
that tcp_ao_destroy_sock() will destruct it only when the last user is
gone.
Co-developed-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Co-developed-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Wire up sending resets to TCP-AO hashing.
Co-developed-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Co-developed-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introduce a helper that:
(1) shares the common code with TCP-MD5 header options parsing
(2) looks for hash signature only once for both TCP-MD5 and TCP-AO
(3) fails with -EEXIST if any TCP sign option is present twice, see
RFC5925 (2.2):
">> A single TCP segment MUST NOT have more than one TCP-AO in its
options sequence. When multiple TCP-AOs appear, TCP MUST discard
the segment."
Co-developed-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Co-developed-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Using precalculated traffic keys, sign TCP segments as prescribed by
RFC5925. Per RFC, TCP header options are included in sign calculation:
"The TCP header, by default including options, and where the TCP
checksum and TCP-AO MAC fields are set to zero, all in network-
byte order." (5.1.3)
tcp_ao_hash_header() has exclude_options parameter to optionally exclude
TCP header from hash calculation, as described in RFC5925 (9.1), this is
needed for interaction with middleboxes that may change "some TCP
options". This is wired up to AO key flags and setsockopt() later.
Similarly to TCP-MD5 hash TCP segment fragments.
From this moment a user can start sending TCP-AO signed segments with
one of crypto ahash algorithms from supported by Linux kernel. It can
have a user-specified MAC length, to either save TCP option header space
or provide higher protection using a longer signature.
The inbound segments are not yet verified, TCP-AO option is ignored and
they are accepted.
Co-developed-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Co-developed-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Salam Noureddine <noureddine@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|