Age | Commit message (Collapse) | Author |
|
Cleanup of init state machine, move state specific
code to separate functions and rewrite the
iavf_init_task() function.
Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Refactor the watchdog state machine implementation.
Add the additional state __IAVF_COMM_FAILED to process
the PF communication fails. Prepare the watchdog state machine
to integrate with init state machine.
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Remove the watchdog timer, instead declare watchdog task
as delayed work and use dedicated workqueue to service driver
tasks. The dedicated driver workqueue iavf_wq is common
for all driver instances.
Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Move the commands processing outside the watchdog_task()
function. This reduce length and complexity of the function
which is mainly designed to process the watchdog state machine.
Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
There was a calculation error in virtchnl regarding the valid
length which was fixed recently and a corresponding change needs
to go into the code while we enable ADq.
Signed-off-by: Avinash Dayanand <avinash.dayanand@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
iavf_add_vlan() is being called in atomic context
so kzalloc() needs GFP_ATOMIC. This patch fixes it.
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
On some hardware/driver/architecture combinations, it may take longer
than 200msec for all close operations to be completed, causing a
spurious error message to be logged.
Increase the timeout value to 500msec to avoid this erroneous error.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
The counter variable in iavf_clean_tx_irq starts out negative and climbs
to 0. So allocating it as u16 is actually a really bad idea that just
happens to work because the value underflows and overflows consistently
on most architectures.
Replace the u16 with an int so signed math works as expected.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
This patch changes how VLAN tag are being populated and programmed into
the HW - Instead of start adding VF VLAN tag from the last member of the
element list, start from the first member of the list, until number of
allowed VLAN tags is exhausted in the HW.
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Currently, header inclusion in each file is inconsistent.
For example, "libbpf.h" header is included as multiple ways.
#include "bpf/libbpf.h"
#include "libbpf.h"
Due to commit b552d33c80a9 ("samples/bpf: fix include path
in Makefile"), $(srctree)/tools/lib/bpf/ path had been included
during build, path "bpf/" in header isn't necessary anymore.
This commit removes path "bpf/" in header inclusion.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Due to recent change of include path at commit b552d33c80a9
("samples/bpf: fix include path in Makefile"), some of the
previous include options became unnecessary.
This commit removes duplicated include options in Makefile.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Andrii Nakryiko says:
====================
This patch set implements initial version (as discussed at LSF/MM2019
conference) of a new way to specify BPF maps, relying on BTF type information,
which allows for easy extensibility, preserving forward and backward
compatibility. See details and examples in description for patch #6.
[0] contains an outline of follow up extensions to be added after this basic
set of features lands. They are useful by itself, but also allows to bring
libbpf to feature-parity with iproute2 BPF loader. That should open a path
forward for BPF loaders unification.
Patch #1 centralizes commonly used min/max macro in libbpf_internal.h.
Patch #2 extracts .BTF and .BTF.ext loading loging from elf_collect().
Patch #3 simplifies elf_collect() error-handling logic.
Patch #4 refactors map initialization logic into user-provided maps and global
data maps, in preparation to adding another way (BTF-defined maps).
Patch #5 adds support for map definitions in multiple ELF sections and
deprecates bpf_object__find_map_by_offset() API which doesn't appear to be
used anymore and makes assumption that all map definitions reside in single
ELF section.
Patch #6 splits BTF intialization from sanitization/loading into kernel to
preserve original BTF at the time of map initialization.
Patch #7 adds support for BTF-defined maps.
Patch #8 adds new test for BTF-defined map definition.
Patches #9-11 convert test BPF map definitions to use BTF way.
[0] https://lore.kernel.org/bpf/CAEf4BzbfdG2ub7gCi0OYqBrUoChVHWsmOntWAkJt47=FE+km+A@mail.gmail.com/
v1->v2:
- more BTF-sanity checks in parsing map definitions (Song);
- removed confusing usage of "attribute", switched to "field;
- split off elf_collect() refactor from btf loading refactor (Song);
- split selftests conversion into 3 patches (Stanislav):
1. test already relying on BTF;
2. tests w/ custom types as key/value (so benefiting from BTF);
3. all the rest tests (integers as key/value, special maps w/o BTF support).
- smaller code improvements (Song);
rfc->v1:
- error out on unknown field by default (Stanislav, Jakub, Lorenz);
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Convert a bulk of selftests that have maps with custom (not integer) key
and/or value.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Switch tests that already rely on BTF to BTF-defined map definitions.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Add file test for BTF-defined map definition.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
This patch adds support for a new way to define BPF maps. It relies on
BTF to describe mandatory and optional attributes of a map, as well as
captures type information of key and value naturally. This eliminates
the need for BPF_ANNOTATE_KV_PAIR hack and ensures key/value sizes are
always in sync with the key/value type.
Relying on BTF, this approach allows for both forward and backward
compatibility w.r.t. extending supported map definition features. By
default, any unrecognized attributes are treated as an error, but it's
possible relax this using MAPS_RELAX_COMPAT flag. New attributes, added
in the future will need to be optional.
The outline of the new map definition (short, BTF-defined maps) is as follows:
1. All the maps should be defined in .maps ELF section. It's possible to
have both "legacy" map definitions in `maps` sections and BTF-defined
maps in .maps sections. Everything will still work transparently.
2. The map declaration and initialization is done through
a global/static variable of a struct type with few mandatory and
extra optional fields:
- type field is mandatory and specified type of BPF map;
- key/value fields are mandatory and capture key/value type/size information;
- max_entries attribute is optional; if max_entries is not specified or
initialized, it has to be provided in runtime through libbpf API
before loading bpf_object;
- map_flags is optional and if not defined, will be assumed to be 0.
3. Key/value fields should be **a pointer** to a type describing
key/value. The pointee type is assumed (and will be recorded as such
and used for size determination) to be a type describing key/value of
the map. This is done to save excessive amounts of space allocated in
corresponding ELF sections for key/value of big size.
4. As some maps disallow having BTF type ID associated with key/value,
it's possible to specify key/value size explicitly without
associating BTF type ID with it. Use key_size and value_size fields
to do that (see example below).
Here's an example of simple ARRAY map defintion:
struct my_value { int x, y, z; };
struct {
int type;
int max_entries;
int *key;
struct my_value *value;
} btf_map SEC(".maps") = {
.type = BPF_MAP_TYPE_ARRAY,
.max_entries = 16,
};
This will define BPF ARRAY map 'btf_map' with 16 elements. The key will
be of type int and thus key size will be 4 bytes. The value is struct
my_value of size 12 bytes. This map can be used from C code exactly the
same as with existing maps defined through struct bpf_map_def.
Here's an example of STACKMAP definition (which currently disallows BTF type
IDs for key/value):
struct {
__u32 type;
__u32 max_entries;
__u32 map_flags;
__u32 key_size;
__u32 value_size;
} stackmap SEC(".maps") = {
.type = BPF_MAP_TYPE_STACK_TRACE,
.max_entries = 128,
.map_flags = BPF_F_STACK_BUILD_ID,
.key_size = sizeof(__u32),
.value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
};
This approach is naturally extended to support map-in-map, by making a value
field to be another struct that describes inner map. This feature is not
implemented yet. It's also possible to incrementally add features like pinning
with full backwards and forward compatibility. Support for static
initialization of BPF_MAP_TYPE_PROG_ARRAY using pointers to BPF programs
is also on the roadmap.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Libbpf does sanitization of BTF before loading it into kernel, if kernel
doesn't support some of newer BTF features. This removes some of the
important information from BTF (e.g., DATASEC and VAR description),
which will be used for map construction. This patch splits BTF
processing into initialization step, in which BTF is initialized from
ELF and all the original data is still preserved; and
sanitization/loading step, which ensures that BTF is safe to load into
kernel. This allows to use full BTF information to construct maps, while
still loading valid BTF into older kernels.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
To support maps to be defined in multiple sections, it's important to
identify map not just by offset within its section, but section index as
well. This patch adds tracking of section index.
For global data, we record section index of corresponding
.data/.bss/.rodata ELF section for uniformity, and thus don't need
a special value of offset for those maps.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
User and global data maps initialization has gotten pretty complicated
and unnecessarily convoluted. This patch splits out the logic for global
data map and user-defined map initialization. It also removes the
restriction of pre-calculating how many maps will be initialized,
instead allowing to keep adding new maps as they are discovered, which
will be used later for BTF-defined map definitions.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Simplify ELF parsing logic by exiting early, as there is no common clean
up path to execute. That makes it unnecessary to track when err was set
and when it was cleared. It also reduces nesting in some places.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
As a preparation for adding BTF-based BPF map loading, extract .BTF and
.BTF.ext loading logic.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Multiple files in libbpf redefine their own definitions for min/max.
Let's define them in libbpf_internal.h and use those everywhere.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
When propagating mounts across mount namespaces owned by different user
namespaces it is not possible anymore to move or umount the mount in the
less privileged mount namespace.
Here is a reproducer:
sudo mount -t tmpfs tmpfs /mnt
sudo --make-rshared /mnt
# create unprivileged user + mount namespace and preserve propagation
unshare -U -m --map-root --propagation=unchanged
# now change back to the original mount namespace in another terminal:
sudo mkdir /mnt/aaa
sudo mount -t tmpfs tmpfs /mnt/aaa
# now in the unprivileged user + mount namespace
mount --move /mnt/aaa /opt
Unfortunately, this is a pretty big deal for userspace since this is
e.g. used to inject mounts into running unprivileged containers.
So this regression really needs to go away rather quickly.
The problem is that a recent change falsely locked the root of the newly
added mounts by setting MNT_LOCKED. Fix this by only locking the mounts
on copy_mnt_ns() and not when adding a new mount.
Fixes: 3bd045cc9c4b ("separate copying and locking mount tree on cross-userns copies")
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Tested-by: Christian Brauner <christian@brauner.io>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
sys_fsmount() needs to take a reference to the new mount when adding it
to the anonymous mount namespace. Otherwise the filesystem can be
unmounted while it's still in use, as found by syzkaller.
Reported-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: syzbot+99de05d099a170867f22@syzkaller.appspotmail.com
Reported-by: syzbot+7008b8b8ba7df475fdc8@syzkaller.appspotmail.com
Fixes: 93766fbd2696 ("vfs: syscall: Add fsmount() to create a mount for a superblock")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
We can not hold the GlobalMid_Lock spinlock during the
dfs processing in cifs_reconnect since it invokes things that may sleep
and thus trigger :
BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:23
Thus we need to drop the spinlock during this code block.
RHBZ: 1716743
Cc: stable@vger.kernel.org
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
The bpf_ipv6_fib_lookup function should return BPF_FIB_LKUP_RET_FWD_DISABLED
when forwarding is disabled for the input device. However instead of checking
if forwarding is enabled on the input device, it checked the global
net->ipv6.devconf_all->forwarding flag. Change it to behave as expected.
Fixes: 87f5fc7e48dd ("bpf: Provide helper to do forwarding lookups in kernel FIB table")
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Some servers such as Windows 10 will return STATUS_INSUFFICIENT_RESOURCES
as the number of simultaneous SMB3 requests grows (even though the client
has sufficient credits). Return EAGAIN on STATUS_INSUFFICIENT_RESOURCES
so that we can retry writes which fail with this status code.
This (for example) fixes large file copies to Windows 10 on fast networks.
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
|
|
Currently user is unable to delete the filter. See following example:
$ tc filter add dev ens16np1 ingress pref 1 handle 1 matchall action drop
$ tc filter show dev ens16np1 ingress
filter protocol all pref 1 matchall chain 0
filter protocol all pref 1 matchall chain 0 handle 0x1
in_hw
action order 1: gact action drop
random type none pass val 0
index 1 ref 1 bind 1
$ tc filter del dev ens16np1 ingress pref 1 handle 1 matchall action drop
RTNETLINK answers: Operation not supported
Implement tcf_proto_ops->delete() op and allow user to delete the filter.
Reported-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pointer ae_dev is null checked however, prior to that it is dereferenced
when assigned pointer ops. Fix this by assigning pointer ops after ae_dev
has been null checked.
Addresses-Coverity: ("Dereference before null check")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Kevin Darbyshire-Bryant says:
====================
net: sched: act_ctinfo: fixes
This is first attempt at sending a small series. Order is important
because one bug (policy validation) prevents us from encountering the
more important 'OOPS' generating bug in action creation. Fix the OOPS
first.
Confession time: Until very recently, development of this module has
been done on 'net-next' tree to 'clean compile' level with run-time
testing on backports to 4.14 & 4.19 kernels under openwrt. It turns out
that sched: action: based code has been under more active change than I
realised.
During the back & forward porting during development & testing, the
critical ACT_P_CREATED return code got missed despite being in the 4.14
& 4.19 backports. I have now gone through the init functions, using
act_csum as reference with a fine toothed comb and am happy they do the
same things.
This issue hadn't been caught till now due to another issue caused by
new strict nla_parse_nested function failing parsing validation before
action creation.
Thanks to Marcelo Leitner <marcelo.leitner@gmail.com> for flagging
extack deficiency (fixed in 733f0766c3de sched: act_ctinfo: use extack
error reporting) which led to b424e432e770 ("netlink: add validation of
NLA_F_NESTED flag") and 8cb081746c03 ("netlink: make validation more
configurable for future strictness”) which led to the policy validation
fix, which then led to the action creation fix both contained in this
series.
If I ever get to a developer conference please feel free to
tar/feather/apply cone of shame.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix nla_policy definition by specifying an exact length type attribute
to CTINFO action paraneter block structure. Without this change,
netlink parsing will fail validation and the action will not be
instantiated.
8cb081746c03 ("netlink: make validation more configurable for future")
introduced much stricter checking to attributes being passed via
netlink. Existing actions were updated to use less restrictive
deprecated versions of nla_parse_nested.
As a new module, act_ctinfo should be designed to use the strict
checking model otherwise, well, what was the point of implementing it.
Confession time: Until very recently, development of this module has
been done on 'net-next' tree to 'clean compile' level with run-time
testing on backports to 4.14 & 4.19 kernels under openwrt. This is how
I managed to miss the run-time impacts of the new strict
nla_parse_nested function. I hopefully have learned something from this
(glances toward laptop running a net-next kernel)
There is however a still outstanding implication on iproute2 user space
in that it needs to be told to pass nested netlink messages with the
nested attribute actually set. So even with this kernel fix to do
things correctly you still cannot instantiate a new 'strict'
nla_parse_nested based action such as act_ctinfo with iproute2's tc.
Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use correct return value on action creation: ACT_P_CREATED.
The use of incorrect return value could result in a situation where the
system thought a ctinfo module was listening but actually wasn't
instantiated correctly leading to an OOPS in tcf_generic_walker().
Confession time: Until very recently, development of this module has
been done on 'net-next' tree to 'clean compile' level with run-time
testing on backports to 4.14 & 4.19 kernels under openwrt. During the
back & forward porting during development & testing, the critical
ACT_P_CREATED return code got missed despite being in the 4.14 & 4.19
backports. I have now gone through the init functions, using act_csum
as reference with a fine toothed comb. Bonus, no more OOPSes. I
managed to also miss this issue till now due to the new strict
nla_parse_nested function failing validation before action creation.
As an inexperienced developer I've learned that
copy/pasting/backporting/forward porting code correctly is hard. If I
ever get to a developer conference I shall don the cone of shame.
Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A branch is needed here to get the fix into staging-next as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
There are some backward incompatible features pending
for months, mainly due to on-disk format expensions.
However, we should ensure that it cannot be mounted with
old kernels. Otherwise, it will causes unexpected behaviors.
Fixes: ba2b77a82022 ("staging: erofs: add super block operations")
Cc: <stable@vger.kernel.org> # 4.19+
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Vhost_net was known to suffer from HOL[1] issues which is not easy to
fix. Several downstream disable the feature by default. What's more,
the datapath was split and datacopy path got the support of batching
and XDP support recently which makes it faster than zerocopy part for
small packets transmission.
It looks to me that disable zerocopy by default is more
appropriate. It cold be enabled by default again in the future if we
fix the above issues.
[1] https://patchwork.kernel.org/patch/3787671/
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Using a bare block cipher in non-crypto code is almost always a bad idea,
not only for security reasons (and we've seen some examples of this in
the kernel in the past), but also for performance reasons.
In the TCP fastopen case, we call into the bare AES block cipher one or
two times (depending on whether the connection is IPv4 or IPv6). On most
systems, this results in a call chain such as
crypto_cipher_encrypt_one(ctx, dst, src)
crypto_cipher_crt(tfm)->cit_encrypt_one(crypto_cipher_tfm(tfm), ...);
aesni_encrypt
kernel_fpu_begin();
aesni_enc(ctx, dst, src); // asm routine
kernel_fpu_end();
It is highly unlikely that the use of special AES instructions has a
benefit in this case, especially since we are doing the above twice
for IPv6 connections, instead of using a transform which can process
the entire input in one go.
We could switch to the cbcmac(aes) shash, which would at least get
rid of the duplicated overhead in *some* cases (i.e., today, only
arm64 has an accelerated implementation of cbcmac(aes), while x86 will
end up using the generic cbcmac template wrapping the AES-NI cipher,
which basically ends up doing exactly the above). However, in the given
context, it makes more sense to use a light-weight MAC algorithm that
is more suitable for the purpose at hand, such as SipHash.
Since the output size of SipHash already matches our chosen value for
TCP_FASTOPEN_COOKIE_SIZE, and given that it accepts arbitrary input
sizes, this greatly simplifies the code as well.
NOTE: Server farms backing a single server IP for load balancing purposes
and sharing a single fastopen key will be adversely affected by
this change unless all systems in the pool receive their kernel
upgrades at the same time.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus
Jonathan writes:
Second set of IIO fixes for the 5.2 cycle.
* ad7150
- sense of bit for controlling adaptive vs fixed threshold was flipped.
* adt7316
- Fix a build issue due to wrong headers for gpio usage.
* lsm6dsx
- correctly suspend / resume i2c slaves when the host goes to sleep.
* mlx90632
- relax a compatability check to allow for newer devices.
Also one counters fix
* counter/ftm-quaddec
- missing dependencies in Kconfig.
* tag 'iio-fixes-for-5.2b' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
counter/ftm-quaddec: Add missing dependencies in Kconfig
staging: iio: adt7316: Fix build errors when GPIOLIB is not set
iio: temperature: mlx90632 Relax the compatibility check
iio: imu: st_lsm6dsx: fix PM support for st_lsm6dsx i2c controller
staging:iio:ad7150: fix threshold mode config bit
|
|
In patch series, commit 9195948fbf34 ("tipc: improve TIPC throughput by
Gap ACK blocks"), as for simplicity, the repeated retransmit failures'
detection in the function - "tipc_link_retrans()" was kept there for
broadcast retransmissions only.
This commit now reapplies this feature for link unicast retransmissions
that has been done via the function - "tipc_link_advance_transmq()".
Also, the "tipc_link_retrans()" is renamed to "tipc_link_bc_retrans()"
as it is used only for broadcast.
Acked-by: Jon Maloy <jon.maloy@ericsson.se>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Like bond, add ethtool get_link_ksettings to show the total speed.
v2: no update, just repost.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Eric Dumazet says:
====================
tcp: make sack processing more robust
Jonathan Looney brought to our attention multiple problems
in TCP stack at the sender side.
SACK processing can be abused by malicious peers to either
cause overflows, or increase of memory usage.
First two patches fix the immediate problems.
Since the malicious peers abuse senders by advertizing a very
small MSS in their SYN or SYNACK packet, the last two
patches add a new sysctl so that admins can chose a higher
limit for MSS clamping.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix ssbd.c which depends implicitly on asm/ptrace.h including
linux/prctl.h (through for example linux/compat.h, then linux/time.h,
linux/seqlock.h, linux/spinlock.h and linux/irqflags.h), and uses
PR_SPEC* defines.
This is an issue since we'll soon be removing the include from
asm/ptrace.h.
Fixes: 9cdc0108baa8 ("arm64: ssbd: Add prctl interface for per-thread mitigation")
Cc: stable@vger.kernel.org
Signed-off-by: Anisse Astier <aastier@freebox.fr>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Paul Walmsley:
"This contains fixes, defconfig, and DT data changes for the v5.2-rc
series.
The fixes are relatively straightforward:
- Addition of a TLB fence in the vmalloc_fault path, so the CPU
doesn't enter an infinite page fault loop
- Readdition of the pm_power_off export, so device drivers that
reassign it can now be built as modules
- A udelay() fix for RV32, fixing a miscomputation of the delay time
- Removal of deprecated smp_mb__*() barriers
This also adds initial DT data infrastructure for arch/riscv, along
with initial data for the SiFive FU540-C000 SoC and the corresponding
HiFive Unleashed board.
We also update the RV64 defconfig to include some core drivers for the
FU540 in the build"
* tag 'riscv-for-v5.2/fixes-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: remove unused barrier defines
riscv: mm: synchronize MMU after pte change
riscv: dts: add initial board data for the SiFive HiFive Unleashed
riscv: dts: add initial support for the SiFive FU540-C000 SoC
dt-bindings: riscv: convert cpu binding to json-schema
dt-bindings: riscv: sifive: add YAML documentation for the SiFive FU540
arch: riscv: add support for building DTB files from DT source data
riscv: Fix udelay in RV32.
riscv: export pm_power_off again
RISC-V: defconfig: enable clocks, serial console
|
|
When multiple iovecs reference the same page, each get_user_page call
will add a reference to the page. But once we've created the bio that
information gets lost and only a single reference will be dropped after
I/O completion. Use the same_page information returned from
__bio_try_merge_page to drop additional references to pages that were
already present in the bio.
Based on a patch from Ming Lei.
Link: https://lkml.org/lkml/2019/4/23/64
Fixes: 576ed913 ("block: use bio_add_page in bio_iov_iter_get_pages")
Reported-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We currently have an input same_page parameter to __bio_try_merge_page
to prohibit merging in the same page. The rationale for that is that
some callers need to account for every page added to a bio. Instead of
letting these callers call twice into the merge code to account for the
new vs existing page cases, just turn the paramter into an output one that
returns if a merge in the same page occured and let them act accordingly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add common functions into nf_synproxy_core.c to prepare for nftables support.
The prototypes of the functions used by {ipt, ip6t}_SYNPROXY are in the new
file nf_synproxy.h
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This is a prerequisite for the infrastructure module NETFILTER_SYNPROXY.
The new module is needed to avoid duplicated code for the SYNPROXY
nftables support.
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This new UAPI file is going to be used by the xt and nft common SYNPROXY
infrastructure. It is needed to avoid duplicated code.
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Jozsef Kadlecsik says:
====================
ipset patches for nf-next
- Remove useless memset() calls, nla_parse_nested/nla_parse
erase the tb array properly, from Florent Fourcot.
- Merge the uadd and udel functions, the code is nicer
this way, also from Florent Fourcot.
- Add a missing check for the return value of a
nla_parse[_deprecated] call, from Aditya Pakki.
- Add the last missing check for the return value
of nla_parse[_deprecated] call.
- Fix error path and release the references properly
in set_target_v3_checkentry().
- Fix memory accounting which is reported to userspace
for hash types on resize, from Stefano Brivio.
- Update my email address to kadlec@netfilter.org.
The patch covers all places in the source tree where
my kadlec@blackhole.kfki.hu address could be found.
====================
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
After the recent series of cleanups in the properties and xattrs modules
that landed in the 5.2 merge window, we ended up with a regression where
after deleting the compression xattr property through the setflags ioctl,
we don't set the BTRFS_INODE_COPY_EVERYTHING flag in the inode anymore.
As a consequence, if the inode was fsync'ed when it had the compression
property set, after deleting the compression property through the setflags
ioctl and fsync'ing again the inode, the log will still contain the
compression xattr, because the inode did not had that bit set, which
made the fsync not delete all xattrs from the log and copy all xattrs
from the subvolume tree to the log tree.
This regression happens due to the fact that that series of cleanups
made btrfs_set_prop() call the old function do_setxattr() (which is now
named btrfs_setxattr()), and not the old version of btrfs_setxattr(),
which is now called btrfs_setxattr_trans().
Fix this by setting the BTRFS_INODE_COPY_EVERYTHING bit in the current
btrfs_setxattr() function and remove it from everywhere else, including
its setup at btrfs_ioctl_setflags(). This is cleaner, avoids similar
regressions in the future, and centralizes the setup of the bit. After
all, the need to setup this bit should only be in the xattrs module,
since it is an implementation of xattrs.
Fixes: 04e6863b19c722 ("btrfs: split btrfs_setxattr calls regarding transaction")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Currently, the /proc/sys/net/bridge folder is only created in the initial
network namespace. This patch ensures that the /proc/sys/net/bridge folder
is available in each network namespace if the module is loaded and
disappears from all network namespaces when the module is unloaded.
In doing so the patch makes the sysctls:
bridge-nf-call-arptables
bridge-nf-call-ip6tables
bridge-nf-call-iptables
bridge-nf-filter-pppoe-tagged
bridge-nf-filter-vlan-tagged
bridge-nf-pass-vlan-input-dev
apply per network namespace. This unblocks some use-cases where users would
like to e.g. not do bridge filtering for bridges in a specific network
namespace while doing so for bridges located in another network namespace.
The netfilter rules are afaict already per network namespace so it should
be safe for users to specify whether bridge devices inside a network
namespace are supposed to go through iptables et al. or not. Also, this can
already be done per-bridge by setting an option for each individual bridge
via Netlink. It should also be possible to do this for all bridges in a
network namespace via sysctls.
Cc: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|