Age | Commit message (Collapse) | Author |
|
Fixes: 701d678599d0c1 ("mm/zsmalloc.c: fix race condition in zs_destroy_pool")
Link: http://lkml.kernel.org/r/201908251039.5oSbEEUT%25lkp@intel.com
Reported-by: kbuild test robot <lkp@intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Adams <jwadams@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
I've noticed that the "slab" value in memory.stat is sometimes 0, even
if some children memory cgroups have a non-zero "slab" value. The
following investigation showed that this is the result of the kmem_cache
reparenting in combination with the per-cpu batching of slab vmstats.
At the offlining some vmstat value may leave in the percpu cache, not
being propagated upwards by the cgroup hierarchy. It means that stats
on ancestor levels are lower than actual. Later when slab pages are
released, the precise number of pages is substracted on the parent
level, making the value negative. We don't show negative values, 0 is
printed instead.
To fix this issue, let's flush percpu slab memcg and lruvec stats on
memcg offlining. This guarantees that numbers on all ancestor levels
are accurate and match the actual number of outstanding slab pages.
Link: http://lkml.kernel.org/r/20190819202338.363363-3-guro@fb.com
Fixes: fb2f2b0adb98 ("mm: memcg/slab: reparent memcg kmem_caches on cgroup removal")
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Spurious warning when loading rules using the physdev match,
from Todd Seidelmann.
2) Fix FTP conntrack helper debugging output, from Thomas Jarosch.
3) Restore per-netns nf_conntrack_{acct,helper,timeout} sysctl knobs,
from Florian Westphal.
4) Clear skbuff timestamp from the flowtable datapath, also from Florian.
5) Fix incorrect byteorder of NFT_META_BRI_IIFVPROTO, from wenxu.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2019-08-31
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix 32-bit zero-extension during constant blinding which
has been causing a regression on ppc64, from Naveen.
2) Fix a latency bug in nfp driver when updating stack index
register, from Jiong.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a new function bnxt_get_registered_vfs() to handle the work
of getting the number of registered VFs under #ifdef CONFIG_BNXT_SRIOV.
The main code will call this function and will always work correctly
whether CONFIG_BNXT_SRIOV is set or not.
Fixes: 230d1f0de754 ("bnxt_en: Handle firmware reset.")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Kevin Laatz says:
====================
This patch set adds the ability to use unaligned chunks in the XDP umem.
Currently, all chunk addresses passed to the umem are masked to be chunk
size aligned (max is PAGE_SIZE). This limits where we can place chunks
within the umem as well as limiting the packet sizes that are supported.
The changes in this patch set removes these restrictions, allowing XDP to
be more flexible in where it can place a chunk within a umem. By relaxing
where the chunks can be placed, it allows us to use an arbitrary buffer
size and place that wherever we have a free address in the umem. These
changes add the ability to support arbitrary frame sizes up to 4k
(PAGE_SIZE) and make it easy to integrate with other existing frameworks
that have their own memory management systems, such as DPDK.
In DPDK, for example, there is already support for AF_XDP with zero-copy.
However, with this patch set the integration will be much more seamless.
You can find the DPDK AF_XDP driver at:
https://git.dpdk.org/dpdk/tree/drivers/net/af_xdp
Since we are now dealing with arbitrary frame sizes, we need also need to
update how we pass around addresses. Currently, the addresses can simply be
masked to 2k to get back to the original address. This becomes less trivial
when using frame sizes that are not a 'power of 2' size. This patch set
modifies the Rx/Tx descriptor format to use the upper 16-bits of the addr
field for an offset value, leaving the lower 48-bits for the address (this
leaves us with 256 Terabytes, which should be enough!). We only need to use
the upper 16-bits to store the offset when running in unaligned mode.
Rather than adding the offset (headroom etc) to the address, we will store
it in the upper 16-bits of the address field. This way, we can easily add
the offset to the address where we need it, using some bit manipulation and
addition, and we can also easily get the original address wherever we need
it (for example in i40e_zca_free) by simply masking to get the lower
48-bits of the address field.
The patch set was tested with the following set up:
- Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
- Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
- Driver: i40e
- Application: xdpsock with l2fwd (single interface)
- Turbo disabled in BIOS
There are no changes to performance before and after these patches for SKB
mode and Copy mode. Zero-copy mode saw a performance degradation of ~1.5%.
This patch set has been applied against
commit 0bb52b0dfc88 ("tools: bpftool: add 'bpftool map freeze' subcommand")
Structure of the patch set:
Patch 1:
- Remove unnecessary masking and headroom addition during zero-copy Rx
buffer recycling in i40e. This change is required in order for the
buffer recycling to work in the unaligned chunk mode.
Patch 2:
- Remove unnecessary masking and headroom addition during
zero-copy Rx buffer recycling in ixgbe. This change is required in
order for the buffer recycling to work in the unaligned chunk mode.
Patch 3:
- Add infrastructure for unaligned chunks. Since we are dealing with
unaligned chunks that could potentially cross a physical page boundary,
we add checks to keep track of that information. We can later use this
information to correctly handle buffers that are placed at an address
where they cross a page boundary. This patch also modifies the
existing Rx and Tx functions to use the new descriptor format. To
handle addresses correctly, we need to mask appropriately based on
whether we are in aligned or unaligned mode.
Patch 4:
- This patch updates the i40e driver to make use of the new descriptor
format.
Patch 5:
- This patch updates the ixgbe driver to make use of the new descriptor
format.
Patch 6:
- This patch updates the mlx5e driver to make use of the new descriptor
format. These changes are required to handle the new descriptor format
and for unaligned chunks support.
Patch 7:
- This patch allows XSK frames smaller than page size in the mlx5e
driver. Relax the requirements to the XSK frame size to allow it to be
smaller than a page and even not a power of two. The current
implementation can work in this mode, both with Striding RQ and without
it.
Patch 8:
- Add flags for umem configuration to libbpf. Since we increase the size
of the struct by adding flags, we also need to add the ABI versioning
in this patch.
Patch 9:
- Modify xdpsock application to add a command line option for
unaligned chunks
Patch 10:
- Since we can now run the application in unaligned chunk mode, we need
to make sure we recycle the buffers appropriately.
Patch 11:
- Adds hugepage support to the xdpsock application
Patch 12:
- Documentation update to include the unaligned chunk scenario. We need
to explicitly state that the incoming addresses are only masked in the
aligned chunk mode and not the unaligned chunk mode.
v2:
- fixed checkpatch issues
- fixed Rx buffer recycling for unaligned chunks in xdpsock
- removed unused defines
- fixed how chunk_size is calculated in xsk_diag.c
- added some performance numbers to cover letter
- modified descriptor format to make it easier to retrieve original
address
- removed patch adding off_t off to the zero copy allocator. This is no
longer needed with the new descriptor format.
v3:
- added patch for mlx5 driver changes needed for unaligned chunks
- moved offset handling to new helper function
- changed value used for the umem chunk_mask. Now using the new
descriptor format to save us doing the calculations in a number of
places meaning more of the code is left unchanged while adding
unaligned chunk support.
v4:
- reworked the next_pg_contig field in the xdp_umem_page struct. We now
use the low 12 bits of the addr for flags rather than adding an extra
field in the struct.
- modified unaligned chunks flag define
- fixed page_start calculation in __xsk_rcv_memcpy().
- move offset handling to the xdp_umem_get_* functions
- modified the len field in xdp_umem_reg struct. We now use 16 bits from
this for the flags field.
- fixed headroom addition to handle in the mlx5e driver
- other minor changes based on review comments
v5:
- Added ABI versioning in the libbpf patch
- Removed bitfields in the xdp_umem_reg struct. Adding new flags field.
- Added accessors for getting addr and offset.
- Added helper function for adding the offset to the addr.
- Fixed conflicts with 'bpf-af-xdp-wakeup' which was merged recently.
- Fixed typo in mlx driver patch.
- Moved libbpf patch to later in the set (7/11, just before the sample
app changes)
v6:
- Added support for XSK frames smaller than page in mlx5e driver (Maxim
Mikityanskiy <maximmi@mellanox.com).
- Fixed offset handling in xsk_generic_rcv.
- Added check for base address in xskq_is_valid_addr_unaligned.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
The addition of unaligned chunks mode, the documentation needs to be
updated to indicate that the incoming addr to the fill ring will only be
masked if the user application is run in the aligned chunk mode. This patch
also adds a line to explicitly indicate that the incoming addr will not be
masked if running the user application in the unaligned chunk mode.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
This patch modifies xdpsock to use mmap instead of posix_memalign. With
this change, we can use hugepages when running the application in unaligned
chunks mode. Using hugepages makes it more likely that we have physically
contiguous memory, which supports the unaligned chunk mode better.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
This patch adds buffer recycling support for unaligned buffers. Since we
don't mask the addr to 2k at umem_reg in unaligned mode, we need to make
sure we give back the correct (original) addr to the fill queue. We achieve
this using the new descriptor format and associated masks. The new format
uses the upper 16-bits for the offset and the lower 48-bits for the addr.
Since we have a field for the offset, we no longer need to modify the
actual address. As such, all we have to do to get back the original address
is mask for the lower 48 bits (i.e. strip the offset and we get the address
on it's own).
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
This patch adds support for the unaligned chunks mode. The addition of the
unaligned chunks option will allow users to run the application with more
relaxed chunk placement in the XDP umem.
Unaligned chunks mode can be used with the '-u' or '--unaligned' command
line options.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
This patch adds a 'flags' field to the umem_config and umem_reg structs.
This will allow for more options to be added for configuring umems.
The first use for the flags field is to add a flag for unaligned chunks
mode. These flags can either be user-provided or filled with a default.
Since we change the size of the xsk_umem_config struct, we need to version
the ABI. This patch includes the ABI versioning for xsk_umem__create. The
Makefile was also updated to handle multiple function versions in
check-abi.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Relax the requirements to the XSK frame size to allow it to be smaller
than a page and even not a power of two. The current implementation can
work in this mode, both with Striding RQ and without it.
The code that checks `mtu + headroom <= XSK frame size` is modified
accordingly. Any frame size between 2048 and PAGE_SIZE is accepted.
Functions that worked with pages only now work with XSK frames, even if
their size is different from PAGE_SIZE.
With XSK queues, regardless of the frame size, Striding RQ uses the
stride size of PAGE_SIZE, and UMR MTTs are posted using starting
addresses of frames, but PAGE_SIZE as page size. MTU guarantees that no
packet data will overlap with other frames. UMR MTT size is made equal
to the stride size of the RQ, because UMEM frames may come in random
order, and we need to handle them one by one. PAGE_SIZE is just a power
of two that is bigger than any allowed XSK frame size, and also it
doesn't require making additional changes to the code.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
With the addition of the unaligned chunks option, we need to make sure we
handle the offsets accordingly based on the mode we are currently running
in. This patch modifies the driver to appropriately mask the address for
each case.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
With the addition of the unaligned chunks option, we need to make sure we
handle the offsets accordingly based on the mode we are currently running
in. This patch modifies the driver to appropriately mask the address for
each case.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
With the addition of the unaligned chunks option, we need to make sure we
handle the offsets accordingly based on the mode we are currently running
in. This patch modifies the driver to appropriately mask the address for
each case.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Currently, addresses are chunk size aligned. This means, we are very
restricted in terms of where we can place chunk within the umem. For
example, if we have a chunk size of 2k, then our chunks can only be placed
at 0,2k,4k,6k,8k... and so on (ie. every 2k starting from 0).
This patch introduces the ability to use unaligned chunks. With these
changes, we are no longer bound to having to place chunks at a 2k (or
whatever your chunk size is) interval. Since we are no longer dealing with
aligned chunks, they can now cross page boundaries. Checks for page
contiguity have been added in order to keep track of which pages are
followed by a physically contiguous page.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Currently, the dma, addr and handle are modified when we reuse Rx buffers
in zero-copy mode. However, this is not required as the inputs to the
function are copies, not the original values themselves. As we use the
copies within the function, we can use the original 'obi' values
directly without having to mask and add the headroom.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Currently, the dma, addr and handle are modified when we reuse Rx buffers
in zero-copy mode. However, this is not required as the inputs to the
function are copies, not the original values themselves. As we use the
copies within the function, we can use the original 'old_bi' values
directly without having to mask and add the headroom.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
This patch fix a spelling typo in test_offload.py
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
If a SYN cookie is not issued by tcp_v#_gen_syncookie, then the return
value will be exactly 0, rather than <= 0. Let's change the check to
reflect that, especially since mss is an unsigned value and cannot be
negative.
Fixes: 70d66244317e ("bpf: add bpf_tcp_gen_syncookie helper")
Reported-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Petar Penkov <ppenkov@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Jakub Kicinski says:
====================
This set adds a small batching and cache mechanism to the driver.
Map dumps require two operations per element - get next, and
lookup. Each of those needs a round trip to the device, and on
a loaded system scheduling out and in of the dumping process.
This set makes the driver request a number of entries at the same
time, and if no operation which would modify the map happens
from the host side those entries are used to serve lookup
requests for up to 250us, at which point they are considered
stale.
This set has been measured to provide almost 4x dumping speed
improvement, Jaco says:
OLD dump times
500 000 elements: 26.1s
1 000 000 elements: 54.5s
NEW dump times
500 000 elements: 7.6s
1 000 000 elements: 16.5s
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Each get_next and lookup call requires a round trip to the device.
However, the device is capable of giving us a few entries back,
instead of just one.
In this patch we ask for a small yet reasonable number of entries
(4) on every get_next call, and on subsequent get_next/lookup calls
check this little cache for a hit. The cache is only kept for 250us,
and is invalidated on every operation which may modify the map
(e.g. delete or update call). Note that operations may be performed
simultaneously, so we have to keep track of operations in flight.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
If control channel MTU is too low to support map operations a warning
will be printed. This is not enough, we want to make sure probe fails
in such scenario, as this would clearly be a faulty configuration.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Quentin Monnet says:
====================
This set attempts to make it easier to build bpftool, in particular when
passing a specific output directory. This is a follow-up to the
conversation held last month by Lorenz, Ilya and Jakub [0].
The first patch is a minor fix to bpftool's Makefile, regarding the
retrieval of kernel version (which currently prints a non-relevant make
warning on some invocations).
Second patch improves the Makefile commands to support more "make"
invocations, or to fix building with custom output directory. On Jakub's
suggestion, a script is also added to BPF selftests in order to keep track
of the supported build variants.
Building bpftool with "make tools/bpf" from the top of the repository
generates files in "libbpf/" and "feature/" directories under tools/bpf/
and tools/bpf/bpftool/. The third patch ensures such directories are taken
care of on "make clean", and add them to the relevant .gitignore files.
At last, fourth patch is a sligthly modified version of Ilya's fix
regarding libbpf.a appearing twice on the linking command for bpftool.
[0] https://lore.kernel.org/bpf/CACAyw9-CWRHVH3TJ=Tke2x8YiLsH47sLCijdp=V+5M836R9aAA@mail.gmail.com/
v2:
- Return error from check script if one of the make invocations returns
non-zero (even if binary is successfully produced).
- Run "make clean" from bpf/ and not only bpf/bpftool/ in that same script,
when relevant.
- Add a patch to clean up generated "feature/" and "libbpf/" directories.
====================
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
In bpftool's Makefile, $(LIBS) includes $(LIBBPF), therefore the library
is used twice in the linking command. No need to have $(LIBBPF) (from
$^) on that command, let's do with "$(OBJS) $(LIBS)" (but move $(LIBBPF)
_before_ the -l flags in $(LIBS)).
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
When building "tools/bpf" from the top of the Linux repository, the
build system passes a value for the $(OUTPUT) Makefile variable to
tools/bpf/Makefile and tools/bpf/bpftool/Makefile, which results in
generating "libbpf/" (for bpftool) and "feature/" (bpf and bpftool)
directories inside the tree.
This commit adds such directories to the relevant .gitignore files, and
edits the Makefiles to ensure they are removed on "make clean". The use
of "rm" is also made consistent throughout those Makefiles (relies on
the $(RM) variable, use "--" to prevent interpreting
$(OUTPUT)/$(DESTDIR) as options.
v2:
- New patch.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
There are a number of alternative "make" invocations that can be used to
compile bpftool. The following invocations are expected to work:
- through the kbuild system, from the top of the repository
(make tools/bpf)
- by telling make to change to the bpftool directory
(make -C tools/bpf/bpftool)
- by building the BPF tools from tools/
(cd tools && make bpf)
- by running make from bpftool directory
(cd tools/bpf/bpftool && make)
Additionally, setting the O or OUTPUT variables should tell the build
system to use a custom output path, for each of these alternatives.
The following patch fixes the following invocations:
$ make tools/bpf
$ make tools/bpf O=<dir>
$ make -C tools/bpf/bpftool OUTPUT=<dir>
$ make -C tools/bpf/bpftool O=<dir>
$ cd tools/ && make bpf O=<dir>
$ cd tools/bpf/bpftool && make OUTPUT=<dir>
$ cd tools/bpf/bpftool && make O=<dir>
After this commit, the build still fails for two variants when passing
the OUTPUT variable:
$ make tools/bpf OUTPUT=<dir>
$ cd tools/ && make bpf OUTPUT=<dir>
In order to remember and check what make invocations are supposed to
work, and to document the ones which do not, a new script is added to
the BPF selftests. Note that some invocations require the kernel to be
configured, so the script skips them if no .config file is found.
v2:
- In make_and_clean(), set $ERROR to 1 when "make" returns non-zero,
even if the binary was produced.
- Run "make clean" from the correct directory (bpf/ instead of bpftool/,
when relevant).
Reported-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Bpftool calls the toplevel Makefile to get the kernel version for the
sources it is built from. But when the utility is built from the top of
the kernel repository, it may dump the following error message for
certain architectures (including x86):
$ make tools/bpf
[...]
make[3]: *** [checkbin] Error 1
[...]
This does not prevent bpftool compilation, but may feel disconcerting.
The "checkbin" arch-dependent target is not supposed to be called for
target "kernelversion", which is a simple "echo" of the version number.
It turns out this is caused by the make invocation in tools/bpf/bpftool,
which attempts to find implicit rules to apply. Extract from debug
output:
Reading makefiles...
Reading makefile 'Makefile'...
Reading makefile 'scripts/Kbuild.include' (search path) (no ~ expansion)...
Reading makefile 'scripts/subarch.include' (search path) (no ~ expansion)...
Reading makefile 'arch/x86/Makefile' (search path) (no ~ expansion)...
Reading makefile 'scripts/Makefile.kcov' (search path) (no ~ expansion)...
Reading makefile 'scripts/Makefile.gcc-plugins' (search path) (no ~ expansion)...
Reading makefile 'scripts/Makefile.kasan' (search path) (no ~ expansion)...
Reading makefile 'scripts/Makefile.extrawarn' (search path) (no ~ expansion)...
Reading makefile 'scripts/Makefile.ubsan' (search path) (no ~ expansion)...
Updating makefiles....
Considering target file 'scripts/Makefile.ubsan'.
Looking for an implicit rule for 'scripts/Makefile.ubsan'.
Trying pattern rule with stem 'Makefile.ubsan'.
[...]
Trying pattern rule with stem 'Makefile.ubsan'.
Trying implicit prerequisite 'scripts/Makefile.ubsan.o'.
Looking for a rule with intermediate file 'scripts/Makefile.ubsan.o'.
Avoiding implicit rule recursion.
Trying pattern rule with stem 'Makefile.ubsan'.
Trying rule prerequisite 'prepare'.
Trying rule prerequisite 'FORCE'.
Found an implicit rule for 'scripts/Makefile.ubsan'.
Considering target file 'prepare'.
File 'prepare' does not exist.
Considering target file 'prepare0'.
File 'prepare0' does not exist.
Considering target file 'archprepare'.
File 'archprepare' does not exist.
Considering target file 'archheaders'.
File 'archheaders' does not exist.
Finished prerequisites of target file 'archheaders'.
Must remake target 'archheaders'.
Putting child 0x55976f4f6980 (archheaders) PID 31743 on the chain.
To avoid that, pass the -r and -R flags to eliminate the use of make
built-in rules (and while at it, built-in variables) when running
command "make kernelversion" from bpftool's Makefile.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
This adds support for bpf-to-bpf function calls in the s390 JIT
compiler. The JIT compiler converts the bpf call instructions to
native branch instructions. After a round of the usual passes, the
start addresses of the JITed images for the callee functions are
known. Finally, to fixup the branch target addresses, we need to
perform an extra pass.
Because of the address range in which JITed images are allocated on
s390, the offsets of the start addresses of these images from
__bpf_call_base are as large as 64 bits. So, for a function call,
the imm field of the instruction cannot be used to determine the
callee's address. Use bpf_jit_get_func_addr() helper instead.
The patch borrows a lot from:
commit 8c11ea5ce13d ("bpf, arm64: fix getting subprog addr from aux
for calls")
commit e2c95a61656d ("bpf, ppc64: generalize fetching subprog into
bpf_jit_get_func_addr")
commit 8484ce8306f9 ("bpf: powerpc64: add JIT support for
multi-function programs")
(including the commit message).
test_verifier (5.3-rc6 with CONFIG_BPF_JIT_ALWAYS_ON=y):
without patch:
Summary: 1501 PASSED, 0 SKIPPED, 47 FAILED
with patch:
Summary: 1540 PASSED, 0 SKIPPED, 8 FAILED
Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Vlad Buslov says:
====================
Fixes for unlocked cls hardware offload API refactoring
Two fixes for my "Refactor cls hardware offload API to support
rtnl-independent drivers" series.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
New local variable "struct flow_block_offload *f" was added to
mlx5e_setup_tc() in recent rtnl lock removal patches. The variable is used
in code that is only compiled when CONFIG_MLX5_ESWITCH is enabled. This
results compilation warning about unused variable when CONFIG_MLX5_ESWITCH
is not set. Move the variable definition into eswitch-specific code block
from the beginning of mlx5e_setup_tc() function.
Fixes: c9f14470d048 ("net: sched: add API for registering unlocked offload block callbacks")
Reported-by: tanhuazhong <tanhuazhong@huawei.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Recent rtnl lock removal patch changed flow_action infra to require proper
cleanup besides simple memory deallocation. However, matchall classifier
was not updated to call tc_cleanup_flow_action(). Add proper cleanup to
mall_replace_hw_filter() and mall_reoffload().
Fixes: 5a6ff4b13d59 ("net: sched: take reference to action dev before calling offloads")
Reported-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This explicitly clarifies that bbr_bdp() returns the rounded-up value of
the bandwidth-delay product and why in the comments.
Signed-off-by: Luke Hsiao <lukehsiao@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Implement this callback in order to get the offloaded stats added to the
kernel stats.
Reported-by: Pengfei Liu <pengfeil@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a local endpoint is ceases to be in use, such as when the kafs module
is unloaded, the kernel will emit an assertion failure if there are any
outstanding client connections:
rxrpc: Assertion failed
------------[ cut here ]------------
kernel BUG at net/rxrpc/local_object.c:433!
and even beyond that, will evince other oopses if there are service
connections still present.
Fix this by:
(1) Removing the triggering of connection reaping when an rxrpc socket is
released. These don't actually clean up the connections anyway - and
further, the local endpoint may still be in use through another
socket.
(2) Mark the local endpoint as dead when we start the process of tearing
it down.
(3) When destroying a local endpoint, strip all of its client connections
from the idle list and discard the ref on each that the list was
holding.
(4) When destroying a local endpoint, call the service connection reaper
directly (rather than through a workqueue) to immediately kill off all
outstanding service connections.
(5) Make the service connection reaper reap connections for which the
local endpoint is marked dead.
Only after destroying the connections can we close the socket lest we get
an oops in a workqueue that's looking at a connection or a peer.
Fixes: 3d18cbb7fd0c ("rxrpc: Fix conn expiry timers")
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
David Howells says:
====================
rxrpc: Fix use of skb_cow_data()
Here's a series of patches that replaces the use of skb_cow_data() in rxrpc
with skb_unshare() early on in the input process. The problem that is
being seen is that skb_cow_data() indirectly requires that the maximum
usage count on an sk_buff be 1, and it may generate an assertion failure in
pskb_expand_head() if not.
This can occur because rxrpc_input_data() may be still holding a ref when
it has just attached the sk_buff to the rx ring and given that attachment
its own ref. If recvmsg happens fast enough, skb_cow_data() can see the
ref still held by the softirq handler.
Further, a packet may contain multiple subpackets, each of which gets its
own attachment to the ring and its own ref - also making skb_cow_data() go
bang.
Fix this by:
(1) The DATA packet is currently parsed for subpackets twice by the input
routines. Parse it just once instead and make notes in the sk_buff
private data.
(2) Use the notes from (1) when attaching the packet to the ring multiple
times. Once the packet is attached to the ring, recvmsg can see it
and start modifying it, so the softirq handler is not permitted to
look inside it from that point.
(3) Pass the ref from the input code to the ring rather than getting an
extra ref. rxrpc_input_data() uses a ref on the second refcount to
prevent the packet from evaporating under it.
(4) Call skb_unshare() on secured DATA packets in rxrpc_input_packet()
before we take call->input_lock. Other sorts of packets don't get
modified and so can be left.
A trace is emitted if skb_unshare() eats the skb. Note that
skb_share() for our accounting in this regard as we can't see the
parameters in the packet to log in a trace line if it releases it.
(5) Remove the calls to skb_cow_data(). These are then no longer
necessary.
There are also patches to improve the rxrpc_skb tracepoint to make sure
that Tx-derived buffers are identified separately from Rx-derived buffers
in the trace.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixes: 190f73ab4c43 ("net: stmmac: setup higher frequency clk support for EHL & TGL")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The devicetree binding lists the phy phy as optional. As such, the
driver should not bail out if it can't find a regulator. Instead it
should just skip the remaining regulator related code and continue
on normally.
Skip the remainder of phy_power_on() if a regulator supply isn't
available. This also gets rid of the bogus return code.
Fixes: 2e12f536635f ("net: stmmac: dwmac-rk: Use standard devicetree property for phy regulator")
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In xgbe_mod_init(), we should do cleanup if some error occurs
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: efbaa828330a ("amd-xgbe: Add support to handle device renaming")
Fixes: 47f164deab22 ("amd-xgbe: Add PCI device support")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pointer pkt is being initialized with a value that is never read
and pkt is being re-assigned a little later on. The assignment is
redundant and hence can be removed.
Addresses-Coverity: ("Ununsed value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Michael Chan says:
====================
bnxt_en: health and error recovery.
This patchset implements adapter health and error recovery. The status
is reported through several devlink reporters and the driver will
initiate and complete the recovery process using the devlink infrastructure.
v2: Added 4 patches at the beginning of the patchset to clean up error code
handling related to firmware messages and to convert to use standard
error codes.
Removed the dropping of rtnl_lock in bnxt_close().
Broke up the patches some more for better patch organization and
future bisection.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Health show command example and output:
$ devlink health show pci/0000:af:00.0 reporter fw_fatal
pci/0000:af:00.0:
name fw_fatal
state healthy error 1 recover 1 grace_period 0 auto_recover true
Fatal events from firmware or missing periodic heartbeats will
be reported and recovery will be handled.
We also turn on the support flags when we register with the firmware to
enable this health and recovery feature in the firmware.
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This call will handle fatal firmware errors by forcing a reset on the
firmware. The master function driver will carry out the forced reset.
The sequence will go through the same bnxt_fw_reset_task() workqueue.
This fatal reset differs from the non-fatal reset at the beginning
stages. From the BNXT_FW_RESET_STATE_ENABLE_DEV state onwards where
the firmware is coming out of reset, it is practically identical to the
non-fatal reset.
The next patch will add the periodic heartbeat check and the devlink
reporter to report the fatal event and to initiate the bnxt_fw_exception()
call.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This state handles driver initiated chip reset during error recovery.
Only the master function will perform this step during error recovery.
The next patch will add code to initiate this reset from the master
function.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a flag to mark that the firmware has encountered fatal condition.
The driver will not send any more firmware messages and will return
error to the caller. Fix up some clean up functions to continue
and not abort when the firmware message function returns error.
This is preparation work to fully handle firmware error recovery
under fatal conditions.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Retain the VF MAC address, default VLAN, TX rate control, trust settings
of VFs after firmware reset.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add devlink health reporter for the firmware reset event. Once we get
the notification from firmware about the impending reset, the driver
will report this to devlink and the call to bnxt_fw_reset() will be
initiated to complete the reset sequence.
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add the bnxt_fw_reset() main function to handle firmware reset. This
is triggered by firmware to initiate an orderly reset, for example
when a non-fatal exception condition has been detected. bnxt_fw_reset()
will first wait for all VFs to shutdown and then start the
bnxt_fw_reset_task() work queue to go through the sequence of reset,
re-probe, and re-initialization.
The next patch will add the devlink reporter to start the sequence and
call bnxt_fw_reset().
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This event from firmware signals a coordinated reset initiated by the
firmware. It may be triggered by some error conditions encountered
in the firmware or other orderly reset conditions.
We store the parameters from this event. Subsequent patches will
add logic to handle reset itself using devlink reporters.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Create new FW devlink_health_reporter, to know the current health
status of FW.
Command example and output:
$ devlink health show pci/0000:af:00.0 reporter fw
pci/0000:af:00.0:
name fw
state healthy error 0 recover 0
FW status: Healthy; Reset count: 1
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|