|
This patch remembers which smap triggers the allocation
of a 'struct bpf_local_storage' object. The local_storage is
allocated when the very first selem is added to the owner.
The smap pointer is needed when using bpf_mem_cache_free
in a later patch because it needs to free to the correct
smap's bpf_mem_alloc object.
When a selem is being removed, it needs to check if it is
the selem that triggered the creation of the local_storage.
If it is, the local_storage->smap pointer will be reset to NULL.
This NULL reset is done under the local_storage->lock in
bpf_selem_unlink_storage_nolock() when a selem is being removed.
Also note that the local_storage may not go away even when
local_storage->smap is NULL, because there may be other
selems still stored in the local_storage.
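A minimal sketch of the reset described above (the helper name is
illustrative; SDATA() is the existing selem data accessor):

	/* Hedged sketch, not the verbatim patch.  Runs under
	 * local_storage->lock in bpf_selem_unlink_storage_nolock().
	 */
	static void reset_creator_smap(struct bpf_local_storage *local_storage,
				       struct bpf_local_storage_elem *selem)
	{
		/* Only the selem that triggered the local_storage
		 * allocation resets the pointer; other selems may still
		 * be stored, so the local_storage itself stays alive.
		 */
		if (local_storage->smap == SDATA(selem)->smap)
			local_storage->smap = NULL;
	}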
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230308065936.1550103-6-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch first renames bpf_local_storage_unlink_nolock to
bpf_local_storage_destroy(). The new name better reflects that it is
only used when the storage's owner (sk/task/cgrp/inode) is being kfree()'d.
All of bpf_local_storage_destroy()'s callers take the spin lock and
then free the storage. This patch also moves these two steps into
bpf_local_storage_destroy().
This is preparation work for a later patch that uses
bpf_mem_cache_alloc/free in bpf_local_storage.
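A hedged sketch of the consolidated destroy path (destroy_nolock()
stands in for the renamed internal unlink loop; the RCU-based free is
assumed):

	void bpf_local_storage_destroy(struct bpf_local_storage *local_storage)
	{
		unsigned long flags;
		bool free_storage;

		/* The two steps every caller used to do by hand: */
		raw_spin_lock_irqsave(&local_storage->lock, flags);
		free_storage = destroy_nolock(local_storage);
		raw_spin_unlock_irqrestore(&local_storage->lock, flags);

		if (free_storage)
			kfree_rcu(local_storage, rcu);
	}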
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230308065936.1550103-3-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch makes bpf_local_storage_free_rcu() and
bpf_selem_unlink_map() static because they are
not used outside of bpf_local_storage.c.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230308065936.1550103-2-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The state equivalence check and checkpointing performed in is_state_visited()
employ certain heuristics to try to save memory by avoiding state checkpoints
when not enough jumps and instructions have happened since the last
checkpoint. This makes it unpredictable whether a particular instruction will
be checkpointed, and how regularly. While normally this does not cause many
problems (apart from being inconvenient for predictable verifier tests, which
we overcome with the BPF_F_TEST_STATE_FREQ flag), it turns out that this is
not the case for open-coded iterators.
Checking and saving state checkpoints at each iter_next() call is crucial for
fast convergence of open-coded iterator loop logic, so we need to force it.
If we don't, is_state_visited() might skip saving a checkpoint, causing an
unnecessarily long sequence of instructions and jumps without a checkpoint,
leading to exhaustion of the jump history buffer and potentially other
undesired outcomes. It is expected that with correct open-coded iterators
convergence will happen quickly, so we don't run the risk of exhausting memory.
This patch adds, in addition to the prune and jump instruction marks, a
"forced checkpoint" mark, and makes sure that any iter_next() call
instruction is marked as such.
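A hedged sketch of the marking step (helper names follow the
description; the exact verifier plumbing may differ):

	/* While walking instructions, treat any call to a
	 * bpf_iter_<type>_next() kfunc as a forced checkpoint, in
	 * addition to the usual prune/jump marks.
	 */
	if (insn->code == (BPF_JMP | BPF_CALL) &&
	    insn->src_reg == BPF_PSEUDO_KFUNC_CALL &&
	    is_iter_next_kfunc(insn->imm))
		mark_force_checkpoint(env, insn_idx);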
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230310060149.625887-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The recent fix for deferred I/O in commit
3efc61d95259 ("fbdev: Fix invalid page access after closing deferred I/O devices")
caused a regression when the same fb device is opened/closed while
it's being used. It resulted in a frozen screen even if something
is redrawn there after the close. The breakage is because the patch
was made under the wrong assumption of a single open; in the current
code, fb_deferred_io_release() cleans up the page mapping of the
pageref list and calls cancel_delayed_work_sync() unconditionally,
neither of which is correct behavior for multiple opens.
This patch adds a refcount for the opens of the device, and applies
the cleanup only when all files have been closed.
As both fb_deferred_io_open() and _close() are always called under the
fb_info lock (mutex), it's safe to use a plain int for the
refcounting.
Also, a useless BUG_ON() is dropped.
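A minimal sketch of the refcounted open/close (the open_count field is
per the description; the cleanup helper name is a placeholder):

	/* Count opens under the fb_info mutex; a plain int is fine
	 * because open/close are serialized by that lock.
	 */
	void fb_deferred_io_open(struct fb_info *info, struct inode *inode,
				 struct file *file)
	{
		struct fb_deferred_io *fbdefio = info->fbdefio;

		fbdefio->open_count++;
	}

	void fb_deferred_io_release(struct fb_info *info)
	{
		struct fb_deferred_io *fbdefio = info->fbdefio;

		if (--fbdefio->open_count)
			return;	/* other openers remain */

		/* last close: safe to tear everything down now */
		cancel_delayed_work_sync(&fbdefio->deferred_work);
		fb_deferred_io_cleanup_mappings(info);	/* placeholder */
	}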
Fixes: 3efc61d95259 ("fbdev: Fix invalid page access after closing deferred I/O devices")
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230308105012.1845-1-tiwai@suse.de
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Florian Westphal says:
====================
Netfilter updates for net-next
1. nf_tables 'brouting' support, from Sriram Yagnaraman.
2. Update bridge netfilter and ovs conntrack helpers to handle
IPv6 Jumbo packets properly, i.e. fetch the packet length
from hop-by-hop extension header, from Xin Long.
This comes with a BIG TCP test case, added to
tools/testing/selftests/net/.
3. Fix spelling and indentation in conntrack, from Jeremy Sowden.
* 'main' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nat: fix indentation of function arguments
netfilter: conntrack: fix typo
selftests: add a selftest for big tcp
netfilter: use nf_ip6_check_hbh_len in nf_ct_skb_network_trim
netfilter: move br_nf_check_hbh_len to utils
netfilter: bridge: move pskb_trim_rcsum out of br_nf_check_hbh_len
netfilter: bridge: check len before accessing more nh data
netfilter: bridge: call pskb_may_pull in br_nf_check_hbh_len
netfilter: bridge: introduce broute meta statement
====================
Link: https://lore.kernel.org/r/20230308193033.13965-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
No users in the tree. Tested with allmodconfig build.
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230308142006.20879-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Documentation/bpf/bpf_devel_QA.rst
b7abcd9c656b ("bpf, doc: Link to submitting-patches.rst for general patch submission info")
d56b0c461d19 ("bpf, docs: Fix link to netdev-FAQ target")
https://lore.kernel.org/all/20230307095812.236eb1be@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
After commit c28cd1f3433c ("clk: Mark a fwnode as initialized when using
CLK_OF_DECLARE() macro"), drivers/clk/mvebu/kirkwood.c fails to build:
drivers/clk/mvebu/kirkwood.c:358:1: error: expected identifier or '('
CLK_OF_DECLARE(98dx1135_clk, "marvell,mv98dx1135-core-clock",
^
include/linux/clk-provider.h:1367:21: note: expanded from macro 'CLK_OF_DECLARE'
static void __init name##_of_clk_init_declare(struct device_node *np) \
^
<scratch space>:124:1: note: expanded from here
98dx1135_clk_of_clk_init_declare
^
drivers/clk/mvebu/kirkwood.c:358:1: error: invalid digit 'd' in decimal constant
include/linux/clk-provider.h:1372:34: note: expanded from macro 'CLK_OF_DECLARE'
OF_DECLARE_1(clk, name, compat, name##_of_clk_init_declare)
^
<scratch space>:125:3: note: expanded from here
98dx1135_clk_of_clk_init_declare
^
C function names must start with either an alphabetic letter or an
underscore. To avoid generating invalid function names from clock names,
add two underscores to the beginning of the identifier.
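The fix, sketched (prefixing the generated identifier with two
underscores so clock names such as 98dx1135_clk stay legal C; the macro
body is abridged from the quoted expansion and may differ in detail):

	#define CLK_OF_DECLARE(name, compat, fn)			\
		static void __init __##name##_of_clk_init_declare(	\
						struct device_node *np)	\
		{							\
			fn(np);						\
			fwnode_dev_initialized(of_fwnode_handle(np), true); \
		}							\
		OF_DECLARE_1(clk, name, compat, __##name##_of_clk_init_declare)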
Fixes: c28cd1f3433c ("clk: Mark a fwnode as initialized when using CLK_OF_DECLARE() macro")
Suggested-by: Saravana Kannan <saravanak@google.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20230308-clk_of_declare-fix-v1-1-317b741e2532@kernel.org
Reviewed-by: Saravana Kannan <saravanak@google.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
Commit b8a1a4cd5a98 ("i2c: Provide a temporary .probe_new() call-back
type") introduced a new probe callback to convert i2c init routines to
not take an i2c_device_id parameter. Now that all in-tree drivers are
converted to the temporary .probe_new() callback, .probe() can be
modified to match the desired prototype.
Now that .probe() and .probe_new() have the same semantics, they can be
defined as members of an anonymous union to save some memory and
simplify the core code a bit.
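A sketch of the anonymous-union arrangement (struct trimmed to the
relevant members):

	struct i2c_driver {
		/* ... */
		union {
			int (*probe)(struct i2c_client *client);
			int (*probe_new)(struct i2c_client *client);
		};
		/* ... */
	};

Either member initializes the same slot, so existing .probe_new users
keep working while new code can use .probe directly.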
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Benjamin Tissoires:
- fix potential out of bound write of zeroes in HID core with a
specially crafted uhid device (Lee Jones)
- fix potential use-after-free in work function in intel-ish-hid (Reka
Norman)
- selftests config fixes (Benjamin Tissoires)
- a few small device fixes and new device support
* tag 'for-linus-2023030901' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: intel-ish-hid: ipc: Fix potential use-after-free in work function
HID: logitech-hidpp: Add support for Logitech MX Master 3S mouse
HID: cp2112: Fix driver not registering GPIO IRQ chip as threaded
selftest: hid: fix hid_bpf not set in config
HID: uhid: Over-ride the default maximum data buffer value with our own
HID: core: Provide new max_buffer_size attribute to over-ride the default
|
|
Implement the first open-coded iterator type over a range of integers.
Its public API consists of:
- bpf_iter_num_new(), the constructor, which accepts a [start, end) range
(that is, start is inclusive, end is exclusive).
- bpf_iter_num_next(), which keeps returning a read-only pointer to int
until the range is exhausted, at which point NULL is returned.
If bpf_iter_num_next() keeps being called after this, NULL keeps being
returned.
- bpf_iter_num_destroy(), the destructor, which needs to be called at some
point to clean up iterator state. The BPF verifier enforces that the
iterator destructor is called at some point before the BPF program exits.
Note that `start = end = X` is a valid combination to set up an empty
iterator. bpf_iter_num_new() will return 0 (success) for any such
combination.
If bpf_iter_num_new() detects an invalid combination of input arguments, it
returns an error and resets the iterator state to, effectively, an empty
iterator, so any subsequent call to bpf_iter_num_next() will keep
returning NULL.
The BPF verifier has no knowledge that the returned integers are in the
[start, end) value range, as both `start` and `end` are not statically
known or enforced: they are runtime values.
While the implementation is pretty trivial, some care needs to be taken
to avoid overflows and underflows. Subsequent selftests will validate
correctness of [start, end) semantics, especially around extremes
(INT_MIN and INT_MAX).
Similarly to bpf_loop(), we enforce that no more than BPF_MAX_LOOPS can
be specified.
bpf_iter_num_{new,next,destroy}() is a logical evolution from bounded
BPF loops and bpf_loop() helper and is the basis for implementing
ergonomic BPF loops with no statically known or verified bounds.
Subsequent patches implement bpf_for() macro, demonstrating how this can
be wrapped into something that works and feels like a normal for() loop
in C language.
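A hedged usage sketch on the BPF side (kfunc declarations assumed to be
in scope):

	struct bpf_iter_num it;
	int *v, sum = 0;

	/* sum the integers in [3, 7); an empty range (start == end)
	 * is also valid: new() returns 0 and the first next() is NULL
	 */
	bpf_iter_num_new(&it, 3, 7);
	while ((v = bpf_iter_num_next(&it)))
		sum += *v;		/* 3 + 4 + 5 + 6 */
	bpf_iter_num_destroy(&it);	/* verifier-enforced cleanup */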
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230308184121.1165081-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Teach the verifier about the concept of open-coded (or inline) iterators.
This patch adds generic iterator loop verification logic, a new STACK_ITER
stack slot type to contain iterator state, and the necessary kfunc plumbing
for an iterator's constructor, destructor and next methods. The next patch
implements the first specific iterator (a numbers iterator for implementing
for() loop logic). Such a split allows having more focused commits for
verifier logic and a separate commit that we could point to later to
demonstrate what it takes to add a new kind of iterator.
Each kind of iterator has its own associated struct bpf_iter_<type>,
where <type> denotes a specific type of iterator. struct bpf_iter_<type>
state is supposed to live on BPF program stack, so there will be no way
to change its size later on without breaking backwards compatibility, so
choose wisely! But given this struct is specific to a given <type> of
iterator, this allows a lot of flexibility: simple iterators could be
fine with just one stack slot (8 bytes), like numbers iterator in the
next patch, while some other more complicated iterators might need way
more to keep their iterator state. Either way, such a design allows us to
avoid runtime memory allocations, which otherwise would be necessary if we
fixed the on-stack size and it turned out to be too small for a given
iterator implementation.
The way the BPF verifier logic is implemented, there are no artificial
restrictions on the number of active iterators; it should work correctly
with multiple active iterators at the same time. This also means you
can have multiple nested iteration loops. struct bpf_iter_<type>
reference can be safely passed to subprograms as well.
General flow is easiest to demonstrate with a simple example using
number iterator implemented in next patch. Here's the simplest possible
loop:
	struct bpf_iter_num it;
	int *v;

	bpf_iter_num_new(&it, 2, 5);
	while ((v = bpf_iter_num_next(&it))) {
		bpf_printk("X = %d", *v);
	}
	bpf_iter_num_destroy(&it);
Above snippet should output "X = 2", "X = 3", "X = 4". Note that 5 is
exclusive and is not returned. This matches similar APIs (e.g., slices
in Go or Rust) that implement a range of elements, where end index is
non-inclusive.
In the above example, we see a trio of functions:
- constructor, bpf_iter_num_new(), which initializes iterator state
(struct bpf_iter_num it) on the stack. If any of the input arguments
are invalid, constructor should make sure to still initialize it such
that subsequent bpf_iter_num_next() calls will return NULL. I.e., on
error, return error and construct empty iterator.
- next method, bpf_iter_num_next(), which accepts a pointer to iterator
state and produces an element. The next method should always return
a pointer. The contract with the BPF verifier is that the next method will
always eventually return NULL when elements are exhausted. Once NULL is
returned, subsequent next calls should keep returning NULL. In the
case of the numbers iterator, bpf_iter_num_next() returns a pointer to an int
(storage for this integer is inside the iterator state itself),
which can be dereferenced after the corresponding NULL check.
- once done with the iterator, it's mandated that user cleans up its
state with the call to destructor, bpf_iter_num_destroy() in this
case. Destructor frees up any resources and marks stack space used by
struct bpf_iter_num as usable for something else.
Any other iterator implementation will have to implement at least these
three methods. It is enforced that for any given type of iterator only
applicable constructor/destructor/next are callable. I.e., verifier
ensures you can't pass number iterator state into, say, cgroup
iterator's next method.
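As a hedged sketch of what such a trio might look like for a
hypothetical iterator kind (names, state layout, and stepping logic are
illustrative only):

	/* A made-up "evens" iterator following the
	 * bpf_iter_<type>_{new,next,destroy}() contract.  State lives
	 * on the BPF stack and must be a multiple of 8 bytes.
	 */
	struct bpf_iter_evens {
		int cur;
		int end;
		int ret;	/* storage for the element handed out */
		int _pad;	/* keep sizeof() a multiple of 8 */
	};

	__bpf_kfunc int bpf_iter_evens_new(struct bpf_iter_evens *it,
					   int start, int end)
	{
		it->cur = start + (start & 1);	/* first even >= start */
		it->end = end;
		return 0;
	}

	__bpf_kfunc int *bpf_iter_evens_next(struct bpf_iter_evens *it)
	{
		if (it->cur >= it->end)
			return NULL;	/* exhausted: NULL from now on */
		it->ret = it->cur;
		it->cur += 2;
		return &it->ret;
	}

	__bpf_kfunc void bpf_iter_evens_destroy(struct bpf_iter_evens *it)
	{
		/* nothing to release for this trivial state */
	}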
It is important to keep the naming pattern consistent to be able to
create generic macros to help with BPF iter usability. E.g., one
of the follow up patches adds generic bpf_for_each() macro to bpf_misc.h
in selftests, which allows utilizing the iterator "trio" nicely without
having to code the above somewhat tedious loop explicitly every time.
This is enforced at kfunc registration point by one of the previous
patches in this series.
At the implementation level, iterator state tracking for verification
purposes is very similar to dynptr. We add STACK_ITER stack slot type,
reserve necessary number of slots, depending on
sizeof(struct bpf_iter_<type>), and keep track of necessary extra state
in the "main" slot, which is marked with non-zero ref_obj_id. Other
slots are also marked as STACK_ITER, but have zero ref_obj_id. This is
simpler than having a separate "is_first_slot" flag.
Another big distinction is that STACK_ITER is *always refcounted*, which
simplifies implementation without sacrificing usability. So no need for
extra "iter_id", no need to anticipate reuse of STACK_ITER slots for new
constructors, etc. Keeping it simple here.
As far as the verification logic goes, there are two extensive comments:
in process_iter_next_call() and iter_active_depths_differ() explaining
some important and sometimes subtle aspects. Please refer to them for
details.
But from a 10,000-foot point of view, next methods are the points of
forking a verification state, which are conceptually similar to what
verifier is doing when validating conditional jump. We branch out at
a `call bpf_iter_<type>_next` instruction and simulate two outcomes:
NULL (iteration is done) and non-NULL (new element is returned). NULL is
simulated first and is supposed to reach exit without looping. After
that non-NULL case is validated and it either reaches exit (for trivial
examples with no real loop), or reaches another `call bpf_iter_<type>_next`
instruction with the state equivalent to already (partially) validated
one. State equivalency at that point means we technically are going to
be looping forever without "breaking out" of the established "state
envelope" (i.e., subsequent iterations don't add any new knowledge or
constraints to the verifier state, so running 1, 2, 10, or a million
iterations doesn't matter). But taking into account the contract stating that
iterator next method *has to* return NULL eventually, we can conclude
that loop body is safe and will eventually terminate. Given we validated
logic outside of the loop (NULL case), and concluded that loop body is
safe (though potentially looping many times), verifier can claim safety
of the overall program logic.
The rest of the patch is necessary plumbing for state tracking, marking,
validation, and necessary further kfunc plumbing to allow implementing
iterator constructor, destructor, and next methods.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230308184121.1165081-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add ability to register kfuncs that implement BPF open-coded iterator
contract and enforce naming and function proto convention. Enforcement
happens at the time of kfunc registration and significantly simplifies
the rest of iterators logic in the verifier.
More details follow in subsequent patches, but we enforce the following
conditions.
All kfuncs (constructor, next, destructor) have to be named consistently
as bpf_iter_<type>_{new,next,destroy}(), respectively. <type> represents
the iterator type, and the iterator state should be represented as a matching
`struct bpf_iter_<type>` state type. Also, all iter kfuncs should have
a pointer to this `struct bpf_iter_<type>` as the very first argument.
Additionally:
- Constructor, i.e., bpf_iter_<type>_new(), can have an arbitrary number
of extra arguments. The return type is not enforced either.
- Next method, i.e., bpf_iter_<type>_next(), has to return a pointer
type and should have exactly one argument: `struct bpf_iter_<type> *`
(const/volatile/restrict and typedefs are ignored).
- Destructor, i.e., bpf_iter_<type>_destroy(), should return void and
should have exactly one argument, similar to the next method.
- struct bpf_iter_<type> size is enforced to be positive and
a multiple of 8 bytes (to fit stack slots correctly).
Such strictness and consistency allows building generic helpers
abstracting important, but boilerplate, details to be able to use
open-coded iterators effectively and ergonomically (see bpf_for_each()
in subsequent patches). It also simplifies the verifier logic in some
places. At the same time, this doesn't hurt the generality of possible
iterator implementations. Win-win.
The constructor kfunc is marked with the new KF_ITER_NEW flag, the next
method is marked with KF_ITER_NEXT (and should also have KF_RET_NULL, of
course), while the destructor kfunc is marked as KF_ITER_DESTROY.
Additionally, we add a trivial kfunc name validation: it should be
a valid non-NULL and non-empty string.
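A hedged sketch of the registration for the numbers iterator from the
next patch (BTF_ID_FLAGS usage as in existing kfunc ID lists):

	BTF_ID_FLAGS(func, bpf_iter_num_new, KF_ITER_NEW)
	BTF_ID_FLAGS(func, bpf_iter_num_next, KF_ITER_NEXT | KF_RET_NULL)
	BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)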
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230308184121.1165081-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Andrii Nakryiko says:
====================
pull-request: bpf-next 2023-03-08
We've added 23 non-merge commits during the last 2 day(s) which contain
a total of 28 files changed, 414 insertions(+), 104 deletions(-).
The main changes are:
1) Add more precise memory usage reporting for all BPF map types,
from Yafang Shao.
2) Add ARM32 USDT support to libbpf, from Puranjay Mohan.
3) Fix BTF_ID_LIST size causing problems in !CONFIG_DEBUG_INFO_BTF,
from Nathan Chancellor.
4) IMA selftests fix, from Roberto Sassu.
5) libbpf fix in APK support code, from Daniel Müller.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (23 commits)
selftests/bpf: Fix IMA test
libbpf: USDT arm arg parsing support
libbpf: Refactor parse_usdt_arg() to re-use code
libbpf: Fix theoretical u32 underflow in find_cd() function
bpf: enforce all maps having memory usage callback
bpf: offload map memory usage
bpf, net: xskmap memory usage
bpf, net: sock_map memory usage
bpf, net: bpf_local_storage memory usage
bpf: local_storage memory usage
bpf: bpf_struct_ops memory usage
bpf: queue_stack_maps memory usage
bpf: devmap memory usage
bpf: cpumap memory usage
bpf: bloom_filter memory usage
bpf: ringbuf memory usage
bpf: reuseport_array memory usage
bpf: stackmap memory usage
bpf: arraymap memory usage
bpf: hashtab memory usage
...
====================
Link: https://lore.kernel.org/r/20230308193533.1671597-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Rename br_nf_check_hbh_len() to nf_ip6_check_hbh_len() and move it
to netfilter utils, so that it can be used by other modules, like
ovs and tc.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
Commit 0091bfc81741 ("io_uring/af_unix: defer registered
files gc to io_uring release") added one bit to struct sk_buff.
This structure is critical for networking, and we try very hard
not to add bloat to it unless absolutely required.
For instance, we can use a specific destructor as a wrapper
around unix_destruct_scm(), to identify skbs that unix_gc()
has to special case.
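A hedged sketch of that wrapper approach (the function name is
illustrative; the destructor pointer itself identifies the skbs):

	static void io_uring_destruct_scm(struct sk_buff *skb)
	{
		unix_destruct_scm(skb);
	}

	/* unix_gc() can then special-case these skbs with: */
	static inline bool skb_from_io_uring(const struct sk_buff *skb)
	{
		return skb->destructor == io_uring_destruct_scm;
	}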
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
enum skb_drop_reason is more generic; we can adopt it instead.
Provide dev_kfree_skb_irq_reason() and dev_kfree_skb_any_reason().
This means drivers can use more precise drop reasons if they want to.
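A hedged usage sketch for a driver (SKB_DROP_REASON_NOT_SPECIFIED is the
existing catch-all; presumably the reason-less wrappers keep passing it):

	/* Free an skb from any context, recording why it was dropped;
	 * drivers with a more precise reason can pass that instead.
	 */
	dev_kfree_skb_any_reason(skb, SKB_DROP_REASON_NOT_SPECIFIED);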
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
Link: https://lore.kernel.org/r/20230306204313.10492-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
cond is sometimes (val & MASK), which may result in a false positive
if val is a negative errno. We shouldn't evaluate cond if val < 0.
This has no functional impact here, but it's not nice.
Therefore, switch the order of the checks.
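A hedged sketch of the reordered polling loop (read_op/cond/args stand
in for the macro parameters of the phy poll-timeout helpers):

	for (;;) {
		val = read_op(args);
		if (val < 0)
			break;	/* an errno must never be tested as cond */
		if (cond)
			break;
		/* ... timeout and sleep handling elided ... */
	}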
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/6d8274ac-4344-23b4-d9a3-cad4c39517d4@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The current interconnect provider interface is inherently racy as
providers are expected to be added before being fully initialised.
Specifically, nodes are currently not added and the provider data is not
initialised until after registering the provider, which can cause racing
DT lookups to fail.
Add a new provider API which will be used to fix up the interconnect
drivers.
The old API is reimplemented using the new interface and will be removed
once all drivers have been fixed.
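A hedged sketch of the intended new flow (function names as introduced
by this series; exact signatures may differ):

	/* Initialize the provider and add all nodes *before*
	 * registering, so DT lookups never see a half-initialized
	 * provider.
	 */
	icc_provider_init(provider);

	/* ... create and add all nodes here ... */

	ret = icc_provider_register(provider);
	if (ret)
		goto err_free_nodes;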
Fixes: 11f1ceca7031 ("interconnect: Add generic on-chip interconnect API")
Fixes: 87e3031b6fbd ("interconnect: Allow endpoints translation via DT")
Cc: stable@vger.kernel.org # 5.1
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Tested-by: Luca Ceresoli <luca.ceresoli@bootlin.com> # i.MX8MP MSC SM2-MB-EP1 Board
Link: https://lore.kernel.org/r/20230306075651.2449-4-johan+linaro@kernel.org
Signed-off-by: Georgi Djakov <djakov@kernel.org>
|
|
Commit 596ff4a09b89 ("cpumask: re-introduce constant-sized cpumask
optimizations") changed cpumask_setall() to use "bitmap_set()" instead
of "bitmap_fill()", because bitmap_fill() would explicitly set all the
bits of a constant sized small bitmap, and that's exactly what we don't
want: we want to only set bits up to 'nr_cpu_ids', which is what
"bitmap_set()" does.
However, Yury correctly points out that while "bitmap_set()" does indeed
only set bits up to the required bitmap size, it doesn't _clear_ bits
above that size, so the upper bits would still not have well-defined
values.
Now, none of this should really matter, since any bits set past
'nr_cpu_ids' should always be ignored in the first place. Yes, the bit
scanning functions might return them as a result, but since users should
always consider the ">= nr_cpu_ids" condition to mean "no more bits",
that shouldn't have any actual effect (see previous commit 8ca09d5fa354
"cpumask: fix incorrect cpumask scanning result checks").
But let's just do it right, the way the code was _intended_ to work. We
have had enough lazy code that works but bites us in the *rse later
(again, see previous commit) that there's no reason to not just do this
properly.
It turns out that "bitmap_fill()" gets this all right for the complex
case, and really only fails for the inlined optimized case that just
fills the whole word. And while we could just fix bitmap_fill() to use
the proper last word mask, there are two issues with that:
- the cpumask case wants to do the _optimization_ based on "NR_CPUS is
a small constant", but then wants to do the actual bit _fill_ based
on "nr_cpu_ids" that isn't necessarily that same constant
- we have lots of non-cpumask users of bitmap_fill(), and while they
hopefully don't care, and probably would want the proper semantics
anyway ("only set bits up to the limit"), I do not want the cpumask
changes to impact other parts
So this ends up just doing the single-word optimization by hand in the
cpumask code. If our cpumask is fundamentally limited to a single word,
just do the proper "fill in that word" exactly. And if it's the more
complex multi-word case, then the generic bitmap_fill() will DTRT.
This is all an example of how our bitmap function optimizations really
are somewhat broken. They conflate the "this is size of the bitmap"
optimizations with the actual bit(s) we want to set.
In many cases we really want to have the two be separate things:
sometimes we base our optimizations on the size of the whole bitmap ("I
know this whole bitmap fits in a single word, so I'll just use
single-word accesses"), and sometimes we base them on the bit we are
looking at ("this is just acting on bits that are in the first word, so
I'll use single-word accesses").
Notice how the end result of the two optimizations are the same, but the
way we get to them are quite different.
And all our cpumask optimization games are really about that fundamental
distinction, and we'd often really want to pass in both the "this is the
bit I'm working on" (which _can_ be a small constant but might be
variable), and "I know it's in this range even if it's variable" (based
on CONFIG_NR_CPUS).
So this cpumask_setall() implementation just makes that explicit. It
checks the "I statically know the size is small" using the known static
size of the cpumask (which is what that 'small_cpumask_bits' is all
about), but then sets the actual bits using the exact number of cpus we
have (ie 'nr_cpumask_bits').
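A hedged sketch of that single-word special case (modeled directly on
the description; the real helper lives in include/linux/cpumask.h):

	static inline void cpumask_setall(struct cpumask *dstp)
	{
		if (small_const_nbits(small_cpumask_bits)) {
			/* statically a single word: fill exactly the
			 * bits below nr_cpumask_bits */
			cpumask_bits(dstp)[0] = BITMAP_LAST_WORD_MASK(nr_cpumask_bits);
			return;
		}
		/* multi-word case: the generic bitmap_fill() DTRT */
		bitmap_fill(cpumask_bits(dstp), nr_cpumask_bits);
	}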
Of course, in a perfect world, the compiler would have done all the
range analysis (possibly with help from us just telling it that
"this value is always in this range"), and would do all of this for us.
But that is not the world we live in.
While we dream of that perfect world, this does that manual logic to
make it all work out. And this was a very long explanation for a small
code change that shouldn't even matter.
Reported-by: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/lkml/ZAV9nGG9e1%2FrV+L%2F@yury-laptop/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
A new helper is introduced to calculate offload map memory usage. But
currently the memory dynamically allocated in netdev dev_ops, like
nsim_map_update_elem, is not counted. Let's put it aside for now.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-18-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
A new helper is introduced into the bpf_local_storage map to calculate its
memory usage. This helper is also used by other maps like
bpf_cgrp_storage, bpf_inode_storage, bpf_task_storage, etc.
Note that currently the dynamically allocated storage elements are not
counted in the usage, since that would add extra runtime overhead in the
element update and delete paths. So let's put it aside for now, and
implement it in the future when someone really needs it.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-15-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add a new map op, ->map_mem_usage, to print the memory usage of a
bpf map.
This is preparation for the followup change.
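A hedged sketch of the new callback (signature as used by the per-map
implementations later in the series):

	struct bpf_map_ops {
		/* ... existing ops ... */
		u64 (*map_mem_usage)(const struct bpf_map *map);
	};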
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-2-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
After commit 66e3a13e7c2c ("bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwr"),
clang builds without CONFIG_DEBUG_INFO_BTF warn:
kernel/bpf/verifier.c:10298:24: warning: array index 16 is past the end of the array (that has type 'u32[16]' (aka 'unsigned int[16]')) [-Warray-bounds]
meta.func_id == special_kfunc_list[KF_bpf_dynptr_slice_rdwr]) {
^ ~~~~~~~~~~~~~~~~~~~~~~~~
kernel/bpf/verifier.c:9150:1: note: array 'special_kfunc_list' declared here
BTF_ID_LIST(special_kfunc_list)
^
include/linux/btf_ids.h:207:27: note: expanded from macro 'BTF_ID_LIST'
#define BTF_ID_LIST(name) static u32 __maybe_unused name[16];
^
1 warning generated.
A warning of this nature was previously addressed by
commit beb3d47d1d3d ("bpf: Fix a BTF_ID_LIST bug with CONFIG_DEBUG_INFO_BTF not set")
but there have been new kfuncs added since then.
Quadruple the size of the CONFIG_DEBUG_INFO_BTF=n definition so that
this problem is unlikely to show up for some time.
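The quadrupled stub, sketched (16 entries previously, per the quoted
macro):

	/* CONFIG_DEBUG_INFO_BTF=n: room for 64 IDs instead of 16 */
	#define BTF_ID_LIST(name) static u32 __maybe_unused name[64];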
Link: https://github.com/ClangBuiltLinux/linux/issues/1810
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20230307-bpf-kfuncs-warray-bounds-v1-1-00ad3191f3a6@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
kunmap_local() + put_page(), as done by e.g. ext2 directory handling.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-03-06
We've added 85 non-merge commits during the last 13 day(s) which contain
a total of 131 files changed, 7102 insertions(+), 1792 deletions(-).
The main changes are:
1) Add skb and XDP typed dynptrs which allow BPF programs for more
ergonomic and less brittle iteration through data and variable-sized
accesses, from Joanne Koong.
2) Bigger batch of BPF verifier improvements to prepare for upcoming BPF
open-coded iterators allowing for less restrictive looping capabilities,
from Andrii Nakryiko.
3) Rework RCU enforcement in the verifier, add kptr_rcu and enforce BPF
programs to NULL-check before passing such pointers into kfunc,
from Alexei Starovoitov.
4) Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and in
local storage maps, from Kumar Kartikeya Dwivedi.
5) Add BPF verifier support for ST instructions in convert_ctx_access()
which will help new -mcpu=v4 clang flag to start emitting them,
from Eduard Zingerman.
6) Make uprobe attachment Android APK aware by supporting attachment
to functions inside ELF objects contained in APKs via function names,
from Daniel Müller.
7) Add a new flag BPF_F_TIMER_ABS flag for bpf_timer_start() helper
to start the timer with absolute expiration value instead of relative
one, from Tero Kristo.
8) Add a new kfunc bpf_cgroup_from_id() to look up cgroups via id,
from Tejun Heo.
9) Extend libbpf to support users manually attaching kprobes/uprobes
in the legacy/perf/link mode, from Menglong Dong.
10) Implement workarounds in the mips BPF JIT for DADDI/R4000,
from Jiaxun Yang.
11) Enable mixing bpf2bpf and tailcalls for the loongarch BPF JIT,
from Hengqi Chen.
12) Extend BPF instruction set doc with describing the encoding of BPF
instructions in terms of how bytes are stored under big/little endian,
from Jose E. Marchesi.
13) Follow-up to enable kfunc support for riscv BPF JIT, from Pu Lehui.
14) Fix bpf_xdp_query() backwards compatibility on old kernels,
from Yonghong Song.
15) Fix BPF selftest cross compilation with CLANG_CROSS_FLAGS,
from Florent Revest.
16) Improve bpf_cpumask_ma to only allocate one bpf_mem_cache,
from Hou Tao.
17) Fix BPF verifier's check_subprogs to not unnecessarily mark
a subprogram with has_tail_call, from Ilya Leoshkevich.
18) Fix arm syscall regs spec in libbpf's bpf_tracing.h, from Puranjay Mohan.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits)
selftests/bpf: Add test for legacy/perf kprobe/uprobe attach mode
selftests/bpf: Split test_attach_probe into multi subtests
libbpf: Add support to set kprobe/uprobe attach mode
tools/resolve_btfids: Add /libsubcmd to .gitignore
bpf: add support for fixed-size memory pointer returns for kfuncs
bpf: generalize dynptr_get_spi to be usable for iters
bpf: mark PTR_TO_MEM as non-null register type
bpf: move kfunc_call_arg_meta higher in the file
bpf: ensure that r0 is marked scratched after any function call
bpf: fix visit_insn()'s detection of BPF_FUNC_timer_set_callback helper
bpf: clean up visit_insn()'s instruction processing
selftests/bpf: adjust log_fixup's buffer size for proper truncation
bpf: honor env->test_state_freq flag in is_state_visited()
selftests/bpf: enhance align selftest's expected log matching
bpf: improve regsafe() checks for PTR_TO_{MEM,BUF,TP_BUFFER}
bpf: improve stack slot state printing
selftests/bpf: Disassembler tests for verifier.c:convert_ctx_access()
selftests/bpf: test if pointer type is tracked for BPF_ST_MEM
bpf: allow ctx writes using BPF_ST_MEM instruction
bpf: Use separate RCU callbacks for freeing selem
...
====================
Link: https://lore.kernel.org/r/20230307004346.27578-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We already mark fwnodes as initialized when they are registered as clock
providers. We do this so that fw_devlink can tell when a clock driver
doesn't use the driver core framework to probe/initialize its device.
This ensures fw_devlink doesn't block the consumers of such a clock
provider indefinitely.
However, some users of CLK_OF_DECLARE() macros don't use the same node
that matches the macro as the node for the clock provider, but they
initialize the entire node. To cover these cases, also mark the nodes
that match the macros as initialized when the init callback function is
called.
An example of this is "stericsson,u8500-clks" that's handled using
CLK_OF_DECLARE() and looks something like this:
	clocks {
		compatible = "stericsson,u8500-clks";

		prcmu_clk: prcmu-clock {
			#clock-cells = <1>;
		};

		prcc_pclk: prcc-periph-clock {
			#clock-cells = <2>;
		};

		prcc_kclk: prcc-kernel-clock {
			#clock-cells = <2>;
		};

		prcc_reset: prcc-reset-controller {
			#reset-cells = <2>;
		};

		...
	};
This patch makes sure that "clocks" is marked as initialized so that
fw_devlink knows that all nodes under it have been initialized. If the
driver creates struct devices for some of the subnodes, fw_devlink is
smart enough to know to wait for those devices to probe, so no special
handling is required for those cases.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reported-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/lkml/CACRpkdamxDX6EBVjKX5=D3rkHp17f5pwGdBVhzFU90-0MHY6dQ@mail.gmail.com/
Fixes: 4a032827daa8 ("of: property: Simplify of_link_to_phandle()")
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20230302014639.297514-1-saravanak@google.com
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
The never-used nr_cpumask_size is just a typo; use the existing
definition, which is called nr_cpumask_bits, instead.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit aa47a7c215e7 ("lib/cpumask: deprecate nr_cpumask_bits") resulted
in the cpumask operations potentially becoming hugely less efficient,
because suddenly the cpumask was always considered to be variable-sized.
The optimization was then later added back in a limited form by commit
6f9c07be9d02 ("lib/cpumask: add FORCE_NR_CPUS config option"), but that
FORCE_NR_CPUS option is not useful in a generic kernel and more of a
special case for embedded situations with fixed hardware.
Instead, just re-introduce the optimization, with some changes.
Instead of depending on CPUMASK_OFFSTACK being false, and then always
using the full constant cpumask width, this introduces three different
cpumask "sizes":
- the exact size (nr_cpumask_bits) remains identical to nr_cpu_ids.
This is used for situations where we should use the exact size.
- the "small" size (small_cpumask_bits) is the NR_CPUS constant if it
fits in a single word and the bitmap operations thus end up able
to trigger the "small_const_nbits()" optimizations.
This is used for the operations that have optimized single-word
cases that get inlined, notably the bit find and scanning functions.
- the "large" size (large_cpumask_bits) is the NR_CPUS constant if it
is an sufficiently small constant that makes simple "copy" and
"clear" operations more efficient.
This is arbitrarily set at four words or less.
As an example of this situation, without this fixed-size optimization,
cpumask_clear() will generate code like

	movl	nr_cpu_ids(%rip), %edx
	addq	$63, %rdx
	shrq	$3, %rdx
	andl	$-8, %edx
	callq	memset@PLT

on x86-64, because it would calculate the "exact" number of longwords
that need to be cleared.
In contrast, with this patch, using an NR_CPUS of 64 (which is quite a
reasonable value to use), the above becomes a single

	movq	$0,cpumask

instruction instead, because instead of caring to figure out exactly how
many CPUs the system has, it just knows that the cpumask will be a
single word and can just clear it all.
Note that this does end up tightening the rules a bit from the original
version in another way: operations that set bits in the cpumask are now
limited to the actual nr_cpu_ids limit, whereas we used to do the
nr_cpumask_bits thing almost everywhere in the cpumask code.
But if you just clear bits, or scan for bits, we can use the simpler
compile-time constants.
In the process, remove 'cpumask_complement()' and 'for_each_cpu_not()'
which were not useful, and which fundamentally have to be limited to
'nr_cpu_ids'. Better remove them now than have somebody introduce use
of them later.
Of course, on x86-64 with MAXSMP there is no sane small compile-time
constant for the cpumask sizes, and we end up using the actual CPU bits,
and will generate the above kind of horrors regardless. Please don't
use MAXSMP unless you really expect to have machines with thousands of
cores.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"A set of updates for the interrupt susbsystem:
- Prevent possible NULL pointer derefences in
irq_data_get_affinity_mask() and irq_domain_create_hierarchy()
- Take the per device MSI lock before invoking code which relies on
it being hold
- Make sure that MSI descriptors are unreferenced before freeing
them. This was overlooked when the platform MSI code was converted
to use core infrastructure and results in a fals positive warning
- Remove dead code in the MSI subsystem
- Clarify the documentation for pci_msix_free_irq()
- More kobj_type constification"
* tag 'irq-urgent-2023-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/msi, platform-msi: Ensure that MSI descriptors are unreferenced
genirq/msi: Drop dead domain name assignment
irqdomain: Add missing NULL pointer check in irq_domain_create_hierarchy()
genirq/irqdesc: Make kobj_type structures constant
PCI/MSI: Clarify usage of pci_msix_free_irq()
genirq/msi: Take the per-device MSI lock before validating the control structure
genirq/ipi: Fix NULL pointer deref in irq_data_get_affinity_mask()
|
|
include/linux/compiler-intel.h had no update in the past 3 years.
We often forget about the third C compiler to build the kernel.
For example, commit a0a12c3ed057 ("asm goto: eradicate CC_HAS_ASM_GOTO")
only mentioned GCC and Clang.
init/Kconfig defines CC_IS_GCC and CC_IS_CLANG but not CC_IS_ICC,
and nobody has reported any issue.
I guess the Intel Compiler support is broken, and nobody cares
about it.
Harald Arnesen pointed out that ICC (the classic Intel C/C++ compiler) is
deprecated:

	$ icc -v
	icc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is
	deprecated and will be removed from product release in the second half
	of 2023. The Intel(R) oneAPI DPC++/C++ Compiler (ICX) is the recommended
	compiler moving forward. Please transition to use this compiler. Use
	'-diag-disable=10441' to disable this message.
	icc version 2021.7.0 (gcc version 12.1.0 compatibility)
Arnd Bergmann provided a link to the article, "Intel C/C++ compilers
complete adoption of LLVM".
lib/zstd/common/compiler.h and lib/zstd/compress/zstd_fast.c were kept
untouched for better sync with https://github.com/facebook/zstd
Link: https://www.intel.com/content/www/us/en/developer/articles/technical/adoption-of-llvm-complete-icx.html
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"17 hotfixes.
Eight are for MM and seven are for other parts of the kernel. Seven
are cc:stable and eight address post-6.3 issues or were judged
unsuitable for -stable backporting"
* tag 'mm-hotfixes-stable-2023-03-04-13-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mailmap: map Dikshita Agarwal's old address to his current one
mailmap: map Vikash Garodia's old address to his current one
fs/cramfs/inode.c: initialize file_ra_state
fs: hfsplus: fix UAF issue in hfsplus_put_super
panic: fix the panic_print NMI backtrace setting
lib: parser: update documentation for match_NUMBER functions
kasan, x86: don't rename memintrinsics in uninstrumented files
kasan: test: fix test for new meminstrinsic instrumentation
kasan: treat meminstrinsic as builtins in uninstrumented files
kasan: emit different calls for instrumentable memintrinsics
ocfs2: fix non-auto defrag path not working issue
ocfs2: fix defrag path triggering jbd2 ASSERT
mailmap: map Georgi Djakov's old Linaro address to his current one
mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON
lib/zlib: DFLTCC deflate does not write all available bits for Z_NO_FLUSH
mm/damon/paddr: fix missing folio_put()
mm/mremap: fix dup_anon_vma() in vma_merge() case 4
|
|
Pull more io_uring updates from Jens Axboe:
"Here's a set of fixes/changes that didn't make the first cut, either
because they got queued before I sent the early merge request, or
fixes that came in afterwards. In detail:
- Don't set MSG_NOSIGNAL on recv/recvmsg opcodes, as AF_PACKET will
error out (David)
- Fix for spurious poll wakeups (me)
- Fix for a file leak for buffered reads in certain conditions
(Joseph)
- Don't allow registered buffers of mixed types (Pavel)
- Improve handling of huge pages for registered buffers (Pavel)
- Provided buffer ring size calculation fix (Wojciech)
- Minor cleanups (me)"
* tag 'io_uring-6.3-2023-03-03' of git://git.kernel.dk/linux:
io_uring/poll: don't pass in wake func to io_init_poll_iocb()
io_uring: fix fget leak when fs don't support nowait buffered read
io_uring/poll: allow some retries for poll triggering spuriously
io_uring: remove MSG_NOSIGNAL from recvmsg
io_uring/rsrc: always initialize 'folio' to NULL
io_uring/rsrc: optimise registered huge pages
io_uring/rsrc: optimise single entry advance
io_uring/rsrc: disallow multi-source reg buffers
io_uring: remove unused wq_list_merge
io_uring: fix size calculation when registering buf ring
io_uring/rsrc: fix a comment in io_import_fixed()
io_uring: rename 'in_idle' to 'in_cancel'
io_uring: consolidate the put_ref-and-return section of adding work
|
|
Pull block fixes from Jens Axboe:
- NVMe pull request via Christoph:
- Don't access released socket during error recovery (Akinobu
Mita)
- Bring back auto-removal of deleted namespaces during sequential
scan (Christoph Hellwig)
- Fix an error code in nvme_auth_process_dhchap_challenge (Dan
Carpenter)
- Show well known discovery name (Daniel Wagner)
- Add a missing endianess conversion in effects masking (Keith
Busch)
- Fix for a regression introduced in blk-rq-qos during init in this
merge window (Breno)
- Reorder a few fields in struct blk_mq_tag_set, eliminating a few
holes and shrinking it (Christophe)
- Remove redundant bdev_get_queue() NULL checks (Juhyung)
- Add sed-opal single user mode support flag (Luca)
- Remove SQE128 check in ublk as it isn't needed, saving some memory
(Ming)
- Op specific segment checking for cloned requests (Uday)
- Exclusive open partition scan fixes (Yu)
- Loop offset/size checking before assigning them in the device (Zhong)
- Bio polling fixes (me)
* tag 'block-6.3-2023-03-03' of git://git.kernel.dk/linux:
blk-mq: enforce op-specific segment limits in blk_insert_cloned_request
nvme-fabrics: show well known discovery name
nvme-tcp: don't access released socket during error recovery
nvme-auth: fix an error code in nvme_auth_process_dhchap_challenge()
nvme: bring back auto-removal of deleted namespaces during sequential scan
blk-iocost: Pass gendisk to ioc_refresh_params
nvme: fix sparse warning on effects masking
block: be a bit more careful in checking for NULL bdev while polling
block: clear bio->bi_bdev when putting a bio back in the cache
loop: loop_set_status_from_info() check before assignment
ublk: remove check IO_URING_F_SQE128 in ublk_ch_uring_cmd
block: remove more NULL checks after bdev_get_queue()
blk-mq: Reorder fields in 'struct blk_mq_tag_set'
block: fix scan partition for exclusively open device again
block: Revert "block: Do not reread partition table on exclusively open device"
sed-opal: add support flag for SUM in status ioctl
|
|
Martin suggested that instead of using a byte in the hole (which he has
a use for in his future patch) in bpf_local_storage_elem, we can
dispatch a different call_rcu callback based on whether we need to free
special fields in bpf_local_storage_elem data. The free path, described
in commit 9db44fdd8105 ("bpf: Support kptrs in local storage maps"),
only waits for call_rcu callbacks when there are special (kptrs, etc.)
fields in the map value, hence it is necessary that we only access
smap in this case.
Therefore, dispatch different RCU callbacks based on whether the BPF map
has a valid btf_record, and dereference and use smap's btf_record only
when it is valid.
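A hedged sketch of the dispatch (callback names are illustrative
placeholders):

	/* Only the callback that frees special fields needs to reach
	 * back into smap, so pick the RCU callback based on whether
	 * the map's btf_record is valid.
	 */
	if (!IS_ERR_OR_NULL(smap->map.record))
		call_rcu(&selem->rcu, bpf_selem_free_fields_rcu); /* placeholder */
	else
		call_rcu(&selem->rcu, bpf_selem_free_rcu);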
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230303141542.300068-1-memxor@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"A few drivers got some nice cleanups and a new driver are making the
bulk of the changes.
Subsystem:
- allow rtc_read_alarm without read_alarm callback
New driver:
- NXP BBNSM module RTC
Drivers:
- use IRQ flags from fwnode when available
- abx80x: nvmem support
- brcmstb-waketimer: add non-wake alarm support
- ingenic: provide CLK32K clock
- isl12022: cleanups
- moxart: switch to using gpiod API
- pcf85363: allow setting quartz load
- pm8xxx: cleanups and support for setting time
- rv3028, rv3032: add ACPI support"
* tag 'rtc-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (64 commits)
rtc: pm8xxx: add support for nvmem offset
dt-bindings: rtc: qcom-pm8xxx: add nvmem-cell offset
rtc: abx80x: Add nvmem support
rtc: rx6110: Remove unused of_gpio,h
rtc: efi: Avoid spamming the log on RTC read failure
rtc: isl12022: sort header inclusion alphabetically
rtc: isl12022: Join string literals back
rtc: isl12022: Drop unneeded OF guards and of_match_ptr()
rtc: isl12022: Explicitly use __le16 type for ISL12022_REG_TEMP_L
rtc: isl12022: Get rid of unneeded private struct isl12022
rtc: pcf85363: add support for the quartz-load-femtofarads property
dt-bindings: rtc: nxp,pcf8563: move pcf85263/pcf85363 to a dedicated binding
rtc: allow rtc_read_alarm without read_alarm callback
rtc: rv3032: add ACPI support
rtc: rv3028: add ACPI support
rtc: bbnsm: Add the bbnsm rtc support
rtc: jz4740: Register clock provider for the CLK32K pin
rtc: jz4740: Use dev_err_probe()
rtc: jz4740: Use readl_poll_timeout
dt-bindings: rtc: Add #clock-cells property
...
|
|
bpf_rcu_read_lock/unlock() are only available in clang compiled kernels. Lack
of such key mechanism makes it impossible for sleepable bpf programs to use RCU
pointers.
Allow bpf_rcu_read_lock/unlock() in GCC compiled kernels (though GCC doesn't
support btf_type_tag yet) and allowlist certain field dereferences in important
data structures like task_struct, cgroup, socket that are used by sleepable
programs either as RCU pointers or full trusted pointers (which are valid
outside of RCU CS). Use the BTF_TYPE_SAFE_RCU and BTF_TYPE_SAFE_TRUSTED macros
for such tagging. They will be removed once GCC supports btf_type_tag.
With that refactor check_ptr_to_btf_access(). Make it strict in enforcing
PTR_TRUSTED and PTR_UNTRUSTED while deprecating old PTR_TO_BTF_ID without
modifier flags. There is a chance that this strict enforcement might break
existing programs (especially on GCC compiled kernels), but this cleanup has to
start sooner than later. Note PTR_TO_CTX access still yields old deprecated
PTR_TO_BTF_ID. Once it's converted to strict PTR_TRUSTED or PTR_UNTRUSTED the
kfuncs and helpers will be able to default to KF_TRUSTED_ARGS. KF_RCU will
remain as a weaker version of KF_TRUSTED_ARGS where obj refcnt could be 0.
Adjust rcu_read_lock selftest to run on gcc and clang compiled kernels.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230303041446.3630-7-alexei.starovoitov@gmail.com
|
|
The lifetime of certain kernel structures like 'struct cgroup' is protected
by RCU.
Hence it's safe to dereference them directly from __kptr tagged pointers in
bpf maps.
The resulting pointer is MEM_RCU and can be passed to kfuncs that expect
KF_RCU.
Dereference of other kptrs returns PTR_UNTRUSTED.
For example:
	struct map_value {
		struct cgroup __kptr *cgrp;
	};

	SEC("tp_btf/cgroup_mkdir")
	int BPF_PROG(test_cgrp_get_ancestors, struct cgroup *cgrp_arg, const char *path)
	{
		struct cgroup *cg, *cg2;

		cg = bpf_cgroup_acquire(cgrp_arg); // cg is PTR_TRUSTED and ref_obj_id > 0
		bpf_kptr_xchg(&v->cgrp, cg);

		cg2 = v->cgrp; // This is the new feature introduced by this patch.
		// cg2 is PTR_MAYBE_NULL | MEM_RCU.
		// When cg2 != NULL, it's a valid cgroup, but its percpu_ref could be zero.
		if (cg2)
			bpf_cgroup_ancestor(cg2, level); // safe to do.
	}
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230303041446.3630-4-alexei.starovoitov@gmail.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molnar:
- Shrink 'struct instruction', to improve objtool performance & memory
footprint
- Other maximum memory usage reductions - this makes the build both
faster, and fixes kernel build OOM failures on allyesconfig and
similar configs when they try to build the final (large) vmlinux.o
- Fix ORC unwinding when a kprobe (INT3) is set on a stack-modifying
single-byte instruction (PUSH/POP or LEAVE). This requires the
extension of the ORC metadata structure with a 'signal' field
- Misc fixes & cleanups
* tag 'objtool-core-2023-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
objtool: Fix ORC 'signal' propagation
objtool: Remove instruction::list
x86: Fix FILL_RETURN_BUFFER
objtool: Fix overlapping alternatives
objtool: Union instruction::{call_dest,jump_table}
objtool: Remove instruction::reloc
objtool: Shrink instruction::{type,visited}
objtool: Make instruction::alts a single-linked list
objtool: Make instruction::stack_ops a single-linked list
objtool: Change arch_decode_instruction() signature
x86/entry: Fix unwinding from kprobe on PUSH/POP instruction
x86/unwind/orc: Add 'signal' field to ORC metadata
objtool: Optimize layout of struct special_alt
objtool: Optimize layout of struct symbol
objtool: Allocate multiple structures with calloc()
objtool: Make struct check_options static
objtool: Make struct entries[] static and const
objtool: Fix HOSTCC flag usage
objtool: Properly support make V=1
objtool: Install libsubcmd in build
...
|
|
Miquel reported a warning in the MSI core which is triggered when
interrupts are freed via platform_msi_device_domain_free().
This code got reworked to use core functions for freeing the MSI
descriptors, but nothing took care to clear the msi_desc->irq entry, which
then triggers the warning in msi_free_msi_desc() which uses desc->irq to
validate that the descriptor has been torn down. The same issue exists in
msi_domain_populate_irqs().
Up to the point that msi_free_msi_descs() grew a warning for this case,
this went unnoticed.
Provide the counterpart of msi_domain_populate_irqs() and invoke it in
platform_msi_device_domain_free() before freeing the interrupts and MSI
descriptors and also in the error path of msi_domain_populate_irqs().
Fixes: 2f2940d16823 ("genirq/msi: Remove filter from msi_free_descs_free_range()")
Reported-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87mt4wkwnv.ffs@tglx
|
|
Enable support for kptrs in local storage maps by wiring up the freeing
of these kptrs from map value. Freeing of bpf_local_storage_map is only
delayed in case there are special fields, therefore bpf_selem_free_*
path can also only dereference smap safely in that case. This is
recorded using a bool utilizing a hole in bpF_local_storage_elem. It
could have been tagged in the pointer value smap using the lowest bit
(since alignment > 1), but since there was already a hole I went with
the simpler option. Only the map structure freeing is delayed using RCU
barriers, as the buckets aren't used when selem is being freed, so they
can be freed once all readers of the bucket lists can no longer access
it.
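As a hedged sketch of where that bool lives (the flag name here is
hypothetical; only its placement in the existing hole is what the patch
describes):

    struct bpf_local_storage_elem {
            struct hlist_node map_node;     /* hashed into smap's bucket */
            struct hlist_node snode;        /* linked to bpf_local_storage */
            struct bpf_local_storage __rcu *local_storage;
            struct rcu_head rcu;
            bool can_deref_smap;            /* hypothetical name: map has special fields */
            /* rest of the former 8-byte hole */
            struct bpf_local_storage_data sdata ____cacheline_aligned;
    };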
Cc: Martin KaFai Lau <martin.lau@kernel.org>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230225154010.391965-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Back in 2008 we extended the capability bits from 32 to 64, and we did
it by extending the single 32-bit capability word from one word to an
array of two words. It was then obfuscated by hiding the "2" behind two
macro expansions, with the reasoning being that maybe it gets extended
further some day.
That reasoning may have been valid at the time, but the last thing we
want to do is to extend the capability set any more. And the array of
values not only causes source code oddities (with loops to deal with
it), but also results in worse code generation. It's a lose-lose
situation.
So just change the 'u32[2]' into a 'u64' and be done with it.
We still have to deal with the fact that the user space interface is
designed around an array of these 32-bit values, but that was the case
before too, since the array layouts were different (ie user space
doesn't use an array of 32-bit values for individual capability masks,
but an array of 32-bit slices of multiple masks).
So that marshalling of data is actually simplified too, even if it does
remain somewhat obscure and odd.
This was all triggered by my reaction to the new "cap_isidentical()"
introduced recently. By just using a saner data structure, it went from
    unsigned __capi;
    CAP_FOR_EACH_U32(__capi) {
            if (a.cap[__capi] != b.cap[__capi])
                    return false;
    }
    return true;
to just being
    return a.val == b.val;
instead. Which is rather more obvious both to humans and to compilers.
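As a hedged sketch of the remaining marshalling (helper names are
illustrative, not from the patch): slicing the one u64 into the two
32-bit words the user-space ABI expects is a shift and a truncation:

    static inline __u32 cap_slice(__u64 cap, int idx)      /* idx is 0 or 1 */
    {
            return (__u32)(cap >> (32 * idx));
    }

    static inline __u64 cap_combine(__u32 lo, __u32 hi)
    {
            return (__u64)lo | ((__u64)hi << 32);
    }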
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Paul Moore <paul@paul-moore.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Two new kfuncs are added, bpf_dynptr_slice and bpf_dynptr_slice_rdwr.
The user must pass in a buffer to store the contents of the data slice
if a direct pointer to the data cannot be obtained.
For skb and xdp type dynptrs, these two APIs are the only way to obtain
a data slice. However, for other types of dynptrs, there is no
difference between bpf_dynptr_slice(_rdwr) and bpf_dynptr_data.
For skb type dynptrs, the data is copied into the user provided buffer
if any of the data is not in the linear portion of the skb. For xdp type
dynptrs, the data is copied into the user provided buffer if the data is
between xdp frags.
If the skb is cloned and a call to bpf_dynptr_slice_rdwr is made, then
the skb will be uncloned (see bpf_unclone_prologue()).
Please note that any bpf_dynptr_write() automatically invalidates any
prior data slices of the skb dynptr, even if the write is to data in
the skb head of an uncloned skb. This is because the skb may be cloned
or may need to pull its paged buffer into the head. Please note as well
that any other helper calls that change the underlying packet buffer
(eg bpf_skb_pull_data()) invalidate any prior data slices of the skb
dynptr, for the same reasons.
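A hedged usage sketch (program context and struct choice are
illustrative; the kfunc signature follows the one added here):

    /* The returned pointer refers either directly into the packet
     * (linear case) or into the stack buffer passed in (non-linear
     * case); the caller reads through it the same way in both cases.
     */
    struct iphdr *iph, iph_stack;

    iph = bpf_dynptr_slice(&ptr, 0, &iph_stack, sizeof(iph_stack));
    if (!iph)
            return 0;       /* offset + len out of bounds */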
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20230301154953.641654-10-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add xdp dynptrs, which are dynptrs whose underlying pointer points
to an xdp_buff. The dynptr acts on xdp data. xdp dynptrs have two main
benefits. One is that they allow operations on sizes that are not
statically known at compile-time (eg variable-sized accesses).
Another is that parsing the packet data through dynptrs (instead of
through direct access of xdp->data and xdp->data_end) can be more
ergonomic and less brittle (eg does not need manual if checking for
being within bounds of data_end).
For reads and writes on the dynptr, this includes reading/writing
from/to and across fragments. Data slices through the bpf_dynptr_data
API are not supported; instead bpf_dynptr_slice() and
bpf_dynptr_slice_rdwr() should be used.
For examples of how xdp dynptrs can be used, please see the attached
selftests.
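A hedged end-to-end sketch (the parsing itself is illustrative; the
kfunc and helper calls are the ones this series wires up):

    SEC("xdp")
    int xdp_check_eth(struct xdp_md *ctx)
    {
            struct bpf_dynptr ptr;
            struct ethhdr eth;

            if (bpf_dynptr_from_xdp(ctx, 0, &ptr))
                    return XDP_PASS;
            /* works even when the requested bytes span xdp frags */
            if (bpf_dynptr_read(&eth, sizeof(eth), &ptr, 0, 0))
                    return XDP_PASS;
            return eth.h_proto == bpf_htons(ETH_P_IP) ? XDP_PASS : XDP_DROP;
    }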
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20230301154953.641654-9-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add skb dynptrs, which are dynptrs whose underlying pointer points
to a skb. The dynptr acts on skb data. skb dynptrs have two main
benefits. One is that they allow operations on sizes that are not
statically known at compile-time (eg variable-sized accesses).
Another is that parsing the packet data through dynptrs (instead of
through direct access of skb->data and skb->data_end) can be more
ergonomic and less brittle (eg does not need manual if checking for
being within bounds of data_end).
For bpf prog types that don't support writes on skb data, the dynptr is
read-only (bpf_dynptr_write() will return an error).
For reads and writes through the bpf_dynptr_read() and bpf_dynptr_write()
interfaces, reading and writing from/to data in the head as well as from/to
non-linear paged buffers is supported. Data slices through the
bpf_dynptr_data API are not supported; instead bpf_dynptr_slice() and
bpf_dynptr_slice_rdwr() (added in subsequent commit) should be used.
For examples of how skb dynptrs can be used, please see the attached
selftests.
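A hedged sketch at TC ingress (error handling trimmed; the kfunc is the
one added by this commit):

    SEC("tc")
    int tc_mark_byte(struct __sk_buff *skb)
    {
            struct bpf_dynptr ptr;
            __u8 byte = 0x42;

            if (bpf_dynptr_from_skb(skb, 0, &ptr))
                    return TC_ACT_OK;
            /* may land in the head or in paged data transparently;
             * also invalidates any previously obtained data slices
             */
            bpf_dynptr_write(&ptr, 0, &byte, sizeof(byte), 0);
            return TC_ACT_OK;
    }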
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20230301154953.641654-8-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Some bpf dynptr functions will be called from places that are built
even when CONFIG_BPF_SYSCALL is not set, where the dynptr functions
would otherwise be undefined. For example, when skb type dynptrs are
added in the next commit, dynptr functions are called from
net/core/filter.c.
This patch defines no-op implementations of these dynptr functions
so that they do not break compilation by being an undefined reference.
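As a hedged sketch of the pattern (the function name is hypothetical;
the real patch stubs the actual dynptr helpers):

    #ifdef CONFIG_BPF_SYSCALL
    int bpf_dynptr_example_op(struct bpf_dynptr_kern *ptr);
    #else
    /* no-op stub so always-built callers (eg net/core/filter.c) link */
    static inline int bpf_dynptr_example_op(struct bpf_dynptr_kern *ptr)
    {
            return -EOPNOTSUPP;
    }
    #endif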
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20230301154953.641654-5-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This change cleans up process_dynptr_func's flow to be more intuitive
and updates some comments with more context.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20230301154953.641654-3-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux
Pull sh updates from John Paul Adrian Glaubitz:
- regression fix in connection with the rtl8169 driver on SuperH boards
that was introduced when the driver was switched to use
devm_clk_get_optional_enabled() to simplify the code (Geert
Uytterhoeven); see the sketch after the shortlog
- build warning fix to allow the kernel to be built with CONFIG_WERROR
enabled (Michael Karcher)
* tag 'sh-for-v6.3-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
sh: clk: Fix clk_enable() to return 0 on NULL clk
sh: intc: Avoid spurious sizeof-pointer-div warning
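A hedged sketch of the optional-clock convention behind the clk_enable()
fix above (__clk_enable() here is an illustrative internal helper, not
the sh code):

    int clk_enable(struct clk *clk)
    {
            /* NULL means "optional clock not present": succeed as a
             * no-op, which devm_clk_get_optional_enabled() users rely on
             */
            if (!clk)
                    return 0;

            return __clk_enable(clk);
    }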
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs
Pull jffs2, ubi and ubifs updates from Richard Weinberger:
"JFFS2:
- Fix memory corruption in error path
- Spelling and coding style fixes
UBI:
- Switch to BLK_MQ_F_BLOCKING in ubiblock
- Wire up parent device (for sysfs)
- Multiple UAF bugfixes
- Fix for an infinite loop in WL error path
UBIFS:
- Fix for multiple memory leaks in error paths
- Fixes for wrong space accounting
- Minor cleanups
- Spelling and coding style fixes"
* tag 'ubifs-for-linus-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: (36 commits)
ubi: block: Fix a possible use-after-free bug in ubiblock_create()
ubifs: make kobj_type structures constant
mtd: ubi: block: wire-up device parent
mtd: ubi: wire-up parent MTD device
ubi: use correct names in function kernel-doc comments
ubi: block: set BLK_MQ_F_BLOCKING
jffs2: Fix list_del corruption if compressors initialized failed
jffs2: Use function instead of macro when initialize compressors
jffs2: fix spelling mistake "neccecary"->"necessary"
ubifs: Fix kernel-doc
ubifs: Fix some kernel-doc comments
UBI: Fastmap: Fix kernel-doc
ubi: ubi_wl_put_peb: Fix infinite loop when wear-leveling work failed
ubi: Fix UAF wear-leveling entry in eraseblk_count_seq_show()
ubi: fastmap: Fix missed fm_anchor PEB in wear-leveling after disabling fastmap
ubifs: ubifs_releasepage: Remove ubifs_assert(0) to valid this process
ubifs: ubifs_writepage: Mark page dirty after writing inode failed
ubifs: dirty_cow_znode: Fix memleak in error handling path
ubifs: Re-statistic cleaned znode count if commit failed
ubi: Fix permission display of the debugfs files
...
|