Age | Commit message (Collapse) | Author |
|
Chuyi Zhou says:
====================
Add Open-coded task, css_task and css iters
This is version 6 of task, css_task and css iters support.
--- Changelog ---
v5 -> v6:
Patch #3:
* In bpf_iter_task_next, return pos rather than goto out. (Andrii)
Patch #2, #3, #4:
* Add the missing __diag_ignore_all to avoid kernel build warning
Patch #5, #6, #7:
* Add Andrii's ack
Patch #8:
* In BPF prog iter_css_task_for_each, return -EPERM rather than 0, and
ensure stack_mprotect() in iters.c not success. If not, it would cause
the subsequent 'test_lsm' fail, since the 'is_stack' check in
test_int_hook(lsm.c) would not be guaranteed.
(https://github.com/kernel-patches/bpf/actions/runs/6489662214/job/17624665086?pr=5790)
v4 -> v5:https://lore.kernel.org/lkml/20231007124522.34834-1-zhouchuyi@bytedance.com/
Patch 3~4:
* Relax the BUILD_BUG_ON check in bpf_iter_task_new and bpf_iter_css_new to avoid
netdev/build_32bit CI error.
(https://netdev.bots.linux.dev/static/nipa/790929/13412333/build_32bit/stderr)
Patch 8:
* Initialize skel pointer to fix the LLVM-16 build CI error
(https://github.com/kernel-patches/bpf/actions/runs/6462875618/job/17545170863)
v3 -> v4:https://lore.kernel.org/all/20230925105552.817513-1-zhouchuyi@bytedance.com/
* Address all the comments from Andrii in patch-3 ~ patch-6
* Collect Tejun's ack
* Add a extra patch to rename bpf_iter_task.c to bpf_iter_tasks.c
* Seperate three BPF program files for selftests (iters_task.c iters_css_task.c iters_css.c)
v2 -> v3:https://lore.kernel.org/lkml/20230912070149.969939-1-zhouchuyi@bytedance.com/
Patch 1 (cgroup: Prepare for using css_task_iter_*() in BPF)
* Add tj's ack and Alexei's suggest-by.
Patch 2 (bpf: Introduce css_task open-coded iterator kfuncs)
* Use bpf_mem_alloc/bpf_mem_free rather than kzalloc()
* Add KF_TRUSTED_ARGS for bpf_iter_css_task_new (Alexei)
* Move bpf_iter_css_task's definition from uapi/linux/bpf.h to
kernel/bpf/task_iter.c and we can use it from vmlinux.h
* Move bpf_iter_css_task_XXX's declaration from bpf_helpers.h to
bpf_experimental.h
Patch 3 (Introduce task open coded iterator kfuncs)
* Change th API design keep consistent with SEC("iter/task"), support
iterating all threads(BPF_TASK_ITERATE_ALL) and threads of a
specific task (BPF_TASK_ITERATE_THREAD).(Andrii)
* Move bpf_iter_task's definition from uapi/linux/bpf.h to
kernel/bpf/task_iter.c and we can use it from vmlinux.h
* Move bpf_iter_task_XXX's declaration from bpf_helpers.h to
bpf_experimental.h
Patch 4 (Introduce css open-coded iterator kfuncs)
* Change th API design keep consistent with cgroup_iters, reuse
BPF_CGROUP_ITER_DESCENDANTS_PRE/BPF_CGROUP_ITER_DESCENDANTS_POST
/BPF_CGROUP_ITER_ANCESTORS_UP(Andrii)
* Add KF_TRUSTED_ARGS for bpf_iter_css_new
* Move bpf_iter_css's definition from uapi/linux/bpf.h to
kernel/bpf/task_iter.c and we can use it from vmlinux.h
* Move bpf_iter_css_XXX's declaration from bpf_helpers.h to
bpf_experimental.h
Patch 5 (teach the verifier to enforce css_iter and task_iter in RCU CS)
* Add KF flag KF_RCU_PROTECTED to maintain kfuncs which need RCU CS.(Andrii)
* Consider STACK_ITER when using bpf_for_each_spilled_reg.
Patch 6 (Let bpf_iter_task_new accept null task ptr)
* Add this extra patch to let bpf_iter_task_new accept a 'nullable'
* task pointer(Andrii)
Patch 7 (selftests/bpf: Add tests for open-coded task and css iter)
* Add failure testcase(Alexei)
Changes from v1(https://lore.kernel.org/lkml/20230827072057.1591929-1-zhouchuyi@bytedance.com/):
- Add a pre-patch to make some preparations before supporting css_task
iters.(Alexei)
- Add an allowlist for css_task iters(Alexei)
- Let bpf progs do explicit bpf_rcu_read_lock() when using process
iters and css_descendant iters.(Alexei)
---------------------
In some BPF usage scenarios, it will be useful to iterate the process and
css directly in the BPF program. One of the expected scenarios is
customizable OOM victim selection via BPF[1].
Inspired by Dave's task_vma iter[2], this patchset adds three types of
open-coded iterator kfuncs:
1. bpf_task_iters. It can be used to
1) iterate all process in the system, like for_each_forcess() in kernel.
2) iterate all threads in the system.
3) iterate all threads of a specific task
2. bpf_css_iters. It works like css_task_iter_{start, next, end} and would
be used to iterating tasks/threads under a css.
3. css_iters. It works like css_next_descendant_{pre, post} to iterating all
descendant css.
BPF programs can use these kfuncs directly or through bpf_for_each macro.
link[1]: https://lore.kernel.org/lkml/20230810081319.65668-1-zhouchuyi@bytedance.com/
link[2]: https://lore.kernel.org/all/20230810183513.684836-1-davemarchevsky@fb.com/
====================
Link: https://lore.kernel.org/r/20231018061746.111364-1-zhouchuyi@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch adds 4 subtests to demonstrate these patterns and validating
correctness.
subtest1:
1) We use task_iter to iterate all process in the system and search for the
current process with a given pid.
2) We create some threads in current process context, and use
BPF_TASK_ITER_PROC_THREADS to iterate all threads of current process. As
expected, we would find all the threads of current process.
3) We create some threads and use BPF_TASK_ITER_ALL_THREADS to iterate all
threads in the system. As expected, we would find all the threads which was
created.
subtest2:
We create a cgroup and add the current task to the cgroup. In the
BPF program, we would use bpf_for_each(css_task, task, css) to iterate all
tasks under the cgroup. As expected, we would find the current process.
subtest3:
1) We create a cgroup tree. In the BPF program, we use
bpf_for_each(css, pos, root, XXX) to iterate all descendant under the root
with pre and post order. As expected, we would find all descendant and the
last iterating cgroup in post-order is root cgroup, the first iterating
cgroup in pre-order is root cgroup.
2) We wse BPF_CGROUP_ITER_ANCESTORS_UP to traverse the cgroup tree starting
from leaf and root separately, and record the height. The diff of the
hights would be the total tree-high - 1.
subtest4:
Add some failure testcase when using css_task, task and css iters, e.g,
unlock when using task-iters to iterate tasks.
Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Link: https://lore.kernel.org/r/20231018061746.111364-9-zhouchuyi@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The newly-added struct bpf_iter_task has a name collision with a selftest
for the seq_file task iter's bpf skel, so the selftests/bpf/progs file is
renamed in order to avoid the collision.
Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231018061746.111364-8-zhouchuyi@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
When using task_iter to iterate all threads of a specific task, we enforce
that the user must pass a valid task pointer to ensure safety. However,
when iterating all threads/process in the system, BPF verifier still
require a valid ptr instead of "nullable" pointer, even though it's
pointless, which is a kind of surprising from usability standpoint. It
would be nice if we could let that kfunc accept a explicit null pointer
when we are using BPF_TASK_ITER_ALL_{PROCS, THREADS} and a valid pointer
when using BPF_TASK_ITER_THREAD.
Given a trival kfunc:
__bpf_kfunc void FN(struct TYPE_A *obj);
BPF Prog would reject a nullptr for obj. The error info is:
"arg#x pointer type xx xx must point to scalar, or struct with scalar"
reported by get_kfunc_ptr_arg_type(). The reg->type is SCALAR_VALUE and
the btf type of ref_t is not scalar or scalar_struct which leads to the
rejection of get_kfunc_ptr_arg_type.
This patch add "__nullable" annotation:
__bpf_kfunc void FN(struct TYPE_A *obj__nullable);
Here __nullable indicates obj can be optional, user can pass a explicit
nullptr or a normal TYPE_A pointer. In get_kfunc_ptr_arg_type(), we will
detect whether the current arg is optional and register is null, If so,
return a new kfunc_ptr_arg_type KF_ARG_PTR_TO_NULL and skip to the next
arg in check_kfunc_args().
Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231018061746.111364-7-zhouchuyi@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
css_iter and task_iter should be used in rcu section. Specifically, in
sleepable progs explicit bpf_rcu_read_lock() is needed before use these
iters. In normal bpf progs that have implicit rcu_read_lock(), it's OK to
use them directly.
This patch adds a new a KF flag KF_RCU_PROTECTED for bpf_iter_task_new and
bpf_iter_css_new. It means the kfunc should be used in RCU CS. We check
whether we are in rcu cs before we want to invoke this kfunc. If the rcu
protection is guaranteed, we would let st->type = PTR_TO_STACK | MEM_RCU.
Once user do rcu_unlock during the iteration, state MEM_RCU of regs would
be cleared. is_iter_reg_valid_init() will reject if reg->type is UNTRUSTED.
It is worth noting that currently, bpf_rcu_read_unlock does not
clear the state of the STACK_ITER reg, since bpf_for_each_spilled_reg
only considers STACK_SPILL. This patch also let bpf_for_each_spilled_reg
search STACK_ITER.
Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231018061746.111364-6-zhouchuyi@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This Patch adds kfuncs bpf_iter_css_{new,next,destroy} which allow
creation and manipulation of struct bpf_iter_css in open-coded iterator
style. These kfuncs actually wrapps css_next_descendant_{pre, post}.
css_iter can be used to:
1) iterating a sepcific cgroup tree with pre/post/up order
2) iterating cgroup_subsystem in BPF Prog, like
for_each_mem_cgroup_tree/cpuset_for_each_descendant_pre in kernel.
The API design is consistent with cgroup_iter. bpf_iter_css_new accepts
parameters defining iteration order and starting css. Here we also reuse
BPF_CGROUP_ITER_DESCENDANTS_PRE, BPF_CGROUP_ITER_DESCENDANTS_POST,
BPF_CGROUP_ITER_ANCESTORS_UP enums.
Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20231018061746.111364-5-zhouchuyi@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch adds kfuncs bpf_iter_task_{new,next,destroy} which allow
creation and manipulation of struct bpf_iter_task in open-coded iterator
style. BPF programs can use these kfuncs or through bpf_for_each macro to
iterate all processes in the system.
The API design keep consistent with SEC("iter/task"). bpf_iter_task_new()
accepts a specific task and iterating type which allows:
1. iterating all process in the system (BPF_TASK_ITER_ALL_PROCS)
2. iterating all threads in the system (BPF_TASK_ITER_ALL_THREADS)
3. iterating all threads of a specific task (BPF_TASK_ITER_PROC_THREADS)
Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Link: https://lore.kernel.org/r/20231018061746.111364-4-zhouchuyi@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch adds kfuncs bpf_iter_css_task_{new,next,destroy} which allow
creation and manipulation of struct bpf_iter_css_task in open-coded
iterator style. These kfuncs actually wrapps css_task_iter_{start,next,
end}. BPF programs can use these kfuncs through bpf_for_each macro for
iteration of all tasks under a css.
css_task_iter_*() would try to get the global spin-lock *css_set_lock*, so
the bpf side has to be careful in where it allows to use this iter.
Currently we only allow it in bpf_lsm and bpf iter-s.
Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20231018061746.111364-3-zhouchuyi@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch makes some preparations for using css_task_iter_*() in BPF
Program.
1. Flags CSS_TASK_ITER_* are #define-s and it's not easy for bpf prog to
use them. Convert them to enum so bpf prog can take them from vmlinux.h.
2. In the next patch we will add css_task_iter_*() in common kfuncs which
is not safe. Since css_task_iter_*() does spin_unlock_irq() which might
screw up irq flags depending on the context where bpf prog is running.
So we should use irqsave/irqrestore here and the switching is harmless.
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20231018061746.111364-2-zhouchuyi@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
In amdgpu_dma_buf_move_notify reserve fences for the page table updates
in amdgpu_vm_clear_freed and amdgpu_vm_handle_moved. This fixes a BUG_ON
in dma_resv_add_fence when using SDMA for page table updates.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
abo->tbo.resource may be NULL in amdgpu_vm_bo_update.
Fixes: 180253782038 ("drm/ttm: stop allocating dummy resources during BO creation")
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Jakub Kicinski says:
====================
tools: ynl-gen: support full range of min/max checks
YNL code gen currently supports only very simple range checks
within the range of s16. Add support for full range of u64 / s64
which is good to have, and will be even more important with uint / sint.
====================
Link: https://lore.kernel.org/r/20231018163917.2514503-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Support the use of symbolic names like s8-min or u32-max in checks
to make writing specs less painful.
Link: https://lore.kernel.org/r/20231018163917.2514503-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Extend the support to full range of min/max checks.
None of the existing YNL families required complex integer validation.
The support is less than trivial, because we try to keep struct nla_policy
tiny the min/max members it holds in place are s16. Meaning we can only
express checks in range of s16. For larger ranges we need to define
a structure and link it in the policy.
Link: https://lore.kernel.org/r/20231018163917.2514503-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
For range validation we'll need to know if any individual
attribute is used on input (i.e. whether we will generate
a policy for it). Track this information.
Link: https://lore.kernel.org/r/20231018163917.2514503-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The ida_alloc_max() function can return up to INT_MAX so this buffer is
not large enough. Also use snprintf() for extra safety.
Fixes: 403376ddb422 ("ptp: add debugfs interface to see applied channel masks")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/d4b1a995-a0cb-4125-aa1d-5fd5044aba1d@moroto.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR.
net/mac80211/key.c
02e0e426a2fb ("wifi: mac80211: fix error path key leak")
2a8b665e6bcc ("wifi: mac80211: remove key_mtx")
7d6904bf26b9 ("Merge wireless into wireless-next")
https://lore.kernel.org/all/20231012113648.46eea5ec@canb.auug.org.au/
Adjacent changes:
drivers/net/ethernet/ti/Kconfig
a602ee3176a8 ("net: ethernet: ti: Fix mixed module-builtin object")
98bdeae9502b ("net: cpmac: remove driver to prepare for platform removal")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, netfilter, WiFi.
Feels like an up-tick in regression fixes, mostly for older releases.
The hfsc fix, tcp_disconnect() and Intel WWAN fixes stand out as
fairly clear-cut user reported regressions. The mlx5 DMA bug was
causing strife for 390x folks. The fixes themselves are not
particularly scary, tho. No open investigations / outstanding reports
at the time of writing.
Current release - regressions:
- eth: mlx5: perform DMA operations in the right locations, make
devices usable on s390x, again
- sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner
curve, previous fix of rejecting invalid config broke some scripts
- rfkill: reduce data->mtx scope in rfkill_fop_open, avoid deadlock
- revert "ethtool: Fix mod state of verbose no_mask bitset", needs
more work
Current release - new code bugs:
- tcp: fix listen() warning with v4-mapped-v6 address
Previous releases - regressions:
- tcp: allow tcp_disconnect() again when threads are waiting, it was
denied to plug a constant source of bugs but turns out .NET depends
on it
- eth: mlx5: fix double-free if buffer refill fails under OOM
- revert "net: wwan: iosm: enable runtime pm support for 7560", it's
causing regressions and the WWAN team at Intel disappeared
- tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a
single skb, fix single-stream perf regression on some devices
Previous releases - always broken:
- Bluetooth:
- fix issues in legacy BR/EDR PIN code pairing
- correctly bounds check and pad HCI_MON_NEW_INDEX name
- netfilter:
- more fixes / follow ups for the large "commit protocol" rework,
which went in as a fix to 6.5
- fix null-derefs on netlink attrs which user may not pass in
- tcp: fix excessive TLP and RACK timeouts from HZ rounding (bless
Debian for keeping HZ=250 alive)
- net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation, prevent
letting frankenstein UDP super-frames from getting into the stack
- net: fix interface altnames when ifc moves to a new namespace
- eth: qed: fix the size of the RX buffers
- mptcp: avoid sending RST when closing the initial subflow"
* tag 'net-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits)
Revert "ethtool: Fix mod state of verbose no_mask bitset"
selftests: mptcp: join: no RST when rm subflow/addr
mptcp: avoid sending RST when closing the initial subflow
mptcp: more conservative check for zero probes
tcp: check mptcp-level constraints for backlog coalescing
selftests: mptcp: join: correctly check for no RST
net: ti: icssg-prueth: Fix r30 CMDs bitmasks
selftests: net: add very basic test for netdev names and namespaces
net: move altnames together with the netdevice
net: avoid UAF on deleted altname
net: check for altname conflicts when changing netdev's netns
net: fix ifname in netlink ntf during netns move
net: ethernet: ti: Fix mixed module-builtin object
net: phy: bcm7xxx: Add missing 16nm EPHY statistics
ipv4: fib: annotate races around nh->nh_saddr_genid and nh->nh_saddr
tcp_bpf: properly release resources on error paths
net/sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner curve
net: mdio-mux: fix C45 access returning -EIO after API change
tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a single skb
octeon_ep: update BQL sent bytes before ringing doorbell
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai ChenL
"Fix 4-level pagetable building, disable WUC for pgprot_writecombine()
like ioremap_wc(), use correct annotation for exception handlers, and
a trivial cleanup"
* tag 'loongarch-fixes-6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: Disable WUC for pgprot_writecombine() like ioremap_wc()
LoongArch: Replace kmap_atomic() with kmap_local_page() in copy_user_highpage()
LoongArch: Export symbol invalid_pud_table for modules building
LoongArch: Use SYM_CODE_* to annotate exception handlers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fix from Vlastimil Babka:
- stable fix to prevent kernel warnings with KASAN_HW_TAGS on arm64
due to improperly resolved kmalloc alignment restrictions (Catalin
Marinas)
* tag 'slab-fixes-for-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm: slab: Do not create kmalloc caches smaller than arch_slab_minalign()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp fix from Kees Cook:
- Fix seccomp_unotify perf benchmark for 32-bit (Jiri Slaby)
* tag 'seccomp-v6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
perf/benchmark: fix seccomp_unotify benchmark for 32-bit
|
|
Starting with commit 5d8edfb900d5 ("iomap: Copy larger chunks from
userspace"), iomap_write_iter() can get into endless loop. This can
be reproduced with LTP writev07 which uses partially valid iovecs:
struct iovec wr_iovec[] = {
{ buffer, 64 },
{ bad_addr, 64 },
{ buffer + 64, 64 },
{ buffer + 64 * 2, 64 },
};
commit bc1bb416bbb9 ("generic_perform_write()/iomap_write_actor():
saner logics for short copy") previously introduced the logic, which
made short copy retry in next iteration with amount of "bytes" it
managed to copy:
if (unlikely(status == 0)) {
/*
* A short copy made iomap_write_end() reject the
* thing entirely. Might be memory poisoning
* halfway through, might be a race with munmap,
* might be severe memory pressure.
*/
if (copied)
bytes = copied;
However, since 5d8edfb900d5 "bytes" is no longer carried into next
iteration, because it is now always initialized at the beginning of
the loop. And for iov_iter_count < PAGE_SIZE, "bytes" ends up with
same value as previous iteration, making the loop retry same copy
over and over, which leads to writev07 testcase hanging.
Make next iteration retry with amount of bytes we managed to copy.
Fixes: 5d8edfb900d5 ("iomap: Copy larger chunks from userspace")
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fix from Christian Brauner:
"An openat() call from io_uring triggering an audit call can apparently
cause the refcount of struct filename to be incremented from multiple
threads concurrently during async execution, triggering a refcount
underflow and hitting a BUG_ON(). That bug has been lurking around
since at least v5.16 apparently.
Switch to an atomic counter to fix that. The underflow check is
downgraded from a BUG_ON() to a WARN_ON_ONCE() but we could easily
remove that check altogether tbh"
* tag 'v6.6-rc7.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
audit,io_uring: io_uring openat triggers audit reference count underflow
|
|
This reverts commit 108a36d07c01edbc5942d27c92494d1c6e4d45a0.
It was reported that this fix breaks the possibility to remove existing WoL
flags. For example:
~$ ethtool lan2
...
Supports Wake-on: pg
Wake-on: d
...
~$ ethtool -s lan2 wol gp
~$ ethtool lan2
...
Wake-on: pg
...
~$ ethtool -s lan2 wol d
~$ ethtool lan2
...
Wake-on: pg
...
This worked correctly before this commit because we were always updating
a zero bitmap (since commit 6699170376ab ("ethtool: fix application of
verbose no_mask bitset"), that is) so that the rest was left zero
naturally. But now the 1->0 change (old_val is true, bit not present in
netlink nest) no longer works.
Reported-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Closes: https://lore.kernel.org/netdev/20231019095140.l6fffnszraeb6iiw@lion.mk-sys.cz/
Cc: stable@vger.kernel.org
Fixes: 108a36d07c01 ("ethtool: Fix mod state of verbose no_mask bitset")
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Link: https://lore.kernel.org/r/20231019-feature_ptp_bitset_fix-v1-1-70f3c429a221@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://github.com/Paragon-Software-Group/linux-ntfs3
Pull ntfs3 fixes from Konstantin Komarov:
- memory leak
- some logic errors, NULL dereferences
- some code was refactored
- more sanity checks
* tag 'ntfs3_for_6.6' of https://github.com/Paragon-Software-Group/linux-ntfs3:
fs/ntfs3: Avoid possible memory leak
fs/ntfs3: Fix directory element type detection
fs/ntfs3: Fix possible null-pointer dereference in hdr_find_e()
fs/ntfs3: Fix OOB read in ntfs_init_from_boot
fs/ntfs3: fix panic about slab-out-of-bounds caused by ntfs_list_ea()
fs/ntfs3: Fix NULL pointer dereference on error in attr_allocate_frame()
fs/ntfs3: Fix possible NULL-ptr-deref in ni_readpage_cmpr()
fs/ntfs3: Do not allow to change label if volume is read-only
fs/ntfs3: Add more info into /proc/fs/ntfs3/<dev>/volinfo
fs/ntfs3: Refactoring and comments
fs/ntfs3: Fix alternative boot searching
fs/ntfs3: Allow repeated call to ntfs3_put_sbi
fs/ntfs3: Use inode_set_ctime_to_ts instead of inode_set_ctime
fs/ntfs3: Fix shift-out-of-bounds in ntfs_fill_super
fs/ntfs3: fix deadlock in mark_as_free_ex
fs/ntfs3: Add more attributes checks in mi_enum_attr()
fs/ntfs3: Use kvmalloc instead of kmalloc(... __GFP_NOWARN)
fs/ntfs3: Write immediately updated ntfs state
fs/ntfs3: Add ckeck in ni_update_parent()
|
|
Mat Martineau says:
====================
mptcp: Fixes for v6.6
Patch 1 corrects the logic for MP_JOIN tests where 0 RSTs are expected.
Patch 2 ensures MPTCP packets are not incorrectly coalesced in the TCP
backlog queue.
Patch 3 avoids a zero-window probe and associated WARN_ON_ONCE() in an
expected MPTCP reinjection scenario.
Patches 4 & 5 allow an initial MPTCP subflow to be closed cleanly
instead of always sending RST. Associated selftest is updated.
====================
Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-0-17ecb002e41d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Recently, we noticed that some RST were wrongly generated when removing
the initial subflow.
This patch makes sure RST are not sent when removing any subflows or any
addresses.
Fixes: c2b2ae3925b6 ("mptcp: handle correctly disconnect() failures")
Cc: stable@vger.kernel.org
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-5-17ecb002e41d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When closing the first subflow, the MPTCP protocol unconditionally
calls tcp_disconnect(), which in turn generates a reset if the subflow
is established.
That is unexpected and different from what MPTCP does with MPJ
subflows, where resets are generated only on FASTCLOSE and other edge
scenarios.
We can't reuse for the first subflow the same code in place for MPJ
subflows, as MPTCP clean them up completely via a tcp_close() call,
while must keep the first subflow socket alive for later re-usage, due
to implementation constraints.
This patch adds a new helper __mptcp_subflow_disconnect() that
encapsulates, a logic similar to tcp_close, issuing a reset only when
the MPTCP_CF_FASTCLOSE flag is set, and performing a clean shutdown
otherwise.
Fixes: c2b2ae3925b6 ("mptcp: handle correctly disconnect() failures")
Cc: stable@vger.kernel.org
Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-4-17ecb002e41d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Christoph reported that the MPTCP protocol can find the subflow-level
write queue unexpectedly not empty while crafting a zero-window probe,
hitting a warning:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 188 at net/mptcp/protocol.c:1312 mptcp_sendmsg_frag+0xc06/0xe70
Modules linked in:
CPU: 0 PID: 188 Comm: kworker/0:2 Not tainted 6.6.0-rc2-g1176aa719d7a #47
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
Workqueue: events mptcp_worker
RIP: 0010:mptcp_sendmsg_frag+0xc06/0xe70 net/mptcp/protocol.c:1312
RAX: 47d0530de347ff6a RBX: 47d0530de347ff6b RCX: ffff8881015d3c00
RDX: ffff8881015d3c00 RSI: 47d0530de347ff6b RDI: 47d0530de347ff6b
RBP: 47d0530de347ff6b R08: ffffffff8243c6a8 R09: ffffffff82042d9c
R10: 0000000000000002 R11: ffffffff82056850 R12: ffff88812a13d580
R13: 0000000000000001 R14: ffff88812b375e50 R15: ffff88812bbf3200
FS: 0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000695118 CR3: 0000000115dfc001 CR4: 0000000000170ef0
Call Trace:
<TASK>
__subflow_push_pending+0xa4/0x420 net/mptcp/protocol.c:1545
__mptcp_push_pending+0x128/0x3b0 net/mptcp/protocol.c:1614
mptcp_release_cb+0x218/0x5b0 net/mptcp/protocol.c:3391
release_sock+0xf6/0x100 net/core/sock.c:3521
mptcp_worker+0x6e8/0x8f0 net/mptcp/protocol.c:2746
process_scheduled_works+0x341/0x690 kernel/workqueue.c:2630
worker_thread+0x3a7/0x610 kernel/workqueue.c:2784
kthread+0x143/0x180 kernel/kthread.c:388
ret_from_fork+0x4d/0x60 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:304
</TASK>
The root cause of the issue is that expectations are wrong: e.g. due
to MPTCP-level re-injection we can hit the critical condition.
Explicitly avoid the zero-window probe when the subflow write queue
is not empty and drop the related warnings.
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/444
Fixes: f70cad1085d1 ("mptcp: stop relying on tcp_tx_skb_cache")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-3-17ecb002e41d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The MPTCP protocol can acquire the subflow-level socket lock and
cause the tcp backlog usage. When inserting new skbs into the
backlog, the stack will try to coalesce them.
Currently, we have no check in place to ensure that such coalescing
will respect the MPTCP-level DSS, and that may cause data stream
corruption, as reported by Christoph.
Address the issue by adding the relevant admission check for coalescing
in tcp_add_backlog().
Note the issue is not easy to reproduce, as the MPTCP protocol tries
hard to avoid acquiring the subflow-level socket lock.
Fixes: 648ef4b88673 ("mptcp: Implement MPTCP receive path")
Cc: stable@vger.kernel.org
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/420
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-2-17ecb002e41d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The commit mentioned below was more tolerant with the number of RST seen
during a test because in some uncontrollable situations, multiple RST
can be generated.
But it was not taking into account the case where no RST are expected:
this validation was then no longer reporting issues for the 0 RST case
because it is not possible to have less than 0 RST in the counter. This
patch fixes the issue by adding a specific condition.
Fixes: 6bf41020b72b ("selftests: mptcp: update and extend fastclose test-cases")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-1-17ecb002e41d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The bitmasks for EMAC_PORT_DISABLE and EMAC_PORT_FORWARD r30 commands are
wrong in the driver.
Update the bitmasks of these commands to the correct ones as used by the
ICSSG firmware. These bitmasks are backwards compatible and work with
any ICSSG firmware version.
Fixes: e9b4ece7d74b ("net: ti: icssg-prueth: Add Firmware config and classification APIs.")
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20231018150715.3085380-1-danishanwar@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Align devlink info versions with ice driver so change 'fw.mgmt'
version to be 2-digit version [major.minor], add 'fw.mgmt.build'
that reports mgmt firmware build number and use '"fw.psid.api'
for NVM format version instead of incorrect '"fw.psid'.
Additionally add missing i40e devlink documentation.
Fixes: 5a423552e0d9 ("i40e: Add handler for devlink .info_get")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20231018123558.552453-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Rob Herring says:
====================
dt-bindings: net: Child node schema cleanups
This is a series of clean-ups related to ensuring that child node
schemas are constrained to not allow undefined properties. Typically,
that means just adding additionalProperties or unevaluatedProperties as
appropriate. The DSA/switch schemas turned out to be a bit more
involved, so there's some more fixes and a bit of restructuring in them.
====================
Link: https://lore.kernel.org/r/20231016-dt-net-cleanups-v1-0-a525a090b444@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Constraints on 'ethernet-ports' node properties are already defined by the
reference to ethernet-switch.yaml, so they can be dropped from the DSA
schema.
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20231016-dt-net-cleanups-v1-8-a525a090b444@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The mscc,vsc7514-switch schema doesn't add any custom port properties,
so it can just reference ethernet-switch.yaml#/$defs/base and
dsa.yaml#/$defs/ethernet-ports instead of the base file and can skip
defining port nodes.
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20231016-dt-net-cleanups-v1-7-a525a090b444@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The indentation for the example is completely messed up for
'ethernet-ports'. Fix it.
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20231016-dt-net-cleanups-v1-6-a525a090b444@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The name "base" is misleading as the definition is for a complete schema
definition without additional properties allowed, not a "base class".
Align the same to be the same as dsa.yaml. This schema file without any
json pointer path is the base schema which can be extended.
There are not yet any references to $defs/base to update.
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20231016-dt-net-cleanups-v1-5-a525a090b444@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The '$defs/ethernet-ports' schema is referenced by schemas defining a
child node 'ethernet-ports', but this schema misses the
'ethernet-ports' node. It would work if referring schemas made a
reference like this:
properties:
ethernet-ports:
$ref: ethernet-switch.yaml#/$defs/ethernet-ports
However, that would be different from how dsa.yaml works. For
consistency, align the schema definition with dsa.yaml and add the
missing level.
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20231016-dt-net-cleanups-v1-4-a525a090b444@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
'ethernet-port' node unit-addresses should be in hexadecimal. Some
instances have it correct, but fix the ones that don't.
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20231016-dt-net-cleanups-v1-3-a525a090b444@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
What's connected on the MDIO bus is outside the scope of the binding for
ethernet controller's MDIO bus unless it's a fixed internal device, so
drop the node name and reference to ethernet-phy.yaml.
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20231016-dt-net-cleanups-v1-2-a525a090b444@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
node schemas
Just as unevaluatedProperties or additionalProperties are required at
the top level of schemas, they should (and will) also be required for
child node schemas. That ensures only documented properties are
present for any node.
Add unevaluatedProperties or additionalProperties as appropriate.
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20231016-dt-net-cleanups-v1-1-a525a090b444@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"Fix a bug in chunk size decision that could lead to suboptimal
placement and filling patterns"
* tag 'for-6.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix stripe length calculation for non-zoned data chunk allocation
|
|
The static files placement by `rustdoc` changed in Rust 1.67.0 [1],
but the custom code we have to replace the logo in the generated
HTML files did not get updated.
Thus update it to have the Linux logo again in the output.
Hopefully `rustdoc` will eventually support a custom logo from
a local file [2], so that we do not need to maintain this hack
on our side.
Link: https://github.com/rust-lang/rust/pull/101702 [1]
Link: https://github.com/rust-lang/rfcs/pull/3226 [2]
Fixes: 3ed03f4da06e ("rust: upgrade to Rust 1.68.2")
Cc: stable@vger.kernel.org
Tested-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Andreas Hindborg <a.hindborg@samsung.com>
Link: https://lore.kernel.org/r/20231018155527.1015059-1-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|
|
The Rust code documentation output path moved from `rust/doc` to
`Documentation/output/rust/rustdoc`. The `make cleandocs` target
takes care of cleaning it now since it is integrated with the rest
of the documentation.
Thus remove the old reference.
Fixes: 48fadf440075 ("docs: Move rustdoc output, cross-reference it")
Reviewed-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Andreas Hindborg <a.hindborg@samsung.com>
Link: https://lore.kernel.org/r/20231018160145.1017340-2-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|
|
The Rust code documentation output path moved from `rust/doc` to
`Documentation/output/rust/rustdoc`, thus update the old reference.
Fixes: 48fadf440075 ("docs: Move rustdoc output, cross-reference it")
Reviewed-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Andreas Hindborg <a.hindborg@samsung.com>
Link: https://lore.kernel.org/r/20231018160145.1017340-1-ojeda@kernel.org
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|
|
Since the fixed commits both zdev->iommu_bitmap and zdev->lazy_bitmap
are allocated as vzalloc(zdev->iommu_pages / 8). The problem is that
zdev->iommu_bitmap is a pointer to unsigned long but the above only
yields an allocation that is a multiple of sizeof(unsigned long) which
is 8 on s390x if the number of IOMMU pages is a multiple of 64.
This in turn is the case only if the effective IOMMU aperture is
a multiple of 64 * 4K = 256K. This is usually the case and so didn't
cause visible issues since both the virt_to_phys(high_memory) reduced
limit and hardware limits use nice numbers.
Under KVM, and in particular with QEMU limiting the IOMMU aperture to
the vfio DMA limit (default 65535), it is possible for the reported
aperture not to be a multiple of 256K however. In this case we end up
with an iommu_bitmap whose allocation is not a multiple of
8 causing bitmap operations to access it out of bounds.
Sadly we can't just fix this in the obvious way and use bitmap_zalloc()
because for large RAM systems (tested on 8 TiB) the zdev->iommu_bitmap
grows too large for kmalloc(). So add our own bitmap_vzalloc() wrapper.
This might be a candidate for common code, but this area of code will
be replaced by the upcoming conversion to use the common code DMA API on
s390 so just add a local routine.
Fixes: 224593215525 ("s390/pci: use virtual memory for iommu bitmap")
Fixes: 13954fd6913a ("s390/pci_dma: improve lazy flush for unmap")
Cc: stable@vger.kernel.org
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
When dumping a struct_ops, 2 dictionaries are emitted.
When using `name`, they were already wrapped in an array, but not when
using `id`. Causing `jq` to fail at parsing the payload as it reached
the comma following the first dict.
This change wraps those dictionaries in an array so valid json is emitted.
Before, jq fails to parse the output:
```
$ sudo bpftool struct_ops dump id 1523612 | jq . > /dev/null
parse error: Expected value before ',' at line 19, column 2
```
After, no error parsing the output:
```
sudo ./bpftool struct_ops dump id 1523612 | jq . > /dev/null
```
Signed-off-by: Manu Bretelle <chantr4@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20231018230133.1593152-3-chantr4@gmail.com
|
|
When printing a pointer value, "%p" will either print the hexadecimal
value of the pointer (e.g `0x1234`), or `(nil)` when NULL.
Both of those are invalid json "integer" values and need to be wrapped
in quotes.
Before:
```
$ sudo bpftool struct_ops dump name ned_dummy_cca | grep next
"next": (nil),
$ sudo bpftool struct_ops dump name ned_dummy_cca | \
jq '.[1].bpf_struct_ops_tcp_congestion_ops.data.list.next'
parse error: Invalid numeric literal at line 29, column 34
```
After:
```
$ sudo ./bpftool struct_ops dump name ned_dummy_cca | grep next
"next": "(nil)",
$ sudo ./bpftool struct_ops dump name ned_dummy_cca | \
jq '.[1].bpf_struct_ops_tcp_congestion_ops.data.list.next'
"(nil)"
```
Signed-off-by: Manu Bretelle <chantr4@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20231018230133.1593152-2-chantr4@gmail.com
|
|
Commit a95aef69a740 ("fanotify: support reporting non-decodeable file
handles") merged in v6.5-rc1, added the ability to use an fanotify group
with FAN_REPORT_FID mode to watch filesystems that do not support nfs
export, but do know how to encode non-decodeable file handles, with the
newly introduced AT_HANDLE_FID flag.
At the time that this commit was merged, there were no filesystems
in-tree with those traits.
Commit 16aac5ad1fa9 ("ovl: support encoding non-decodable file handles"),
merged in v6.6-rc1, added this trait to overlayfs, thus allowing fanotify
watching of overlayfs with FAN_REPORT_FID mode.
In retrospect, allowing an fanotify filesystem/mount mark on such
filesystem in FAN_REPORT_FID mode will result in getting events with
file handles, without the ability to resolve the filesystem objects from
those file handles (i.e. no open_by_handle_at() support).
For v6.6, the safer option would be to allow this mode for inode marks
only, where the caller has the opportunity to use name_to_handle_at() at
the time of setting the mark. In the future we can revise this decision.
Fixes: a95aef69a740 ("fanotify: support reporting non-decodeable file handles")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20231018100000.2453965-2-amir73il@gmail.com>
|