Age | Commit message (Collapse) | Author |
|
With modern NIC drivers shifting to full page allocations per
received frame, we face the following issue:
TCP has one per-netns sysctl used to tweak how to translate
a memory use into an expected payload (RWIN), in RX path.
tcp_win_from_space() implementation is limited to few cases.
For hosts dealing with various MSS, we either under estimate
or over estimate the RWIN we send to the remote peers.
For instance with the default sysctl_tcp_adv_win_scale value,
we expect to store 50% of payload per allocated chunk of memory.
For the typical use of MTU=1500 traffic, and order-0 pages allocations
by NIC drivers, we are sending too big RWIN, leading to potential
tcp collapse operations, which are extremely expensive and source
of latency spikes.
This patch makes sysctl_tcp_adv_win_scale obsolete, and instead
uses a per socket scaling factor, so that we can precisely
adjust the RWIN based on effective skb->len/skb->truesize ratio.
This patch alone can double TCP receive performance when receivers
are too slow to drain their receive queue, or by allowing
a bigger RWIN when MSS is close to PAGE_SIZE.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Link: https://lore.kernel.org/r/20230717152917.751987-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2023-07-17
The 1st patch is by Ziyang Xuan and fixes a possible memory leak in
the receiver handling in the CAN RAW protocol.
YueHaibing contributes a use after free in bcm_proc_show() of the
Broad Cast Manager (BCM) CAN protocol.
The next 2 patches are by me and fix a possible null pointer
dereference in the RX path of the gs_usb driver with activated
hardware timestamps and the candlelight firmware.
The last patch is by Fedor Ross, Marek Vasut and me and targets the
mcp251xfd driver. The polling timeout of __mcp251xfd_chip_set_mode()
is increased to fix bus joining on busy CAN buses and very low bit
rate.
* tag 'linux-can-fixes-for-6.5-20230717' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: mcp251xfd: __mcp251xfd_chip_set_mode(): increase poll timeout
can: gs_usb: fix time stamp counter initialization
can: gs_usb: gs_can_open(): improve error handling
can: bcm: Fix UAF in bcm_proc_show()
can: raw: fix receiver memory leak
====================
Link: https://lore.kernel.org/r/20230717180938.230816-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The code using btf_vmlinux in bpf_tcp_ca is removed by the
commit 9f0265e921de ("bpf: Require only one of cong_avoid() and cong_control() from a TCP CC")
so drop this useless btf_vmlinux declaration.
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Link: https://lore.kernel.org/r/4d38da4eadaba476bd92ffcd7a5a03a5e28745c0.1689582557.git.geliang.tang@suse.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Make rtnl_fill_vf() cancel the vfinfo attribute on error instead of the
inner rtnl_fill_vfinfo(), as it is the function that starts it.
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://lore.kernel.org/r/20230716072440.2372567-1-gal@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
All drivers are now updated for the March 2020 changes, and no longer
make use of the mac_pcs_get_state() or mac_an_restart() operations,
which are now NULL across all DSA drivers. All DSA drivers don't look
at speed, duplex, pause or advertisement in their phylink_mac_config()
method either.
Remove support for these operations from DSA, and stop marking DSA as
a legacy driver by default.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The destination port value in the IPCR control buffer on older
targets is 0xFFFF. Handle the same by updating the dst_port to
QRTR_PORT_CTRL.
Signed-off-by: Vignesh Viswanathan <quic_viswanat@quicinc.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is a use after free scenario while iterating through the nodes
radix tree despite the ns being a single threaded process. This can
happen when the radix tree APIs are not synchronized with the
rcu_read_lock() APIs.
Convert the radix tree for nodes to xarray to take advantage of the
built in rcu lock usage provided by xarray.
Signed-off-by: Chris Lew <quic_clew@quicinc.com>
Signed-off-by: Vignesh Viswanathan <quic_viswanat@quicinc.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is a use after free scenario while iterating through the servers
radix tree despite the ns being a single threaded process. This can
happen when the radix tree APIs are not synchronized with the
rcu_read_lock() APIs.
Convert the radix tree for servers to xarray to take advantage of the
built in rcu lock usage provided by xarray.
Signed-off-by: Chris Lew <quic_clew@quicinc.com>
Signed-off-by: Vignesh Viswanathan <quic_viswanat@quicinc.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
BUG: KASAN: slab-use-after-free in bcm_proc_show+0x969/0xa80
Read of size 8 at addr ffff888155846230 by task cat/7862
CPU: 1 PID: 7862 Comm: cat Not tainted 6.5.0-rc1-00153-gc8746099c197 #230
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0xd5/0x150
print_report+0xc1/0x5e0
kasan_report+0xba/0xf0
bcm_proc_show+0x969/0xa80
seq_read_iter+0x4f6/0x1260
seq_read+0x165/0x210
proc_reg_read+0x227/0x300
vfs_read+0x1d5/0x8d0
ksys_read+0x11e/0x240
do_syscall_64+0x35/0xb0
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Allocated by task 7846:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30
__kasan_kmalloc+0x9e/0xa0
bcm_sendmsg+0x264b/0x44e0
sock_sendmsg+0xda/0x180
____sys_sendmsg+0x735/0x920
___sys_sendmsg+0x11d/0x1b0
__sys_sendmsg+0xfa/0x1d0
do_syscall_64+0x35/0xb0
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Freed by task 7846:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30
kasan_save_free_info+0x27/0x40
____kasan_slab_free+0x161/0x1c0
slab_free_freelist_hook+0x119/0x220
__kmem_cache_free+0xb4/0x2e0
rcu_core+0x809/0x1bd0
bcm_op is freed before procfs entry be removed in bcm_release(),
this lead to bcm_proc_show() may read the freed bcm_op.
Fixes: ffd980f976e7 ("[CAN]: Add broadcast manager (bcm) protocol")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/all/20230715092543.15548-1-yuehaibing@huawei.com
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Got kmemleak errors with the following ltp can_filter testcase:
for ((i=1; i<=100; i++))
do
./can_filter &
sleep 0.1
done
==============================================================
[<00000000db4a4943>] can_rx_register+0x147/0x360 [can]
[<00000000a289549d>] raw_setsockopt+0x5ef/0x853 [can_raw]
[<000000006d3d9ebd>] __sys_setsockopt+0x173/0x2c0
[<00000000407dbfec>] __x64_sys_setsockopt+0x61/0x70
[<00000000fd468496>] do_syscall_64+0x33/0x40
[<00000000b7e47d51>] entry_SYSCALL_64_after_hwframe+0x61/0xc6
It's a bug in the concurrent scenario of unregister_netdevice_many()
and raw_release() as following:
cpu0 cpu1
unregister_netdevice_many(can_dev)
unlist_netdevice(can_dev) // dev_get_by_index() return NULL after this
net_set_todo(can_dev)
raw_release(can_socket)
dev = dev_get_by_index(, ro->ifindex); // dev == NULL
if (dev) { // receivers in dev_rcv_lists not free because dev is NULL
raw_disable_allfilters(, dev, );
dev_put(dev);
}
...
ro->bound = 0;
...
call_netdevice_notifiers(NETDEV_UNREGISTER, )
raw_notify(, NETDEV_UNREGISTER, )
if (ro->bound) // invalid because ro->bound has been set 0
raw_disable_allfilters(, dev, ); // receivers in dev_rcv_lists will never be freed
Add a net_device pointer member in struct raw_sock to record bound
can_dev, and use rtnl_lock to serialize raw_socket members between
raw_bind(), raw_release(), raw_setsockopt() and raw_notify(). Use
ro->dev to decide whether to free receivers in dev_rcv_lists.
Fixes: 8d0caedb7596 ("can: bcm/raw/isotp: use per module netdevice notifier")
Reviewed-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Link: https://lore.kernel.org/all/20230711011737.1969582-1-william.xuanziyang@huawei.com
Cc: stable@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
If TCA_FLOWER_CLASSID is specified in the netlink message, the code will
call tcf_bind_filter. However, if any error occurs after that, the code
should undo this by calling tcf_unbind_filter.
Fixes: 77b9900ef53a ("tc: introduce Flower classifier")
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If cls_bpf_offload errors out, we must also undo tcf_bind_filter that
was done before the error.
Fix that by calling tcf_unbind_filter in errout_parms.
Fixes: eadb41489fd2 ("net: cls_bpf: add support for marking filters as hardware-only")
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the case of an update, when TCA_U32_LINK is set, u32_set_parms will
decrement the refcount of the ht_down (struct tc_u_hnode) pointer
present in the older u32 filter which we are replacing. However, if
u32_replace_hw_knode errors out, the update command fails and that
ht_down pointer continues decremented. To fix that, when
u32_replace_hw_knode fails, check if ht_down's refcount was decremented
and undo the decrement.
Fixes: d34e3e181395 ("net: cls_u32: Add support for skip-sw flag to tc u32 classifier.")
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When u32_replace_hw_knode fails, we need to undo the tcf_bind_filter
operation done at u32_set_parms.
Fixes: d34e3e181395 ("net: cls_u32: Add support for skip-sw flag to tc u32 classifier.")
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
mall_set_parms
In case an error occurred after mall_set_parms executed successfully, we
must undo the tcf_bind_filter call it issues.
Fix that by calling tcf_unbind_filter in err_replace_hw_filter label.
Fixes: ec2507d2a306 ("net/sched: cls_matchall: Fix error path")
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull ceph fix from Ilya Dryomov:
"A fix to prevent a potential buffer overrun in the messenger, marked
for stable"
* tag 'ceph-for-6.5-rc2' of https://github.com/ceph/ceph-client:
libceph: harden msgr2.1 frame segment length checks
|
|
Commit 1fd54773c267 ("udp: allow header check for dodgy GSO_UDP_L4
packets.") checks DODGY bit for UDP, but for packets that can be fed
directly to the device after gso_segs reset, it actually falls through
to fragmentation:
https://lore.kernel.org/all/CAJPywTKDdjtwkLVUW6LRA2FU912qcDmQOQGt2WaDo28KzYDg+A@mail.gmail.com/
This change restores the expected behavior of GSO_UDP_L4 packets.
Fixes: 1fd54773c267 ("udp: allow header check for dodgy GSO_UDP_L4 packets.")
Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The checks in question were introduced by:
commit 6b4db2e528f6 ("devlink: Fix use-after-free after a failed reload").
That fixed an issue of reload with mlxsw driver.
Back then, that was a valid fix, because there was a limitation
in place that prevented drivers from registering/unregistering params
when devlink instance was registered.
It was possible to do the fix differently by changing drivers to
register/unregister params in appropriate places making sure the ops
operate only on memory which is allocated and initialized. But that,
as a dependency, would require to remove the limitation mentioned above.
Eventually, this limitation was lifted by:
commit 1d18bb1a4ddd ("devlink: allow registering parameters after the instance")
Also, the alternative fix (which also fixed another issue) was done by:
commit 74cbc3c03c82 ("mlxsw: spectrum_acl_tcam: Move devlink param to TCAM code").
Therefore, the checks are no longer relevant. Each driver should make
sure to have the params registered only when the memory the ops
are working with is allocated and initialized.
So remove the checks.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When we create an L2 loop on a bridge in netns, we will see packets storm
even if STP is enabled.
# unshare -n
# ip link add br0 type bridge
# ip link add veth0 type veth peer name veth1
# ip link set veth0 master br0 up
# ip link set veth1 master br0 up
# ip link set br0 type bridge stp_state 1
# ip link set br0 up
# sleep 30
# ip -s link show br0
2: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether b6:61:98:1c:1c:b5 brd ff:ff:ff:ff:ff:ff
RX: bytes packets errors dropped missed mcast
956553768 12861249 0 0 0 12861249 <-. Keep
TX: bytes packets errors dropped carrier collsns | increasing
1027834 11951 0 0 0 0 <-' rapidly
This is because llc_rcv() drops all packets in non-root netns and BPDU
is dropped.
Let's add extack warning when enabling STP in netns.
# unshare -n
# ip link add br0 type bridge
# ip link set br0 type bridge stp_state 1
Warning: bridge: STP does not work in non-root netns.
Note this commit will be reverted later when we namespacify the whole LLC
infra.
Fixes: e730c15519d0 ("[NET]: Make packet reception network namespace safe")
Suggested-by: Harry Coin <hcoin@quietfountain.com>
Link: https://lore.kernel.org/netdev/0f531295-e289-022d-5add-5ceffa0df9bc@quietfountain.com/
Suggested-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
icmpv6_flow_init(), ip6_datagram_flow_key_init() and ip6_mc_hdr() don't
need to modify their sk argument. Make that explicit using const.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:
====================
pull-request: bpf-next 2023-07-13
We've added 67 non-merge commits during the last 15 day(s) which contain
a total of 106 files changed, 4444 insertions(+), 619 deletions(-).
The main changes are:
1) Fix bpftool build in presence of stale vmlinux.h,
from Alexander Lobakin.
2) Introduce bpf_me_mcache_free_rcu() and fix OOM under stress,
from Alexei Starovoitov.
3) Teach verifier actual bounds of bpf_get_smp_processor_id()
and fix perf+libbpf issue related to custom section handling,
from Andrii Nakryiko.
4) Introduce bpf map element count, from Anton Protopopov.
5) Check skb ownership against full socket, from Kui-Feng Lee.
6) Support for up to 12 arguments in BPF trampoline, from Menglong Dong.
7) Export rcu_request_urgent_qs_task, from Paul E. McKenney.
8) Fix BTF walking of unions, from Yafang Shao.
9) Extend link_info for kprobe_multi and perf_event links,
from Yafang Shao.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (67 commits)
selftests/bpf: Add selftest for PTR_UNTRUSTED
bpf: Fix an error in verifying a field in a union
selftests/bpf: Add selftests for nested_trust
bpf: Fix an error around PTR_UNTRUSTED
selftests/bpf: add testcase for TRACING with 6+ arguments
bpf, x86: allow function arguments up to 12 for TRACING
bpf, x86: save/restore regs with BPF_DW size
bpftool: Use "fallthrough;" keyword instead of comments
bpf: Add object leak check.
bpf: Convert bpf_cpumask to bpf_mem_cache_free_rcu.
bpf: Introduce bpf_mem_free_rcu() similar to kfree_rcu().
selftests/bpf: Improve test coverage of bpf_mem_alloc.
rcu: Export rcu_request_urgent_qs_task()
bpf: Allow reuse from waiting_for_gp_ttrace list.
bpf: Add a hint to allocated objects.
bpf: Change bpf_mem_cache draining process.
bpf: Further refactor alloc_bulk().
bpf: Factor out inc/dec of active flag into helpers.
bpf: Refactor alloc_bulk().
bpf: Let free_all() return the number of freed elements.
...
====================
Link: https://lore.kernel.org/r/20230714020910.80794-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts or adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add fentry_many_args.c and fexit_many_args.c to test the fentry/fexit
with 7/11 arguments. As this feature is not supported by arm64 yet, we
disable these testcases for arm64 in DENYLIST.aarch64. We can combine
them with fentry_test.c/fexit_test.c when arm64 is supported too.
Correspondingly, add bpf_testmod_fentry_test7() and
bpf_testmod_fentry_test11() to bpf_testmod.c
Meanwhile, add bpf_modify_return_test2() to test_run.c to test the
MODIFY_RETURN with 7 arguments.
Add bpf_testmod_test_struct_arg_7/bpf_testmod_test_struct_arg_7 in
bpf_testmod.c to test the struct in the arguments.
And the testcases passed on x86_64:
./test_progs -t fexit
Summary: 5/14 PASSED, 0 SKIPPED, 0 FAILED
./test_progs -t fentry
Summary: 3/2 PASSED, 0 SKIPPED, 0 FAILED
./test_progs -t modify_return
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
./test_progs -t tracing_struct
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230713040738.1789742-4-imagedong@tencent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
ceph_frame_desc::fd_lens is an int array. decode_preamble() thus
effectively casts u32 -> int but the checks for segment lengths are
written as if on unsigned values. While reading in HELLO or one of the
AUTH frames (before authentication is completed), arithmetic in
head_onwire_len() can get duped by negative ctrl_len and produce
head_len which is less than CEPH_PREAMBLE_LEN but still positive.
This would lead to a buffer overrun in prepare_read_control() as the
preamble gets copied to the newly allocated buffer of size head_len.
Cc: stable@vger.kernel.org
Fixes: cd1a677cad99 ("libceph, ceph: implement msgr2.1 protocol (crc and secure modes)")
Reported-by: Thelford Williams <thelford@google.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
|
|
Lion says:
-------
In the QFQ scheduler a similar issue to CVE-2023-31436
persists.
Consider the following code in net/sched/sch_qfq.c:
static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
struct sk_buff **to_free)
{
unsigned int len = qdisc_pkt_len(skb), gso_segs;
// ...
if (unlikely(cl->agg->lmax < len)) {
pr_debug("qfq: increasing maxpkt from %u to %u for class %u",
cl->agg->lmax, len, cl->common.classid);
err = qfq_change_agg(sch, cl, cl->agg->class_weight, len);
if (err) {
cl->qstats.drops++;
return qdisc_drop(skb, sch, to_free);
}
// ...
}
Similarly to CVE-2023-31436, "lmax" is increased without any bounds
checks according to the packet length "len". Usually this would not
impose a problem because packet sizes are naturally limited.
This is however not the actual packet length, rather the
"qdisc_pkt_len(skb)" which might apply size transformations according to
"struct qdisc_size_table" as created by "qdisc_get_stab()" in
net/sched/sch_api.c if the TCA_STAB option was set when modifying the qdisc.
A user may choose virtually any size using such a table.
As a result the same issue as in CVE-2023-31436 can occur, allowing heap
out-of-bounds read / writes in the kmalloc-8192 cache.
-------
We can create the issue with the following commands:
tc qdisc add dev $DEV root handle 1: stab mtu 2048 tsize 512 mpu 0 \
overhead 999999999 linklayer ethernet qfq
tc class add dev $DEV parent 1: classid 1:1 htb rate 6mbit burst 15k
tc filter add dev $DEV parent 1: matchall classid 1:1
ping -I $DEV 1.1.1.2
This is caused by incorrectly assuming that qdisc_pkt_len() returns a
length within the QFQ_MIN_LMAX < len < QFQ_MAX_LMAX.
Fixes: 462dbc9101ac ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
Reported-by: Lion <nnamrec@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
25369891fcef deletes a check for the case where no 'lmax' is
specified which 3037933448f6 previously fixed as 'lmax'
could be set to the device's MTU without any bound checking
for QFQ_LMAX_MIN and QFQ_LMAX_MAX. Therefore, reintroduce the check.
Fixes: 25369891fcef ("net/sched: sch_qfq: refactor parsing of netlink parameters")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:
====================
pull-request: bpf 2023-07-12
We've added 5 non-merge commits during the last 7 day(s) which contain
a total of 7 files changed, 93 insertions(+), 28 deletions(-).
The main changes are:
1) Fix max stack depth check for async callbacks, from Kumar.
2) Fix inconsistent JIT image generation, from Björn.
3) Use trusted arguments in XDP hints kfuncs, from Larysa.
4) Fix memory leak in cpu_map_update_elem, from Pu.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
xdp: use trusted arguments in XDP hints kfuncs
bpf: cpumap: Fix memory leak in cpu_map_update_elem
riscv, bpf: Fix inconsistent JIT image generation
selftests/bpf: Add selftest for check_stack_max_depth bug
bpf: Fix max stack depth check for async callbacks
====================
Link: https://lore.kernel.org/r/20230712223045.40182-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix ethernet header length field after stripping the mesh header
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/CT5GNZSK28AI.2K6M69OXM9RW5@syracuse/
Fixes: 986e43b19ae9 ("wifi: mac80211: fix receiving A-MSDU frames on mesh interfaces")
Reported-and-tested-by: Nicolas Escande <nico.escande@gmail.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20230711115052.68430-1-nbd@nbd.name
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
RPL code has a pattern where skb_dst_drop() is called before
ip6_route_input().
However, ip6_route_input() calls skb_dst_drop() internally,
so we need not call skb_dst_drop() before ip6_route_input().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230710213511.5364-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Kubernetes[1] is going to stick with /proc/net/tcp for a while.
This commit reduces the scheduling latency introduced by
established_get_first(), similar to commit acffb584cda7 ("net: diag:
add a scheduling point in inet_diag_dump_icsk()").
In our environment, the scheduling latency affects the performance of
latency-sensitive services like Redis.
Changes in V2 :
- call cond_resched() before checking if a bucket is empty as
suggested by Eric Dumazet
- removed the delay of synchronize_net() from the commit message
[1] https://github.com/google/cadvisor/blob/v0.47.2/container/libcontainer/handler.go#L130
Signed-off-by: Jian Wen <wenjian1@xiaomi.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230711032405.3253025-1-wenjian1@xiaomi.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The kernel does not currently validate that both the minimum and maximum
ports of a port range are specified. This can lead user space to think
that a filter matching on a port range was successfully added, when in
fact it was not. For example, with a patched (buggy) iproute2 that only
sends the minimum port, the following commands do not return an error:
# tc filter add dev swp1 ingress pref 1 proto ip flower ip_proto udp src_port 100-200 action pass
# tc filter add dev swp1 ingress pref 1 proto ip flower ip_proto udp dst_port 100-200 action pass
# tc filter show dev swp1 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
eth_type ipv4
ip_proto udp
not_in_hw
action order 1: gact action pass
random type none pass val 0
index 1 ref 1 bind 1
filter protocol ip pref 1 flower chain 0 handle 0x2
eth_type ipv4
ip_proto udp
not_in_hw
action order 1: gact action pass
random type none pass val 0
index 2 ref 1 bind 1
Fix by returning an error unless both ports are specified:
# tc filter add dev swp1 ingress pref 1 proto ip flower ip_proto udp src_port 100-200 action pass
Error: Both min and max source ports must be specified.
We have an error talking to the kernel
# tc filter add dev swp1 ingress pref 1 proto ip flower ip_proto udp dst_port 100-200 action pass
Error: Both min and max destination ports must be specified.
We have an error talking to the kernel
Fixes: 5c72299fba9d ("net: sched: cls_flower: Classify packets using port ranges")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, verifier does not reject XDP programs that pass NULL pointer to
hints functions. At the same time, this case is not handled in any driver
implementation (including veth). For example, changing
bpf_xdp_metadata_rx_timestamp(ctx, ×tamp);
to
bpf_xdp_metadata_rx_timestamp(ctx, NULL);
in xdp_metadata test successfully crashes the system.
Add KF_TRUSTED_ARGS flag to hints kfunc definitions, so driver code
does not have to worry about getting invalid pointers.
Fixes: 3d76a4d3d4e5 ("bpf: XDP metadata RX kfuncs")
Reported-by: Stanislav Fomichev <sdf@google.com>
Closes: https://lore.kernel.org/bpf/ZKWo0BbpLfkZHbyE@google.com/
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230711105930.29170-1-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
We have for some time the __assign_bit() API to replace open coded
if (foo)
__set_bit(n, bar);
else
__clear_bit(n, bar);
Use this API in the code. No functional change intended.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Message-ID: <20230710100830.89936-2-andriy.shevchenko@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We have for some time the assign_bit() API to replace open coded
if (foo)
set_bit(n, bar);
else
clear_bit(n, bar);
Use this API in the code. No functional change intended.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Message-ID: <20230710100830.89936-1-andriy.shevchenko@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When ip_vti device is set to the qdisc of the sfb type, the cb field
of the sent skb may be modified during enqueuing. Then,
slab-use-after-free may occur when ip_vti device sends IPv6 packets.
As commit f855691975bb ("xfrm6: Fix the nexthdr offset in
_decode_session6.") showed, xfrm_decode_session was originally intended
only for the receive path. IP6CB(skb)->nhoff is not set during
transmission. Therefore, set the cb field in the skb to 0 before
sending packets.
Fixes: f855691975bb ("xfrm6: Fix the nexthdr offset in _decode_session6.")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
When ipv6_vti device is set to the qdisc of the sfb type, the cb field
of the sent skb may be modified during enqueuing. Then,
slab-use-after-free may occur when ipv6_vti device sends IPv6 packets.
The stack information is as follows:
BUG: KASAN: slab-use-after-free in decode_session6+0x103f/0x1890
Read of size 1 at addr ffff88802e08edc2 by task swapper/0/0
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0-next-20230707-00001-g84e2cad7f979 #410
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl+0xd9/0x150
print_address_description.constprop.0+0x2c/0x3c0
kasan_report+0x11d/0x130
decode_session6+0x103f/0x1890
__xfrm_decode_session+0x54/0xb0
vti6_tnl_xmit+0x3e6/0x1ee0
dev_hard_start_xmit+0x187/0x700
sch_direct_xmit+0x1a3/0xc30
__qdisc_run+0x510/0x17a0
__dev_queue_xmit+0x2215/0x3b10
neigh_connected_output+0x3c2/0x550
ip6_finish_output2+0x55a/0x1550
ip6_finish_output+0x6b9/0x1270
ip6_output+0x1f1/0x540
ndisc_send_skb+0xa63/0x1890
ndisc_send_rs+0x132/0x6f0
addrconf_rs_timer+0x3f1/0x870
call_timer_fn+0x1a0/0x580
expire_timers+0x29b/0x4b0
run_timer_softirq+0x326/0x910
__do_softirq+0x1d4/0x905
irq_exit_rcu+0xb7/0x120
sysvec_apic_timer_interrupt+0x97/0xc0
</IRQ>
Allocated by task 9176:
kasan_save_stack+0x22/0x40
kasan_set_track+0x25/0x30
__kasan_slab_alloc+0x7f/0x90
kmem_cache_alloc_node+0x1cd/0x410
kmalloc_reserve+0x165/0x270
__alloc_skb+0x129/0x330
netlink_sendmsg+0x9b1/0xe30
sock_sendmsg+0xde/0x190
____sys_sendmsg+0x739/0x920
___sys_sendmsg+0x110/0x1b0
__sys_sendmsg+0xf7/0x1c0
do_syscall_64+0x39/0xb0
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Freed by task 9176:
kasan_save_stack+0x22/0x40
kasan_set_track+0x25/0x30
kasan_save_free_info+0x2b/0x40
____kasan_slab_free+0x160/0x1c0
slab_free_freelist_hook+0x11b/0x220
kmem_cache_free+0xf0/0x490
skb_free_head+0x17f/0x1b0
skb_release_data+0x59c/0x850
consume_skb+0xd2/0x170
netlink_unicast+0x54f/0x7f0
netlink_sendmsg+0x926/0xe30
sock_sendmsg+0xde/0x190
____sys_sendmsg+0x739/0x920
___sys_sendmsg+0x110/0x1b0
__sys_sendmsg+0xf7/0x1c0
do_syscall_64+0x39/0xb0
entry_SYSCALL_64_after_hwframe+0x63/0xcd
The buggy address belongs to the object at ffff88802e08ed00
which belongs to the cache skbuff_small_head of size 640
The buggy address is located 194 bytes inside of
freed 640-byte region [ffff88802e08ed00, ffff88802e08ef80)
As commit f855691975bb ("xfrm6: Fix the nexthdr offset in
_decode_session6.") showed, xfrm_decode_session was originally intended
only for the receive path. IP6CB(skb)->nhoff is not set during
transmission. Therefore, set the cb field in the skb to 0 before
sending packets.
Fixes: f855691975bb ("xfrm6: Fix the nexthdr offset in _decode_session6.")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
When the xfrm device is set to the qdisc of the sfb type, the cb field
of the sent skb may be modified during enqueuing. Then,
slab-use-after-free may occur when the xfrm device sends IPv6 packets.
The stack information is as follows:
BUG: KASAN: slab-use-after-free in decode_session6+0x103f/0x1890
Read of size 1 at addr ffff8881111458ef by task swapper/3/0
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 6.4.0-next-20230707 #409
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl+0xd9/0x150
print_address_description.constprop.0+0x2c/0x3c0
kasan_report+0x11d/0x130
decode_session6+0x103f/0x1890
__xfrm_decode_session+0x54/0xb0
xfrmi_xmit+0x173/0x1ca0
dev_hard_start_xmit+0x187/0x700
sch_direct_xmit+0x1a3/0xc30
__qdisc_run+0x510/0x17a0
__dev_queue_xmit+0x2215/0x3b10
neigh_connected_output+0x3c2/0x550
ip6_finish_output2+0x55a/0x1550
ip6_finish_output+0x6b9/0x1270
ip6_output+0x1f1/0x540
ndisc_send_skb+0xa63/0x1890
ndisc_send_rs+0x132/0x6f0
addrconf_rs_timer+0x3f1/0x870
call_timer_fn+0x1a0/0x580
expire_timers+0x29b/0x4b0
run_timer_softirq+0x326/0x910
__do_softirq+0x1d4/0x905
irq_exit_rcu+0xb7/0x120
sysvec_apic_timer_interrupt+0x97/0xc0
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20
RIP: 0010:intel_idle_hlt+0x23/0x30
Code: 1f 84 00 00 00 00 00 f3 0f 1e fa 41 54 41 89 d4 0f 1f 44 00 00 66 90 0f 1f 44 00 00 0f 00 2d c4 9f ab 00 0f 1f 44 00 00 fb f4 <fa> 44 89 e0 41 5c c3 66 0f 1f 44 00 00 f3 0f 1e fa 41 54 41 89 d4
RSP: 0018:ffffc90000197d78 EFLAGS: 00000246
RAX: 00000000000a83c3 RBX: ffffe8ffffd09c50 RCX: ffffffff8a22d8e5
RDX: 0000000000000001 RSI: ffffffff8d3f8080 RDI: ffffe8ffffd09c50
RBP: ffffffff8d3f8080 R08: 0000000000000001 R09: ffffed1026ba6d9d
R10: ffff888135d36ceb R11: 0000000000000001 R12: 0000000000000001
R13: ffffffff8d3f8100 R14: 0000000000000001 R15: 0000000000000000
cpuidle_enter_state+0xd3/0x6f0
cpuidle_enter+0x4e/0xa0
do_idle+0x2fe/0x3c0
cpu_startup_entry+0x18/0x20
start_secondary+0x200/0x290
secondary_startup_64_no_verify+0x167/0x16b
</TASK>
Allocated by task 939:
kasan_save_stack+0x22/0x40
kasan_set_track+0x25/0x30
__kasan_slab_alloc+0x7f/0x90
kmem_cache_alloc_node+0x1cd/0x410
kmalloc_reserve+0x165/0x270
__alloc_skb+0x129/0x330
inet6_ifa_notify+0x118/0x230
__ipv6_ifa_notify+0x177/0xbe0
addrconf_dad_completed+0x133/0xe00
addrconf_dad_work+0x764/0x1390
process_one_work+0xa32/0x16f0
worker_thread+0x67d/0x10c0
kthread+0x344/0x440
ret_from_fork+0x1f/0x30
The buggy address belongs to the object at ffff888111145800
which belongs to the cache skbuff_small_head of size 640
The buggy address is located 239 bytes inside of
freed 640-byte region [ffff888111145800, ffff888111145a80)
As commit f855691975bb ("xfrm6: Fix the nexthdr offset in
_decode_session6.") showed, xfrm_decode_session was originally intended
only for the receive path. IP6CB(skb)->nhoff is not set during
transmission. Therefore, set the cb field in the skb to 0 before
sending packets.
Fixes: f855691975bb ("xfrm6: Fix the nexthdr offset in _decode_session6.")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
After the elimination of inner modes, a couple of warnings that
were previously unreachable can now be triggered by malformed
inbound packets.
Fix this by:
1. Moving the setting of skb->protocol into the decap functions.
2. Returning -EINVAL when unexpected protocol is seen.
Reported-by: Maciej Żenczykowski<maze@google.com>
Fixes: 5f24f41e8ea6 ("xfrm: Remove inner/outer modes from input path")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
strlcpy() reads the entire source buffer first.
This read may exceed the destination size limit.
This is both inefficient and can lead to linear read
overflows if a source string is not NUL-terminated [1].
In an effort to remove strlcpy() completely [2], replace
strlcpy() here with strscpy().
Direct replacement is safe here since return value of -errno
is used to check for truncation instead of sizeof(dest).
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
[2] https://github.com/KSPP/linux/issues/89
Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now in addrconf_mod_rs_timer(), reference idev depends on whether
rs_timer is not pending. Then modify rs_timer timeout.
There is a time gap in [1], during which if the pending rs_timer
becomes not pending. It will miss to hold idev, but the rs_timer
is activated. Thus rs_timer callback function addrconf_rs_timer()
will be executed and put idev later without holding idev. A refcount
underflow issue for idev can be caused by this.
if (!timer_pending(&idev->rs_timer))
in6_dev_hold(idev);
<--------------[1]
mod_timer(&idev->rs_timer, jiffies + when);
To fix the issue, hold idev if mod_timer() return 0.
Fixes: b7b1bfce0bb6 ("ipv6: split duplicate address detection and router solicitation timer")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Amit Klein reported that udp6_ehash_secret was initialized but never used.
Fixes: 1bbdceef1e53 ("inet: convert inet_ehash_secret and ipv6_hash_secret to net_get_random_once")
Reported-by: Amit Klein <aksecurity@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: David Ahern <dsahern@kernel.org>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With some IPv6 Ext Hdr (RPL, SRv6, etc.), we can send a packet that
has the link-local address as src and dst IP and will be forwarded to
an external IP in the IPv6 Ext Hdr.
For example, the script below generates a packet whose src IP is the
link-local address and dst is updated to 11::.
# for f in $(find /proc/sys/net/ -name *seg6_enabled*); do echo 1 > $f; done
# python3
>>> from socket import *
>>> from scapy.all import *
>>>
>>> SRC_ADDR = DST_ADDR = "fe80::5054:ff:fe12:3456"
>>>
>>> pkt = IPv6(src=SRC_ADDR, dst=DST_ADDR)
>>> pkt /= IPv6ExtHdrSegmentRouting(type=4, addresses=["11::", "22::"], segleft=1)
>>>
>>> sk = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW)
>>> sk.sendto(bytes(pkt), (DST_ADDR, 0))
For such a packet, we call ip6_route_input() to look up a route for the
next destination in these three functions depending on the header type.
* ipv6_rthdr_rcv()
* ipv6_rpl_srh_rcv()
* ipv6_srh_rcv()
If no route is found, ip6_null_entry is set to skb, and the following
dst_input(skb) calls ip6_pkt_drop().
Finally, in icmp6_dev(), we dereference skb_rt6_info(skb)->rt6i_idev->dev
as the input device is the loopback interface. Then, we have to check if
skb_rt6_info(skb)->rt6i_idev is NULL or not to avoid NULL pointer deref
for ip6_null_entry.
BUG: kernel NULL pointer dereference, address: 0000000000000000
PF: supervisor read access in kernel mode
PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 157 Comm: python3 Not tainted 6.4.0-11996-gb121d614371c #35
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:icmp6_send (net/ipv6/icmp.c:436 net/ipv6/icmp.c:503)
Code: fe ff ff 48 c7 40 30 c0 86 5d 83 e8 c6 44 1c 00 e9 c8 fc ff ff 49 8b 46 58 48 83 e0 fe 0f 84 4a fb ff ff 48 8b 80 d0 00 00 00 <48> 8b 00 44 8b 88 e0 00 00 00 e9 34 fb ff ff 4d 85 ed 0f 85 69 01
RSP: 0018:ffffc90000003c70 EFLAGS: 00000286
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00000000000000e0
RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffff888006d72a18
RBP: ffffc90000003d80 R08: 0000000000000000 R09: 0000000000000001
R10: ffffc90000003d98 R11: 0000000000000040 R12: ffff888006d72a10
R13: 0000000000000000 R14: ffff8880057fb800 R15: ffffffff835d86c0
FS: 00007f9dc72ee740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000057b2000 CR4: 00000000007506f0
PKRU: 55555554
Call Trace:
<IRQ>
ip6_pkt_drop (net/ipv6/route.c:4513)
ipv6_rthdr_rcv (net/ipv6/exthdrs.c:640 net/ipv6/exthdrs.c:686)
ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:437 (discriminator 5))
ip6_input_finish (./include/linux/rcupdate.h:781 net/ipv6/ip6_input.c:483)
__netif_receive_skb_one_core (net/core/dev.c:5455)
process_backlog (./include/linux/rcupdate.h:781 net/core/dev.c:5895)
__napi_poll (net/core/dev.c:6460)
net_rx_action (net/core/dev.c:6529 net/core/dev.c:6660)
__do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:554)
do_softirq (kernel/softirq.c:454 kernel/softirq.c:441)
</IRQ>
<TASK>
__local_bh_enable_ip (kernel/softirq.c:381)
__dev_queue_xmit (net/core/dev.c:4231)
ip6_finish_output2 (./include/net/neighbour.h:544 net/ipv6/ip6_output.c:135)
rawv6_sendmsg (./include/net/dst.h:458 ./include/linux/netfilter.h:303 net/ipv6/raw.c:656 net/ipv6/raw.c:914)
sock_sendmsg (net/socket.c:725 net/socket.c:748)
__sys_sendto (net/socket.c:2134)
__x64_sys_sendto (net/socket.c:2146 net/socket.c:2142 net/socket.c:2142)
do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
RIP: 0033:0x7f9dc751baea
Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
RSP: 002b:00007ffe98712c38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007ffe98712cf8 RCX: 00007f9dc751baea
RDX: 0000000000000060 RSI: 00007f9dc6460b90 RDI: 0000000000000003
RBP: 00007f9dc56e8be0 R08: 00007ffe98712d70 R09: 000000000000001c
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007f9dc6af5d1b
</TASK>
Modules linked in:
CR2: 0000000000000000
---[ end trace 0000000000000000 ]---
RIP: 0010:icmp6_send (net/ipv6/icmp.c:436 net/ipv6/icmp.c:503)
Code: fe ff ff 48 c7 40 30 c0 86 5d 83 e8 c6 44 1c 00 e9 c8 fc ff ff 49 8b 46 58 48 83 e0 fe 0f 84 4a fb ff ff 48 8b 80 d0 00 00 00 <48> 8b 00 44 8b 88 e0 00 00 00 e9 34 fb ff ff 4d 85 ed 0f 85 69 01
RSP: 0018:ffffc90000003c70 EFLAGS: 00000286
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00000000000000e0
RDX: 0000000000000021 RSI: 0000000000000000 RDI: ffff888006d72a18
RBP: ffffc90000003d80 R08: 0000000000000000 R09: 0000000000000001
R10: ffffc90000003d98 R11: 0000000000000040 R12: ffff888006d72a10
R13: 0000000000000000 R14: ffff8880057fb800 R15: ffffffff835d86c0
FS: 00007f9dc72ee740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000057b2000 CR4: 00000000007506f0
PKRU: 55555554
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
Fixes: 4832c30d5458 ("net: ipv6: put host and anycast routes on device with address")
Reported-by: Wang Yufen <wangyufen@huawei.com>
Closes: https://lore.kernel.org/netdev/c41403a9-c2f6-3b7e-0c96-e1901e605cd0@huawei.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ian reported several skb corruptions triggered by rx-gro-list,
collecting different oops alike:
[ 62.624003] BUG: kernel NULL pointer dereference, address: 00000000000000c0
[ 62.631083] #PF: supervisor read access in kernel mode
[ 62.636312] #PF: error_code(0x0000) - not-present page
[ 62.641541] PGD 0 P4D 0
[ 62.644174] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 62.648629] CPU: 1 PID: 913 Comm: napi/eno2-79 Not tainted 6.4.0 #364
[ 62.655162] Hardware name: Supermicro Super Server/A2SDi-12C-HLN4F, BIOS 1.7a 10/13/2022
[ 62.663344] RIP: 0010:__udp_gso_segment (./include/linux/skbuff.h:2858
./include/linux/udp.h:23 net/ipv4/udp_offload.c:228 net/ipv4/udp_offload.c:261
net/ipv4/udp_offload.c:277)
[ 62.687193] RSP: 0018:ffffbd3a83b4f868 EFLAGS: 00010246
[ 62.692515] RAX: 00000000000000ce RBX: 0000000000000000 RCX: 0000000000000000
[ 62.699743] RDX: ffffa124def8a000 RSI: 0000000000000079 RDI: ffffa125952a14d4
[ 62.706970] RBP: ffffa124def8a000 R08: 0000000000000022 R09: 00002000001558c9
[ 62.714199] R10: 0000000000000000 R11: 00000000be554639 R12: 00000000000000e2
[ 62.721426] R13: ffffa125952a1400 R14: ffffa125952a1400 R15: 00002000001558c9
[ 62.728654] FS: 0000000000000000(0000) GS:ffffa127efa40000(0000)
knlGS:0000000000000000
[ 62.736852] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 62.742702] CR2: 00000000000000c0 CR3: 00000001034b0000 CR4: 00000000003526e0
[ 62.749948] Call Trace:
[ 62.752498] <TASK>
[ 62.779267] inet_gso_segment (net/ipv4/af_inet.c:1398)
[ 62.787605] skb_mac_gso_segment (net/core/gro.c:141)
[ 62.791906] __skb_gso_segment (net/core/dev.c:3403 (discriminator 2))
[ 62.800492] validate_xmit_skb (./include/linux/netdevice.h:4862
net/core/dev.c:3659)
[ 62.804695] validate_xmit_skb_list (net/core/dev.c:3710)
[ 62.809158] sch_direct_xmit (net/sched/sch_generic.c:330)
[ 62.813198] __dev_queue_xmit (net/core/dev.c:3805 net/core/dev.c:4210)
net/netfilter/core.c:626)
[ 62.821093] br_dev_queue_push_xmit (net/bridge/br_forward.c:55)
[ 62.825652] maybe_deliver (net/bridge/br_forward.c:193)
[ 62.829420] br_flood (net/bridge/br_forward.c:233)
[ 62.832758] br_handle_frame_finish (net/bridge/br_input.c:215)
[ 62.837403] br_handle_frame (net/bridge/br_input.c:298
net/bridge/br_input.c:416)
[ 62.851417] __netif_receive_skb_core.constprop.0 (net/core/dev.c:5387)
[ 62.866114] __netif_receive_skb_list_core (net/core/dev.c:5570)
[ 62.871367] netif_receive_skb_list_internal (net/core/dev.c:5638
net/core/dev.c:5727)
[ 62.876795] napi_complete_done (./include/linux/list.h:37
./include/net/gro.h:434 ./include/net/gro.h:429 net/core/dev.c:6067)
[ 62.881004] ixgbe_poll (drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:3191)
[ 62.893534] __napi_poll (net/core/dev.c:6498)
[ 62.897133] napi_threaded_poll (./include/linux/netpoll.h:89
net/core/dev.c:6640)
[ 62.905276] kthread (kernel/kthread.c:379)
[ 62.913435] ret_from_fork (arch/x86/entry/entry_64.S:314)
[ 62.917119] </TASK>
In the critical scenario, rx-gro-list GRO-ed packets are fed, via a
bridge, both to the local input path and to an egress device (tun).
The segmentation of such packets unsafely writes to the cloned skbs
with shared heads.
This change addresses the issue by uncloning as needed the
to-be-segmented skbs.
Reported-by: Ian Kumlien <ian.kumlien@gmail.com>
Tested-by: Ian Kumlien <ian.kumlien@gmail.com>
Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The tracepoint has existed for 12 years, but it only covered udp
over the legacy IPv4 protocol. Having it enabled for udp6 removes
the unnecessary difference in error visibility.
Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
Fixes: 296f7ea75b45 ("udp: add tracepoints for queueing skb to rcvbuf")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the event of a failure in tcf_change_indev(), fw_set_parms() will
immediately return an error after incrementing or decrementing
reference counter in tcf_bind_filter(). If attacker can control
reference counter to zero and make reference freed, leading to
use after free.
In order to prevent this, move the point of possible failure above the
point where the TC_FW_CLASSID is handled.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: M A Ramdhan <ramdhan@starlabs.sg>
Signed-off-by: M A Ramdhan <ramdhan@starlabs.sg>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Message-ID: <20230705161530.52003-1-ramdhan@starlabs.sg>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Fix missing overflow use refcount checks in nf_tables.
2) Do not set IPS_ASSURED for IPS_NAT_CLASH entries in GRE tracker,
from Florian Westphal.
3) Bail out if nf_ct_helper_hash is NULL before registering helper,
from Florent Revest.
4) Use siphash() instead siphash_4u64() to fix performance regression,
also from Florian.
5) Do not allow to add rules to removed chains via ID,
from Thadeu Lima de Souza Cascardo.
6) Fix oob read access in byteorder expression, also from Thadeu.
netfilter pull request 23-07-06
====================
Link: https://lore.kernel.org/r/20230705230406.52201-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.
It may lead to a stack-out-of-bounds access like the one below:
[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint
Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.
Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, bpf and wireguard.
Current release - regressions:
- nvme-tcp: fix comma-related oops after sendpage changes
Current release - new code bugs:
- ptp: make max_phase_adjustment sysfs device attribute invisible
when not supported
Previous releases - regressions:
- sctp: fix potential deadlock on &net->sctp.addr_wq_lock
- mptcp:
- ensure subflow is unhashed before cleaning the backlog
- do not rely on implicit state check in mptcp_listen()
Previous releases - always broken:
- net: fix net_dev_start_xmit trace event vs skb_transport_offset()
- Bluetooth:
- fix use-bdaddr-property quirk
- L2CAP: fix multiple UaFs
- ISO: use hci_sync for setting CIG parameters
- hci_event: fix Set CIG Parameters error status handling
- hci_event: fix parsing of CIS Established Event
- MGMT: fix marking SCAN_RSP as not connectable
- wireguard: queuing: use saner cpu selection wrapping
- sched: act_ipt: various bug fixes for iptables <> TC interactions
- sched: act_pedit: add size check for TCA_PEDIT_PARMS_EX
- dsa: fixes for receiving PTP packets with 8021q and sja1105 tagging
- eth: sfc: fix null-deref in devlink port without MAE access
- eth: ibmvnic: do not reset dql stats on NON_FATAL err
Misc:
- xsk: honor SO_BINDTODEVICE on bind"
* tag 'net-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits)
nfp: clean mc addresses in application firmware when closing port
selftests: mptcp: pm_nl_ctl: fix 32-bit support
selftests: mptcp: depend on SYN_COOKIES
selftests: mptcp: userspace_pm: report errors with 'remove' tests
selftests: mptcp: userspace_pm: use correct server port
selftests: mptcp: sockopt: return error if wrong mark
selftests: mptcp: sockopt: use 'iptables-legacy' if available
selftests: mptcp: connect: fail if nft supposed to work
mptcp: do not rely on implicit state check in mptcp_listen()
mptcp: ensure subflow is unhashed before cleaning the backlog
s390/qeth: Fix vipa deletion
octeontx-af: fix hardware timestamp configuration
net: dsa: sja1105: always enable the send_meta options
net: dsa: tag_sja1105: fix MAC DA patching from meta frames
net: Replace strlcpy with strscpy
pptp: Fix fib lookup calls.
mlxsw: spectrum_router: Fix an IS_ERR() vs NULL check
net/sched: act_pedit: Add size check for TCA_PEDIT_PARMS_EX
xsk: Honor SO_BINDTODEVICE on bind
ptp: Make max_phase_adjustment sysfs device attribute invisible when not supported
...
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2023-07-05
We've added 2 non-merge commits during the last 1 day(s) which contain
a total of 3 files changed, 16 insertions(+), 4 deletions(-).
The main changes are:
1) Fix BTF to warn but not returning an error for a NULL BTF to still be
able to load modules under CONFIG_DEBUG_INFO_BTF, from SeongJae Park.
2) Fix xsk sockets to honor SO_BINDTODEVICE in bind(), from Ilya Maximets.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
xsk: Honor SO_BINDTODEVICE on bind
bpf, btf: Warn but return no error for NULL btf from __register_btf_kfunc_id_set()
====================
Link: https://lore.kernel.org/r/20230705171716.6494-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When adding a rule to a chain referring to its ID, if that chain had been
deleted on the same batch, the rule might end up referring to a deleted
chain.
This will lead to a WARNING like following:
[ 33.098431] ------------[ cut here ]------------
[ 33.098678] WARNING: CPU: 5 PID: 69 at net/netfilter/nf_tables_api.c:2037 nf_tables_chain_destroy+0x23d/0x260
[ 33.099217] Modules linked in:
[ 33.099388] CPU: 5 PID: 69 Comm: kworker/5:1 Not tainted 6.4.0+ #409
[ 33.099726] Workqueue: events nf_tables_trans_destroy_work
[ 33.100018] RIP: 0010:nf_tables_chain_destroy+0x23d/0x260
[ 33.100306] Code: 8b 7c 24 68 e8 64 9c ed fe 4c 89 e7 e8 5c 9c ed fe 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 89 c6 89 c7 c3 cc cc cc cc <0f> 0b 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 89 c6 89 c7
[ 33.101271] RSP: 0018:ffffc900004ffc48 EFLAGS: 00010202
[ 33.101546] RAX: 0000000000000001 RBX: ffff888006fc0a28 RCX: 0000000000000000
[ 33.101920] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 33.102649] RBP: ffffc900004ffc78 R08: 0000000000000000 R09: 0000000000000000
[ 33.103018] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8880135ef500
[ 33.103385] R13: 0000000000000000 R14: dead000000000122 R15: ffff888006fc0a10
[ 33.103762] FS: 0000000000000000(0000) GS:ffff888024c80000(0000) knlGS:0000000000000000
[ 33.104184] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 33.104493] CR2: 00007fe863b56a50 CR3: 00000000124b0001 CR4: 0000000000770ee0
[ 33.104872] PKRU: 55555554
[ 33.104999] Call Trace:
[ 33.105113] <TASK>
[ 33.105214] ? show_regs+0x72/0x90
[ 33.105371] ? __warn+0xa5/0x210
[ 33.105520] ? nf_tables_chain_destroy+0x23d/0x260
[ 33.105732] ? report_bug+0x1f2/0x200
[ 33.105902] ? handle_bug+0x46/0x90
[ 33.106546] ? exc_invalid_op+0x19/0x50
[ 33.106762] ? asm_exc_invalid_op+0x1b/0x20
[ 33.106995] ? nf_tables_chain_destroy+0x23d/0x260
[ 33.107249] ? nf_tables_chain_destroy+0x30/0x260
[ 33.107506] nf_tables_trans_destroy_work+0x669/0x680
[ 33.107782] ? mark_held_locks+0x28/0xa0
[ 33.107996] ? __pfx_nf_tables_trans_destroy_work+0x10/0x10
[ 33.108294] ? _raw_spin_unlock_irq+0x28/0x70
[ 33.108538] process_one_work+0x68c/0xb70
[ 33.108755] ? lock_acquire+0x17f/0x420
[ 33.108977] ? __pfx_process_one_work+0x10/0x10
[ 33.109218] ? do_raw_spin_lock+0x128/0x1d0
[ 33.109435] ? _raw_spin_lock_irq+0x71/0x80
[ 33.109634] worker_thread+0x2bd/0x700
[ 33.109817] ? __pfx_worker_thread+0x10/0x10
[ 33.110254] kthread+0x18b/0x1d0
[ 33.110410] ? __pfx_kthread+0x10/0x10
[ 33.110581] ret_from_fork+0x29/0x50
[ 33.110757] </TASK>
[ 33.110866] irq event stamp: 1651
[ 33.111017] hardirqs last enabled at (1659): [<ffffffffa206a209>] __up_console_sem+0x79/0xa0
[ 33.111379] hardirqs last disabled at (1666): [<ffffffffa206a1ee>] __up_console_sem+0x5e/0xa0
[ 33.111740] softirqs last enabled at (1616): [<ffffffffa1f5d40e>] __irq_exit_rcu+0x9e/0xe0
[ 33.112094] softirqs last disabled at (1367): [<ffffffffa1f5d40e>] __irq_exit_rcu+0x9e/0xe0
[ 33.112453] ---[ end trace 0000000000000000 ]---
This is due to the nft_chain_lookup_byid ignoring the genmask. After this
change, adding the new rule will fail as it will not find the chain.
Fixes: 837830a4b439 ("netfilter: nf_tables: add NFTA_RULE_CHAIN_ID attribute")
Cc: stable@vger.kernel.org
Reported-by: Mingi Cho of Theori working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|