Age | Commit message (Collapse) | Author |
|
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Rewrite inner header IPv6 in ICMPv6 messages in ip6t_NPT,
from Michael Zhou.
2) do_ip_vs_set_ctl() dereferences uninitialized value,
from Peilin Ye.
3) Support for userdata in tables, from Jose M. Guisado.
4) Do not increment ct error and invalid stats at the same time,
from Florian Westphal.
5) Remove ct ignore stats, also from Florian.
6) Add ct stats for clash resolution, from Florian Westphal.
7) Bump reference counter bump on ct clash resolution only,
this is safe because bucket lock is held, again from Florian.
8) Use ip_is_fragment() in xt_HMARK, from YueHaibing.
9) Add wildcard support for nft_socket, from Balazs Scheidler.
10) Remove superfluous IPVS dependency on iptables, from
Yaroslav Bolyukin.
11) Remove unused definition in ebt_stp, from Wang Hai.
12) Replace CONFIG_NFT_CHAIN_NAT_{IPV4,IPV6} by CONFIG_NFT_NAT
in selftests/net, from Fabian Frederick.
13) Add userdata support for nft_object, from Jose M. Guisado.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
clean follow coccicheck warning:
net//hsr/hsr_netlink.c:94:8-42: WARNING avoid newline at end of message
in NL_SET_ERR_MSG_MOD
net//hsr/hsr_netlink.c:87:30-57: WARNING avoid newline at end of message
in NL_SET_ERR_MSG_MOD
net//hsr/hsr_netlink.c:79:29-53: WARNING avoid newline at end of message
in NL_SET_ERR_MSG_MOD
Signed-off-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull NFS client bugfixes from Trond Myklebust:
- Fix an NFS/RDMA resource leak
- Fix the error handling during delegation recall
- NFSv4.0 needs to return the delegation on a zero-stateid SETATTR
- Stop printk reading past end of string
* tag 'nfs-for-5.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: stop printk reading past end of string
NFS: Zero-stateid SETATTR should first return delegation
NFSv4.1 handle ERR_DELAY error reclaiming locking state on delegation recall
xprtrdma: Release in-flight MRs on disconnect
|
|
If skb_put_padto() returns an error, skb has been freed.
Better not touch it anymore, as reported by syzbot [1]
Note to qrtr maintainers : this suggests qrtr_sendmsg()
should adjust sock_alloc_send_skb() second parameter
to account for the potential added alignment to avoid
reallocation.
[1]
BUG: KASAN: use-after-free in __skb_insert include/linux/skbuff.h:1907 [inline]
BUG: KASAN: use-after-free in __skb_queue_before include/linux/skbuff.h:2016 [inline]
BUG: KASAN: use-after-free in __skb_queue_tail include/linux/skbuff.h:2049 [inline]
BUG: KASAN: use-after-free in skb_queue_tail+0x6b/0x120 net/core/skbuff.c:3146
Write of size 8 at addr ffff88804d8ab3c0 by task syz-executor.4/4316
CPU: 1 PID: 4316 Comm: syz-executor.4 Not tainted 5.9.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1d6/0x29e lib/dump_stack.c:118
print_address_description+0x66/0x620 mm/kasan/report.c:383
__kasan_report mm/kasan/report.c:513 [inline]
kasan_report+0x132/0x1d0 mm/kasan/report.c:530
__skb_insert include/linux/skbuff.h:1907 [inline]
__skb_queue_before include/linux/skbuff.h:2016 [inline]
__skb_queue_tail include/linux/skbuff.h:2049 [inline]
skb_queue_tail+0x6b/0x120 net/core/skbuff.c:3146
qrtr_tun_send+0x1a/0x40 net/qrtr/tun.c:23
qrtr_node_enqueue+0x44f/0xc00 net/qrtr/qrtr.c:364
qrtr_bcast_enqueue+0xbe/0x140 net/qrtr/qrtr.c:861
qrtr_sendmsg+0x680/0x9c0 net/qrtr/qrtr.c:960
sock_sendmsg_nosec net/socket.c:651 [inline]
sock_sendmsg net/socket.c:671 [inline]
sock_write_iter+0x317/0x470 net/socket.c:998
call_write_iter include/linux/fs.h:1882 [inline]
new_sync_write fs/read_write.c:503 [inline]
vfs_write+0xa96/0xd10 fs/read_write.c:578
ksys_write+0x11b/0x220 fs/read_write.c:631
do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45d5b9
Code: 5d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 2b b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f84b5b81c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000038b40 RCX: 000000000045d5b9
RDX: 0000000000000055 RSI: 0000000020001240 RDI: 0000000000000003
RBP: 00007f84b5b81ca0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000f
R13: 00007ffcbbf86daf R14: 00007f84b5b829c0 R15: 000000000118cf4c
Allocated by task 4316:
kasan_save_stack mm/kasan/common.c:48 [inline]
kasan_set_track mm/kasan/common.c:56 [inline]
__kasan_kmalloc+0x100/0x130 mm/kasan/common.c:461
slab_post_alloc_hook+0x3e/0x290 mm/slab.h:518
slab_alloc mm/slab.c:3312 [inline]
kmem_cache_alloc+0x1c1/0x2d0 mm/slab.c:3482
skb_clone+0x1b2/0x370 net/core/skbuff.c:1449
qrtr_bcast_enqueue+0x6d/0x140 net/qrtr/qrtr.c:857
qrtr_sendmsg+0x680/0x9c0 net/qrtr/qrtr.c:960
sock_sendmsg_nosec net/socket.c:651 [inline]
sock_sendmsg net/socket.c:671 [inline]
sock_write_iter+0x317/0x470 net/socket.c:998
call_write_iter include/linux/fs.h:1882 [inline]
new_sync_write fs/read_write.c:503 [inline]
vfs_write+0xa96/0xd10 fs/read_write.c:578
ksys_write+0x11b/0x220 fs/read_write.c:631
do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 4316:
kasan_save_stack mm/kasan/common.c:48 [inline]
kasan_set_track+0x3d/0x70 mm/kasan/common.c:56
kasan_set_free_info+0x17/0x30 mm/kasan/generic.c:355
__kasan_slab_free+0xdd/0x110 mm/kasan/common.c:422
__cache_free mm/slab.c:3418 [inline]
kmem_cache_free+0x82/0xf0 mm/slab.c:3693
__skb_pad+0x3f5/0x5a0 net/core/skbuff.c:1823
__skb_put_padto include/linux/skbuff.h:3233 [inline]
skb_put_padto include/linux/skbuff.h:3252 [inline]
qrtr_node_enqueue+0x62f/0xc00 net/qrtr/qrtr.c:360
qrtr_bcast_enqueue+0xbe/0x140 net/qrtr/qrtr.c:861
qrtr_sendmsg+0x680/0x9c0 net/qrtr/qrtr.c:960
sock_sendmsg_nosec net/socket.c:651 [inline]
sock_sendmsg net/socket.c:671 [inline]
sock_write_iter+0x317/0x470 net/socket.c:998
call_write_iter include/linux/fs.h:1882 [inline]
new_sync_write fs/read_write.c:503 [inline]
vfs_write+0xa96/0xd10 fs/read_write.c:578
ksys_write+0x11b/0x220 fs/read_write.c:631
do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The buggy address belongs to the object at ffff88804d8ab3c0
which belongs to the cache skbuff_head_cache of size 224
The buggy address is located 0 bytes inside of
224-byte region [ffff88804d8ab3c0, ffff88804d8ab4a0)
The buggy address belongs to the page:
page:00000000ea8cccfb refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88804d8abb40 pfn:0x4d8ab
flags: 0xfffe0000000200(slab)
raw: 00fffe0000000200 ffffea0002237ec8 ffffea00029b3388 ffff88821bb66800
raw: ffff88804d8abb40 ffff88804d8ab000 000000010000000b 0000000000000000
page dumped because: kasan: bad access detected
Fixes: ce57785bf91b ("net: qrtr: fix len of skb_put_padto in qrtr_node_enqueue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Carl Huang <cjhuang@codeaurora.org>
Cc: Wen Gong <wgong@codeaurora.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, ipv6 stack does not do any TOS reflection. To make the
behavior consistent with v4 stack, this commit adds TOS reflection in
tcp_v6_reqsk_send_ack() and tcp_v6_send_reset(). We clear the lower
2-bit ECN value of the received TOS in compliance with RFC 3168 6.1.5
robustness principles.
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, in tcp_v4_reqsk_send_ack() and tcp_v4_send_reset(), we
echo the TOS value of the received packets in the response.
However, we do not want to echo the lower 2 ECN bits in accordance
with RFC 3168 6.1.5 robustness principles.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
David Howells says:
====================
rxrpc: Allow more calls to same peer
Here are some development patches for AF_RXRPC that allow more simultaneous
calls to be made to the same peer with the same security parameters. The
current code allows a maximum of 4 simultaneous calls, which limits the afs
filesystem to that many simultaneous threads. This increases the limit to
16.
To make this work, the way client connections are limited has to be changed
(incoming call/connection limits are unaffected) as the current code
depends on queuing calls on a connection and then pushing the connection
through a queue. The limit is on the number of available connections.
This is changed such that there's a limit[*] on the total number of calls
systemwide across all namespaces, but the limit on the number of client
connections is removed.
Once a call is allowed to proceed, it finds a bundle of connections and
tries to grab a call slot. If there's a spare call slot, fine, otherwise
it will wait. If there's already a waiter, it will try to create another
connection in the bundle, unless the limit of 4 is reached (4 calls per
connection, giving 16).
A number of things throttle someone trying to set up endless connections:
- Calls that fail immediately have their conns deleted immediately,
- Calls that don't fail immediately have to wait for a timeout,
- Connections normally get automatically reaped if they haven't been used
for 2m, but this is sped up to 2s if the number of connections rises
over 900. This number is tunable by sysctl.
[*] Technically two limits - kernel sockets and userspace rxrpc sockets are
accounted separately.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan
Stefan Schmidt says:
====================
pull-request: ieee802154 for net 2020-09-08
An update from ieee802154 for your *net* tree.
A potential memory leak fix for ca8210 from Liu Jian,
a check on the return for a register read in adf7242
and finally a user after free fix in the softmac tx
function from Eric found by syzkaller.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Stephen reported the following warning:
net/bridge/br_multicast.c: In function 'br_multicast_find_port':
net/bridge/br_multicast.c:1818:21: warning: unused variable 'br' [-Wunused-variable]
1818 | struct net_bridge *br = mp->br;
| ^~
It happens due to bridge's mlock_dereference() when lockdep isn't defined.
Silence the warning by annotating the variable as __maybe_unused.
Fixes: 0436862e417e ("net: bridge: mcast: support for IGMPv3/MLDv2 ALLOW_NEW_SOURCES report")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If CONFIG_IPV6=m, the IPV6 functions won't be found by the linker:
ld: net/core/fib_rules.o: in function `fib_rules_lookup':
fib_rules.c:(.text+0x606): undefined reference to `fib6_rule_match'
ld: fib_rules.c:(.text+0x611): undefined reference to `fib6_rule_match'
ld: fib_rules.c:(.text+0x68c): undefined reference to `fib6_rule_action'
ld: fib_rules.c:(.text+0x693): undefined reference to `fib6_rule_action'
ld: fib_rules.c:(.text+0x6aa): undefined reference to `fib6_rule_suppress'
ld: fib_rules.c:(.text+0x6bc): undefined reference to `fib6_rule_suppress'
make: *** [Makefile:1166: vmlinux] Error 1
Reported-by: Sven Joachim <svenjoac@gmx.de>
Fixes: b9aaec8f0be5 ("fib: use indirect call wrappers in the most common fib_rules_ops")
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
===================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Allow conntrack entries with l3num == NFPROTO_IPV4 or == NFPROTO_IPV6
only via ctnetlink, from Will McVicker.
2) Batch notifications to userspace to improve netlink socket receive
utilization.
3) Restore mark based dump filtering via ctnetlink, from Martin Willi.
4) nf_conncount_init() fails with -EPROTO with CONFIG_IPV6, from
Eelco Chaudron.
5) Containers fail to match on meta skuid and skgid, use socket user_ns
to retrieve meta skuid and skgid.
===================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixes the following W=1 kernel build warning(s):
net/netlabel/netlabel_calipso.c:438: warning: Excess function parameter 'audit_secid' description in 'calipso_doi_remove'
net/netlabel/netlabel_calipso.c:605: warning: Excess function parameter 'reg' description in 'calipso_req_delattr'
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixes the following W=1 kernel build warning(s):
net/ipv4/cipso_ipv4.c:510: warning: Excess function parameter 'audit_secid' description in 'cipso_v4_doi_remove'
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzbot reported twice a lockdep issue in fib6_del() [1]
which I think is caused by net->ipv6.fib6_null_entry
having a NULL fib6_table pointer.
fib6_del() already checks for fib6_null_entry special
case, we only need to return earlier.
Bug seems to occur very rarely, I have thus chosen
a 'bug origin' that makes backports not too complex.
[1]
WARNING: suspicious RCU usage
5.9.0-rc4-syzkaller #0 Not tainted
-----------------------------
net/ipv6/ip6_fib.c:1996 suspicious rcu_dereference_protected() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
4 locks held by syz-executor.5/8095:
#0: ffffffff8a7ea708 (rtnl_mutex){+.+.}-{3:3}, at: ppp_release+0x178/0x240 drivers/net/ppp/ppp_generic.c:401
#1: ffff88804c422dd8 (&net->ipv6.fib6_gc_lock){+.-.}-{2:2}, at: spin_trylock_bh include/linux/spinlock.h:414 [inline]
#1: ffff88804c422dd8 (&net->ipv6.fib6_gc_lock){+.-.}-{2:2}, at: fib6_run_gc+0x21b/0x2d0 net/ipv6/ip6_fib.c:2312
#2: ffffffff89bd6a40 (rcu_read_lock){....}-{1:2}, at: __fib6_clean_all+0x0/0x290 net/ipv6/ip6_fib.c:2613
#3: ffff8880a82e6430 (&tb->tb6_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:359 [inline]
#3: ffff8880a82e6430 (&tb->tb6_lock){+.-.}-{2:2}, at: __fib6_clean_all+0x107/0x290 net/ipv6/ip6_fib.c:2245
stack backtrace:
CPU: 1 PID: 8095 Comm: syz-executor.5 Not tainted 5.9.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x198/0x1fd lib/dump_stack.c:118
fib6_del+0x12b4/0x1630 net/ipv6/ip6_fib.c:1996
fib6_clean_node+0x39b/0x570 net/ipv6/ip6_fib.c:2180
fib6_walk_continue+0x4aa/0x8e0 net/ipv6/ip6_fib.c:2102
fib6_walk+0x182/0x370 net/ipv6/ip6_fib.c:2150
fib6_clean_tree+0xdb/0x120 net/ipv6/ip6_fib.c:2230
__fib6_clean_all+0x120/0x290 net/ipv6/ip6_fib.c:2246
fib6_clean_all net/ipv6/ip6_fib.c:2257 [inline]
fib6_run_gc+0x113/0x2d0 net/ipv6/ip6_fib.c:2320
ndisc_netdev_event+0x217/0x350 net/ipv6/ndisc.c:1805
notifier_call_chain+0xb5/0x200 kernel/notifier.c:83
call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:2033
call_netdevice_notifiers_extack net/core/dev.c:2045 [inline]
call_netdevice_notifiers net/core/dev.c:2059 [inline]
dev_close_many+0x30b/0x650 net/core/dev.c:1634
rollback_registered_many+0x3a8/0x1210 net/core/dev.c:9261
rollback_registered net/core/dev.c:9329 [inline]
unregister_netdevice_queue+0x2dd/0x570 net/core/dev.c:10410
unregister_netdevice include/linux/netdevice.h:2774 [inline]
ppp_release+0x216/0x240 drivers/net/ppp/ppp_generic.c:403
__fput+0x285/0x920 fs/file_table.c:281
task_work_run+0xdd/0x190 kernel/task_work.c:141
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_user_mode_loop kernel/entry/common.c:163 [inline]
exit_to_user_mode_prepare+0x1e1/0x200 kernel/entry/common.c:190
syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:265
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 421842edeaf6 ("net/ipv6: Add fib6_null_entry")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Ahern <dsahern@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since commit 845e0ebb4408 ("net: change addr_list_lock back to static
key"), cascaded DSA setups (DSA switch port as DSA master for another
DSA switch port) are emitting this lockdep warning:
============================================
WARNING: possible recursive locking detected
5.8.0-rc1-00133-g923e4b5032dd-dirty #208 Not tainted
--------------------------------------------
dhcpcd/323 is trying to acquire lock:
ffff000066dd4268 (&dsa_master_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync+0x44/0x90
but task is already holding lock:
ffff00006608c268 (&dsa_master_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync+0x44/0x90
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&dsa_master_addr_list_lock_key/1);
lock(&dsa_master_addr_list_lock_key/1);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by dhcpcd/323:
#0: ffffdbd1381dda18 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x24/0x30
#1: ffff00006614b268 (_xmit_ETHER){+...}-{2:2}, at: dev_set_rx_mode+0x28/0x48
#2: ffff00006608c268 (&dsa_master_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync+0x44/0x90
stack backtrace:
Call trace:
dump_backtrace+0x0/0x1e0
show_stack+0x20/0x30
dump_stack+0xec/0x158
__lock_acquire+0xca0/0x2398
lock_acquire+0xe8/0x440
_raw_spin_lock_nested+0x64/0x90
dev_mc_sync+0x44/0x90
dsa_slave_set_rx_mode+0x34/0x50
__dev_set_rx_mode+0x60/0xa0
dev_mc_sync+0x84/0x90
dsa_slave_set_rx_mode+0x34/0x50
__dev_set_rx_mode+0x60/0xa0
dev_set_rx_mode+0x30/0x48
__dev_open+0x10c/0x180
__dev_change_flags+0x170/0x1c8
dev_change_flags+0x2c/0x70
devinet_ioctl+0x774/0x878
inet_ioctl+0x348/0x3b0
sock_do_ioctl+0x50/0x310
sock_ioctl+0x1f8/0x580
ksys_ioctl+0xb0/0xf0
__arm64_sys_ioctl+0x28/0x38
el0_svc_common.constprop.0+0x7c/0x180
do_el0_svc+0x2c/0x98
el0_sync_handler+0x9c/0x1b8
el0_sync+0x158/0x180
Since DSA never made use of the netdev API for describing links between
upper devices and lower devices, the dev->lower_level value of a DSA
switch interface would be 1, which would warn when it is a DSA master.
We can use netdev_upper_dev_link() to describe the relationship between
a DSA slave and a DSA master. To be precise, a DSA "slave" (switch port)
is an "upper" to a DSA "master" (host port). The relationship is "many
uppers to one lower", like in the case of VLAN. So, for that reason, we
use the same function as VLAN uses.
There might be a chance that somebody will try to take hold of this
interface and use it immediately after register_netdev() and before
netdev_upper_dev_link(). To avoid that, we do the registration and
linkage while holding the RTNL, and we use the RTNL-locked cousin of
register_netdev(), which is register_netdevice().
Since this warning was not there when lockdep was using dynamic keys for
addr_list_lock, we are blaming the lockdep patch itself. The network
stack _has_ been using static lockdep keys before, and it _is_ likely
that stacked DSA setups have been triggering these lockdep warnings
since forever, however I can't test very old kernels on this particular
stacked DSA setup, to ensure I'm not in fact introducing regressions.
Fixes: 845e0ebb4408 ("net: change addr_list_lock back to static key")
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Reviewing the error handling in tcf_action_init_1()
most of the early handling uses
err_out:
if (cookie) {
kfree(cookie->data);
kfree(cookie);
}
before cookie could ever be set.
So skip the unnecessay check.
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow the number of parallel connections to a machine to be expanded from a
single connection to a maximum of four. This allows up to 16 calls to be
in progress at the same time to any particular peer instead of 4.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Rewrite the rxrpc client connection manager so that it can support multiple
connections for a given security key to a peer. The following changes are
made:
(1) For each open socket, the code currently maintains an rbtree with the
connections placed into it, keyed by communications parameters. This
is tricky to maintain as connections can be culled from the tree or
replaced within it. Connections can require replacement for a number
of reasons, e.g. their IDs span too great a range for the IDR data
type to represent efficiently, the call ID numbers on that conn would
overflow or the conn got aborted.
This is changed so that there's now a connection bundle object placed
in the tree, keyed on the same parameters. The bundle, however, does
not need to be replaced.
(2) An rxrpc_bundle object can now manage the available channels for a set
of parallel connections. The lock that manages this is moved there
from the rxrpc_connection struct (channel_lock).
(3) There'a a dummy bundle for all incoming connections to share so that
they have a channel_lock too. It might be better to give each
incoming connection its own bundle. This bundle is not needed to
manage which channels incoming calls are made on because that's the
solely at whim of the client.
(4) The restrictions on how many client connections are around are
removed. Instead, a previous patch limits the number of client calls
that can be allocated. Ordinarily, client connections are reaped
after 2 minutes on the idle queue, but when more than a certain number
of connections are in existence, the reaper starts reaping them after
2s of idleness instead to get the numbers back down.
It could also be made such that new call allocations are forced to
wait until the number of outstanding connections subsides.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Impose a maximum on the number of client rxrpc calls that are allowed
simultaneously. This will be in lieu of a maximum number of client
connections as this is easier to administed as, unlike connections, calls
aren't reusable (to be changed in a subsequent patch)..
This doesn't affect the limits on service calls and connections.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Bryce reported that he saw the following with:
0: r6 = r1
1: r1 = 12
2: r0 = *(u16 *)skb[r1]
The xlated sequence was incorrectly clobbering r2 with pointer
value of r6 ...
0: (bf) r6 = r1
1: (b7) r1 = 12
2: (bf) r1 = r6
3: (bf) r2 = r1
4: (85) call bpf_skb_load_helper_16_no_cache#7692160
... and hence call to the load helper never succeeded given the
offset was too high. Fix it by reordering the load of r6 to r1.
Other than that the insn has similar calling convention than BPF
helpers, that is, r0 - r5 are scratch regs, so nothing else
affected after the insn.
Fixes: e0cea7ce988c ("bpf: implement ld_abs/ld_ind in native bpf")
Reported-by: Bryce Kahle <bryce.kahle@datadoghq.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/cace836e4d07bb63b1a53e49c5dfb238a040c298.1599512096.git.daniel@iogearbox.net
|
|
Enables storing userdata for nft_object. Initially this will store an
optional comment but can be extended in the future as needed.
Adds new attribute NFTA_OBJ_USERDATA to nft_object.
Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
syzbot reported a bug in ieee802154_tx() [1]
A similar issue in ieee802154_xmit_worker() is also fixed in this patch.
[1]
BUG: KASAN: use-after-free in ieee802154_tx+0x3d2/0x480 net/mac802154/tx.c:88
Read of size 4 at addr ffff8880251a8c70 by task syz-executor.3/928
CPU: 0 PID: 928 Comm: syz-executor.3 Not tainted 5.9.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x198/0x1fd lib/dump_stack.c:118
print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383
__kasan_report mm/kasan/report.c:513 [inline]
kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
ieee802154_tx+0x3d2/0x480 net/mac802154/tx.c:88
ieee802154_subif_start_xmit+0xbe/0xe4 net/mac802154/tx.c:130
__netdev_start_xmit include/linux/netdevice.h:4634 [inline]
netdev_start_xmit include/linux/netdevice.h:4648 [inline]
dev_direct_xmit+0x4e9/0x6e0 net/core/dev.c:4203
packet_snd net/packet/af_packet.c:2989 [inline]
packet_sendmsg+0x2413/0x5290 net/packet/af_packet.c:3014
sock_sendmsg_nosec net/socket.c:651 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:671
____sys_sendmsg+0x6e8/0x810 net/socket.c:2353
___sys_sendmsg+0xf3/0x170 net/socket.c:2407
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2440
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45d5b9
Code: 5d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 2b b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fc98e749c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 000000000002ccc0 RCX: 000000000045d5b9
RDX: 0000000000000000 RSI: 0000000020007780 RDI: 000000000000000b
RBP: 000000000118d020 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000118cfec
R13: 00007fff690c720f R14: 00007fc98e74a9c0 R15: 000000000118cfec
Allocated by task 928:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
kasan_set_track mm/kasan/common.c:56 [inline]
__kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:461
slab_post_alloc_hook mm/slab.h:518 [inline]
slab_alloc_node mm/slab.c:3254 [inline]
kmem_cache_alloc_node+0x136/0x3e0 mm/slab.c:3574
__alloc_skb+0x71/0x550 net/core/skbuff.c:198
alloc_skb include/linux/skbuff.h:1094 [inline]
alloc_skb_with_frags+0x92/0x570 net/core/skbuff.c:5771
sock_alloc_send_pskb+0x72a/0x880 net/core/sock.c:2348
packet_alloc_skb net/packet/af_packet.c:2837 [inline]
packet_snd net/packet/af_packet.c:2932 [inline]
packet_sendmsg+0x19fb/0x5290 net/packet/af_packet.c:3014
sock_sendmsg_nosec net/socket.c:651 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:671
____sys_sendmsg+0x6e8/0x810 net/socket.c:2353
___sys_sendmsg+0xf3/0x170 net/socket.c:2407
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2440
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 928:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
kasan_set_track+0x1c/0x30 mm/kasan/common.c:56
kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355
__kasan_slab_free+0xd8/0x120 mm/kasan/common.c:422
__cache_free mm/slab.c:3418 [inline]
kmem_cache_free.part.0+0x74/0x1e0 mm/slab.c:3693
kfree_skbmem+0xef/0x1b0 net/core/skbuff.c:622
__kfree_skb net/core/skbuff.c:679 [inline]
consume_skb net/core/skbuff.c:838 [inline]
consume_skb+0xcf/0x160 net/core/skbuff.c:832
__dev_kfree_skb_any+0x9c/0xc0 net/core/dev.c:3107
fakelb_hw_xmit+0x20e/0x2a0 drivers/net/ieee802154/fakelb.c:81
drv_xmit_async net/mac802154/driver-ops.h:16 [inline]
ieee802154_tx+0x282/0x480 net/mac802154/tx.c:81
ieee802154_subif_start_xmit+0xbe/0xe4 net/mac802154/tx.c:130
__netdev_start_xmit include/linux/netdevice.h:4634 [inline]
netdev_start_xmit include/linux/netdevice.h:4648 [inline]
dev_direct_xmit+0x4e9/0x6e0 net/core/dev.c:4203
packet_snd net/packet/af_packet.c:2989 [inline]
packet_sendmsg+0x2413/0x5290 net/packet/af_packet.c:3014
sock_sendmsg_nosec net/socket.c:651 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:671
____sys_sendmsg+0x6e8/0x810 net/socket.c:2353
___sys_sendmsg+0xf3/0x170 net/socket.c:2407
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2440
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The buggy address belongs to the object at ffff8880251a8c00
which belongs to the cache skbuff_head_cache of size 224
The buggy address is located 112 bytes inside of
224-byte region [ffff8880251a8c00, ffff8880251a8ce0)
The buggy address belongs to the page:
page:0000000062b6a4f1 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x251a8
flags: 0xfffe0000000200(slab)
raw: 00fffe0000000200 ffffea0000435c88 ffffea00028b6c08 ffff8880a9055d00
raw: 0000000000000000 ffff8880251a80c0 000000010000000c 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff8880251a8b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880251a8b80: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8880251a8c00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8880251a8c80: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
ffff8880251a8d00: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
Fixes: 409c3b0c5f03 ("mac802154: tx: move stats tx increment")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Alexander Aring <alex.aring@gmail.com>
Cc: Stefan Schmidt <stefan@datenfreihafen.org>
Cc: linux-wpan@vger.kernel.org
Link: https://lore.kernel.org/r/20200908104025.4009085-1-edumazet@google.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
|
|
... instead of using init_user_ns.
Fixes: 96518518cc41 ("netfilter: add nftables")
Tested-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The openvswitch module fails initialization when used in a kernel
without IPv6 enabled. nf_conncount_init() fails because the ct code
unconditionally tries to initialize the netns IPv6 related bit,
regardless of the build option. The change below ignores the IPv6
part if not enabled.
Note that the corresponding _put() function already has this IPv6
configuration check.
Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
conntrack mark based dump filtering may falsely skip entries if a mask
is given: If the mask-based check does not filter out the entry, the
else-if check is always true and compares the mark without considering
the mask. The if/else-if logic seems wrong.
Given that the mask during filter setup is implicitly set to 0xffffffff
if not specified explicitly, the mark filtering flags seem to just
complicate things. Restore the previously used approach by always
matching against a zero mask is no filter mark is given.
Fixes: cb8aa9a3affb ("netfilter: ctnetlink: add kernel side filtering for dump")
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
On x86_64, each notification results in one skbuff allocation which
consumes at least 768 bytes due to the skbuff overhead.
This patch coalesces several notifications into one single skbuff, so
each notification consumes at least ~211 bytes, that ~3.5 times less
memory consumption. As a result, this is reducing the chances to exhaust
the netlink socket receive buffer.
Rule of thumb is that each notification batch only contains netlink
messages whose report flag is the same, nfnetlink_send() requires this
to do appropriate delivery to userspace, either via unicast (echo
mode) or multicast (monitor mode).
The skbuff control buffer is used to annotate the report flag for later
handling at the new coalescing routine.
The batch skbuff notification size is NLMSG_GOODSIZE, using a larger
skbuff would allow for more socket receiver buffer savings (to amortize
the cost of the skbuff even more), however, going over that size might
break userspace applications, so let's be conservative and stick to
NLMSG_GOODSIZE.
Reported-by: Phil Sutter <phil@nwl.cc>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
BPDU_TYPE_TCN is never used after it was introduced.
So better to remove it.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The indexes to the nf_nat_l[34]protos arrays come from userspace. So
check the tuple's family, e.g. l3num, when creating the conntrack in
order to prevent an OOB memory access during setup. Here is an example
kernel panic on 4.14.180 when userspace passes in an index greater than
NFPROTO_NUMPROTO.
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Modules linked in:...
Process poc (pid: 5614, stack limit = 0x00000000a3933121)
CPU: 4 PID: 5614 Comm: poc Tainted: G S W O 4.14.180-g051355490483
Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM
task: 000000002a3dfffe task.stack: 00000000a3933121
pc : __cfi_check_fail+0x1c/0x24
lr : __cfi_check_fail+0x1c/0x24
...
Call trace:
__cfi_check_fail+0x1c/0x24
name_to_dev_t+0x0/0x468
nfnetlink_parse_nat_setup+0x234/0x258
ctnetlink_parse_nat_setup+0x4c/0x228
ctnetlink_new_conntrack+0x590/0xc40
nfnetlink_rcv_msg+0x31c/0x4d4
netlink_rcv_skb+0x100/0x184
nfnetlink_rcv+0xf4/0x180
netlink_unicast+0x360/0x770
netlink_sendmsg+0x5a0/0x6a4
___sys_sendmsg+0x314/0x46c
SyS_sendmsg+0xb4/0x108
el0_svc_naked+0x34/0x38
This crash is not happening since 5.4+, however, ctnetlink still
allows for creating entries with unsupported layer 3 protocol number.
Fixes: c1d10adb4a521 ("[NETFILTER]: Add ctnetlink port for nf_conntrack")
Signed-off-by: Will McVicker <willmcvicker@google.com>
[pablo@netfilter.org: rebased original patch on top of nf.git]
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Commit 72579e14a1d3 ("net: dsa: don't fail to probe if we couldn't set
the MTU") changed, for some reason, the "err && err != -EOPNOTSUPP"
check into a simple "err". This causes the MTU warning to be printed
even for drivers that don't have the MTU operations implemented.
Fix that.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
slave_dev->name is only populated at this stage if it was specified
through a label in the device tree. However that is not mandatory.
When it isn't, the error message looks like this:
[ 5.037057] fsl_enetc 0000:00:00.2 eth2: error -19 setting up slave PHY for eth%d
[ 5.044672] fsl_enetc 0000:00:00.2 eth2: error -19 setting up slave PHY for eth%d
[ 5.052275] fsl_enetc 0000:00:00.2 eth2: error -19 setting up slave PHY for eth%d
[ 5.059877] fsl_enetc 0000:00:00.2 eth2: error -19 setting up slave PHY for eth%d
which is especially confusing since the error gets printed on behalf of
the DSA master (fsl_enetc in this case).
Printing an error message that contains a valid reference to the DSA
port's name is difficult at this point in the initialization stage, so
at least we should print some info that is more reliable, even if less
user-friendly. That may be the driver name and the hardware port index.
After this change, the error is printed as:
[ 6.051587] mscc_felix 0000:00:00.5: error -19 setting up PHY for tree 0, switch 0, port 0
[ 6.061192] mscc_felix 0000:00:00.5: error -19 setting up PHY for tree 0, switch 0, port 1
[ 6.070765] mscc_felix 0000:00:00.5: error -19 setting up PHY for tree 0, switch 0, port 2
[ 6.080324] mscc_felix 0000:00:00.5: error -19 setting up PHY for tree 0, switch 0, port 3
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
rxrpc_min_rtt_wlen is never used after it was introduced.
So better to remove it.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This reverts commit 8d7e5dee972f1cde2ba96c621f1541fa36e7d4f4.
To protect netns id, the nsid_lock is used when netns id is being
allocated and removed by peernet2id_alloc() and unhash_nsid().
The nsid_lock can be used in BH context but only spin_lock() is used
in this code.
Using spin_lock() instead of spin_lock_bh() can result in a deadlock in
the following scenario reported by the lockdep.
In order to avoid a deadlock, the spin_lock_bh() should be used instead
of spin_lock() to acquire nsid_lock.
Test commands:
ip netns del nst
ip netns add nst
ip link add veth1 type veth peer name veth2
ip link set veth1 netns nst
ip netns exec nst ip link add name br1 type bridge vlan_filtering 1
ip netns exec nst ip link set dev br1 up
ip netns exec nst ip link set dev veth1 master br1
ip netns exec nst ip link set dev veth1 up
ip netns exec nst ip link add macvlan0 link br1 up type macvlan
Splat looks like:
[ 33.615860][ T607] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
[ 33.617194][ T607] 5.9.0-rc1+ #665 Not tainted
[ ... ]
[ 33.670615][ T607] Chain exists of:
[ 33.670615][ T607] &mc->mca_lock --> &bridge_netdev_addr_lock_key --> &net->nsid_lock
[ 33.670615][ T607]
[ 33.673118][ T607] Possible interrupt unsafe locking scenario:
[ 33.673118][ T607]
[ 33.674599][ T607] CPU0 CPU1
[ 33.675557][ T607] ---- ----
[ 33.676516][ T607] lock(&net->nsid_lock);
[ 33.677306][ T607] local_irq_disable();
[ 33.678517][ T607] lock(&mc->mca_lock);
[ 33.679725][ T607] lock(&bridge_netdev_addr_lock_key);
[ 33.681166][ T607] <Interrupt>
[ 33.681791][ T607] lock(&mc->mca_lock);
[ 33.682579][ T607]
[ 33.682579][ T607] *** DEADLOCK ***
[ ... ]
[ 33.922046][ T607] stack backtrace:
[ 33.922999][ T607] CPU: 3 PID: 607 Comm: ip Not tainted 5.9.0-rc1+ #665
[ 33.924099][ T607] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 33.925714][ T607] Call Trace:
[ 33.926238][ T607] dump_stack+0x78/0xab
[ 33.926905][ T607] check_irq_usage+0x70b/0x720
[ 33.927708][ T607] ? iterate_chain_key+0x60/0x60
[ 33.928507][ T607] ? check_path+0x22/0x40
[ 33.929201][ T607] ? check_noncircular+0xcf/0x180
[ 33.930024][ T607] ? __lock_acquire+0x1952/0x1f20
[ 33.930860][ T607] __lock_acquire+0x1952/0x1f20
[ 33.931667][ T607] lock_acquire+0xaf/0x3a0
[ 33.932366][ T607] ? peernet2id_alloc+0x3a/0x170
[ 33.933147][ T607] ? br_port_fill_attrs+0x54c/0x6b0 [bridge]
[ 33.934140][ T607] ? br_port_fill_attrs+0x5de/0x6b0 [bridge]
[ 33.935113][ T607] ? kvm_sched_clock_read+0x14/0x30
[ 33.935974][ T607] _raw_spin_lock+0x30/0x70
[ 33.936728][ T607] ? peernet2id_alloc+0x3a/0x170
[ 33.937523][ T607] peernet2id_alloc+0x3a/0x170
[ 33.938313][ T607] rtnl_fill_ifinfo+0xb5e/0x1400
[ 33.939091][ T607] rtmsg_ifinfo_build_skb+0x8a/0xf0
[ 33.939953][ T607] rtmsg_ifinfo_event.part.39+0x17/0x50
[ 33.940863][ T607] rtmsg_ifinfo+0x1f/0x30
[ 33.941571][ T607] __dev_notify_flags+0xa5/0xf0
[ 33.942376][ T607] ? __irq_work_queue_local+0x49/0x50
[ 33.943249][ T607] ? irq_work_queue+0x1d/0x30
[ 33.943993][ T607] ? __dev_set_promiscuity+0x7b/0x1a0
[ 33.944878][ T607] __dev_set_promiscuity+0x7b/0x1a0
[ 33.945758][ T607] dev_set_promiscuity+0x1e/0x50
[ 33.946582][ T607] br_port_set_promisc+0x1f/0x40 [bridge]
[ 33.947487][ T607] br_manage_promisc+0x8b/0xe0 [bridge]
[ 33.948388][ T607] __dev_set_promiscuity+0x123/0x1a0
[ 33.949244][ T607] __dev_set_rx_mode+0x68/0x90
[ 33.950021][ T607] dev_uc_add+0x50/0x60
[ 33.950720][ T607] macvlan_open+0x18e/0x1f0 [macvlan]
[ 33.951601][ T607] __dev_open+0xd6/0x170
[ 33.952269][ T607] __dev_change_flags+0x181/0x1d0
[ 33.953056][ T607] rtnl_configure_link+0x2f/0xa0
[ 33.953884][ T607] __rtnl_newlink+0x6b9/0x8e0
[ 33.954665][ T607] ? __lock_acquire+0x95d/0x1f20
[ 33.955450][ T607] ? lock_acquire+0xaf/0x3a0
[ 33.956193][ T607] ? is_bpf_text_address+0x5/0xe0
[ 33.956999][ T607] rtnl_newlink+0x47/0x70
Acked-by: Guillaume Nault <gnault@redhat.com>
Fixes: 8d7e5dee972f ("netns: don't disable BHs when locking "nsid_lock"")
Reported-by: syzbot+3f960c64a104eaa2c813@syzkaller.appspotmail.com
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since each entry type has timers that can be running simultaneously we need
to make sure that entries are not freed before their timers have finished.
In order to do that generalize the src gc work to mcast gc work and use a
callback to free the entries (mdb, port group or src).
v3: add IPv6 support
v2: force mcast gc on port del to make sure all port group timers have
finished before freeing the bridge port
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When an IGMPv3/MLDv2 query is received and we're operating in such mode
then we need to avoid updating group timers if the suppress flag is set.
Also we should update only timers for groups in exclude mode.
v3: add IPv6/MLDv2 support
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We already have all necessary helpers, so process IGMPV3/MLDv2
BLOCK_OLD_SOURCES as per the RFCs.
v3: add IPv6/MLDv2 support
v2: directly do flag bit operations
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In order to process IGMPV3/MLDv2 CHANGE_TO_INCLUDE/EXCLUDE report types we
need new helpers which allow us to mark entries based on their timer
state and to query only marked entries.
v3: add IPv6/MLDv2 support, fix other_query checks
v2: directly do flag bit operations
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In order to process IGMPV3/MLDv2_MODE_IS_INCLUDE/EXCLUDE report types we
need some new helpers which allow us to set/clear flags for all current
entries and later delete marked entries after the report sources have been
processed.
v3: add IPv6/MLDv2 support
v2: drop flag helpers and directly do flag bit operations
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch adds handling for the ALLOW_NEW_SOURCES IGMPv3/MLDv2 report
types and limits them only when multicast_igmp_version == 3 or
multicast_mld_version == 2 respectively. Now that IGMPv3/MLDv2 handling
functions will be managing timers we need to delay their activation, thus
a new argument is added which controls if the timer should be updated.
We also disable host IGMPv3/MLDv2 handling as it's not yet implemented and
could cause inconsistent group state, the host can only join a group as
EXCLUDE {} or leave it.
v4: rename update_timer to igmpv2_mldv1 and use the passed value from
br_multicast_add_group's callers
v3: Add IPv6/MLDv2 support
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If an expired port group is in EXCLUDE mode, then we have to turn it
into INCLUDE mode, remove all srcs with zero timer and finally remove
the group itself if there are no more srcs with an active timer.
For IGMPv2 use there would be no sources, so this will reduce to just
removing the group as before.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We have to use mdb and port entries when sending mdb notifications in
order to fill in all group attributes properly. Before this change we
would've used a fake br_mdb_entry struct to fill in only partial
information about the mdb. Now we can also reuse the mdb dump fill
function and thus have only a single central place which fills the mdb
attributes.
v3: add IPv6 support
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This change is in preparation for using the mdb port group entries when
sending a notification, so their full state and additional attributes can
be filled in.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We need to be able to retransmit group-specific and group-and-source
specific queries. The new timer takes care of those.
v3: add IPv6 support
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Allows br_multicast_alloc_query to build queries with the port group's
source lists and sends a query for sources over and under lmqt when
necessary as per RFCs 3376 and 3810 with the suppress flag set
appropriately.
v3: add IPv6 support
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Support per port group src list (address and timer) and filter mode
dumping. Protected by either multicast_lock or rcu.
v3: add IPv6 support
v2: require RCU or multicast_lock to traverse src groups
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Initial functions for group source lists which are needed for IGMPv3
and MLDv2 include/exclude lists. Both IPv4 and IPv6 sources are supported.
User-added mdb entries are created with exclude filter mode, we can
extend that later to allow user-supplied mode. When group src entries
are deleted, they're freed from a workqueue to make sure their timers
are not still running. Source entries are protected by the multicast_lock
and rcu. The number of src groups per port group is limited to 32.
v4: use the new port group del function directly
add igmpv2/mldv1 bool to denote if the entry was added in those
modes, it will later replace the old update_timer bool
v3: add IPv6 support
v2: allow src groups to be traversed under rcu
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In order to avoid future errors and reduce code duplication we should
factor out the port group del sequence. This allows us to have one
function which takes care of all details when removing a port group.
v4: set pg's fast leave flag when deleting due to fast leave
move the patch before adding source lists
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Before this patch we'd need 2 cache lines for fast-path, now all used
fields are in the first cache line.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In the commit fdeba99b1e58
("tipc: fix use-after-free in tipc_bcast_get_mode"), we're trying
to make sure the tipc_net_finalize_work work item finished if it
enqueued. But calling flush_scheduled_work() is not just affecting
above work item but either any scheduled work. This has turned out
to be overkill and caused to deadlock as syzbot reported:
======================================================
WARNING: possible circular locking dependency detected
5.9.0-rc2-next-20200828-syzkaller #0 Not tainted
------------------------------------------------------
kworker/u4:6/349 is trying to acquire lock:
ffff8880aa063d38 ((wq_completion)events){+.+.}-{0:0}, at: flush_workqueue+0xe1/0x13e0 kernel/workqueue.c:2777
but task is already holding lock:
ffffffff8a879430 (pernet_ops_rwsem){++++}-{3:3}, at: cleanup_net+0x9b/0xb10 net/core/net_namespace.c:565
[...]
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(pernet_ops_rwsem);
lock(&sb->s_type->i_mutex_key#13);
lock(pernet_ops_rwsem);
lock((wq_completion)events);
*** DEADLOCK ***
[...]
v1:
To fix the original issue, we replace above calling by introducing
a bit flag. When a namespace cleaned-up, bit flag is set to zero and:
- tipc_net_finalize functionial just does return immediately.
- tipc_net_finalize_work does not enqueue into the scheduled work queue.
v2:
Use cancel_work_sync() helper to make sure ONLY the
tipc_net_finalize_work() stopped before releasing bcbase object.
Reported-by: syzbot+d5aa7e0385f6a5d0f4fd@syzkaller.appspotmail.com
Fixes: fdeba99b1e58 ("tipc: fix use-after-free in tipc_bcast_get_mode")
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When we clone state only add_time was cloned. It missed values like
bytes, packets. Now clone the all members of the structure.
v1->v3:
- use memcpy to copy the entire structure
Fixes: 80c9abaabf42 ("[XFRM]: Extension for dynamic update of endpoint address(es)")
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
XFRMA_SEC_CTX was not cloned from the old to the new.
Migrate this attribute during XFRMA_MSG_MIGRATE
v1->v2:
- return -ENOMEM on error
v2->v3:
- fix return type to int
Fixes: 80c9abaabf42 ("[XFRM]: Extension for dynamic update of endpoint address(es)")
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|