summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2021-01-19net: core: devlink: use right genl user_ptr when handling port param get/setOleksandr Mazur
Fix incorrect user_ptr dereferencing when handling port param get/set: idx [0] stores the 'struct devlink' pointer; idx [1] stores the 'struct devlink_port' pointer; Fixes: 637989b5d77e ("devlink: Always use user_ptr[0] for devlink and simplify post_doit") CC: Parav Pandit <parav@mellanox.com> Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu> Link: https://lore.kernel.org/r/20210119085333.16833-1-vadym.kochan@plvision.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net/tls: Except bond interface from some TLS checksTariq Toukan
In the tls_dev_event handler, ignore tlsdev_ops requirement for bond interfaces, they do not exist as the interaction is done directly with the lower device. Also, make the validate function pass when it's called with the upper bond interface. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net/tls: Device offload to use lowest netdevice in chainTariq Toukan
Do not call the tls_dev_ops of upper devices. Instead, ask them for the proper lowest device and communicate with it directly. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net: netdevice: Add operation ndo_sk_get_lower_devTariq Toukan
ndo_sk_get_lower_dev returns the lower netdev that corresponds to a given socket. Additionally, we implement a helper netdev_sk_get_lowest_dev() to get the lowest one in chain. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net_sched: fix RTNL deadlock again caused by request_module()Cong Wang
tcf_action_init_1() loads tc action modules automatically with request_module() after parsing the tc action names, and it drops RTNL lock and re-holds it before and after request_module(). This causes a lot of troubles, as discovered by syzbot, because we can be in the middle of batch initializations when we create an array of tc actions. One of the problem is deadlock: CPU 0 CPU 1 rtnl_lock(); for (...) { tcf_action_init_1(); -> rtnl_unlock(); -> request_module(); rtnl_lock(); for (...) { tcf_action_init_1(); -> tcf_idr_check_alloc(); // Insert one action into idr, // but it is not committed until // tcf_idr_insert_many(), then drop // the RTNL lock in the _next_ // iteration -> rtnl_unlock(); -> rtnl_lock(); -> a_o->init(); -> tcf_idr_check_alloc(); // Now waiting for the same index // to be committed -> request_module(); -> rtnl_lock() // Now waiting for RTNL lock } rtnl_unlock(); } rtnl_unlock(); This is not easy to solve, we can move the request_module() before this loop and pre-load all the modules we need for this netlink message and then do the rest initializations. So the loop breaks down to two now: for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) { struct tc_action_ops *a_o; a_o = tc_action_load_ops(name, tb[i]...); ops[i - 1] = a_o; } for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) { act = tcf_action_init_1(ops[i - 1]...); } Although this looks serious, it only has been reported by syzbot, so it seems hard to trigger this by humans. And given the size of this patch, I'd suggest to make it to net-next and not to backport to stable. This patch has been tested by syzbot and tested with tdc.py by me. Fixes: 0fedc63fadf0 ("net_sched: commit action insertions together") Reported-and-tested-by: syzbot+82752bc5331601cf4899@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+b3b63b6bff456bd95294@syzkaller.appspotmail.com Reported-by: syzbot+ba67b12b1ca729912834@syzkaller.appspotmail.com Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Tested-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20210117005657.14810-1-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18tcp: fix TCP_USER_TIMEOUT with zero windowEnke Chen
The TCP session does not terminate with TCP_USER_TIMEOUT when data remain untransmitted due to zero window. The number of unanswered zero-window probes (tcp_probes_out) is reset to zero with incoming acks irrespective of the window size, as described in tcp_probe_timer(): RFC 1122 4.2.2.17 requires the sender to stay open indefinitely as long as the receiver continues to respond probes. We support this by default and reset icsk_probes_out with incoming ACKs. This counter, however, is the wrong one to be used in calculating the duration that the window remains closed and data remain untransmitted. Thanks to Jonathan Maxwell <jmaxwell37@gmail.com> for diagnosing the actual issue. In this patch a new timestamp is introduced for the socket in order to track the elapsed time for the zero-window probes that have not been answered with any non-zero window ack. Fixes: 9721e709fa68 ("tcp: simplify window probe aborting on USER_TIMEOUT") Reported-by: William McCall <william.mccall@gmail.com> Co-developed-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Enke Chen <enchen@paloaltonetworks.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20210115223058.GA39267@localhost.localdomain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18ipv6: set multicast flag on the multicast routeMatteo Croce
The multicast route ff00::/8 is created with type RTN_UNICAST: $ ip -6 -d route unicast ::1 dev lo proto kernel scope global metric 256 pref medium unicast fe80::/64 dev eth0 proto kernel scope global metric 256 pref medium unicast ff00::/8 dev eth0 proto kernel scope global metric 256 pref medium Set the type to RTN_MULTICAST which is more appropriate. Fixes: e8478e80e5a7 ("net/ipv6: Save route type in rt6_info") Signed-off-by: Matteo Croce <mcroce@microsoft.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18ipv6: create multicast route with RTPROT_KERNELMatteo Croce
The ff00::/8 multicast route is created without specifying the fc_protocol field, so the default RTPROT_BOOT value is used: $ ip -6 -d route unicast ::1 dev lo proto kernel scope global metric 256 pref medium unicast fe80::/64 dev eth0 proto kernel scope global metric 256 pref medium unicast ff00::/8 dev eth0 proto boot scope global metric 256 pref medium As the documentation says, this value identifies routes installed during boot, but the route is created when interface is set up. Change the value to RTPROT_KERNEL which is a better value. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Matteo Croce <mcroce@microsoft.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18net: bridge: check vlan with eth_type_vlan() methodMenglong Dong
Replace some checks for ETH_P_8021Q and ETH_P_8021AD with eth_type_vlan(). Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20210117080950.122761-1-dong.menglong@zte.com.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-18Merge tag 'mac80211-for-net-2021-01-18.2' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Various fixes: * kernel-doc parsing fixes * incorrect debugfs string checks * locking fix in regulatory * some encryption-related fixes * tag 'mac80211-for-net-2021-01-18.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211: mac80211: check if atf has been disabled in __ieee80211_schedule_txq mac80211: do not drop tx nulldata packets on encrypted links mac80211: fix encryption key selection for 802.3 xmit mac80211: fix fast-rx encryption check mac80211: fix incorrect strlen of .write in debugfs cfg80211: fix a kerneldoc markup cfg80211: Save the regulatory domain with a lock cfg80211/mac80211: fix kernel-doc for SAR APIs ==================== Link: https://lore.kernel.org/r/20210118204750.7243-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-16sctp: remove the NETIF_F_SG flag before calling skb_segmentXin Long
It makes more sense to clear NETIF_F_SG instead of set it when calling skb_segment() in sctp_gso_segment(), since SCTP GSO is using head_skb's fraglist, of which all frags are linear skbs. This will make SCTP GSO code more understandable. Suggested-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-16net: move the hsize check to the else block in skb_segmentXin Long
After commit 89319d3801d1 ("net: Add frag_list support to skb_segment"), it goes to process frag_list when !hsize in skb_segment(). However, when using skb frag_list, sg normally should not be set. In this case, hsize will be set with len right before !hsize check, then it won't go to frag_list processing code. So the right thing to do is move the hsize check to the else block, so that it won't affect the !hsize check for frag_list processing. v1->v2: - change to do "hsize <= 0" check instead of "!hsize", and also move "hsize < 0" into else block, to save some cycles, as Alex suggested. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-16skbuff: back tiny skbs with kmalloc() in __netdev_alloc_skb() tooAlexander Lobakin
Commit 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs") ensured that skbs with data size lower than 1025 bytes will be kmalloc'ed to avoid excessive page cache fragmentation and memory consumption. However, the fix adressed only __napi_alloc_skb() (primarily for virtio_net and napi_get_frags()), but the issue can still be achieved through __netdev_alloc_skb(), which is still used by several drivers. Drivers often allocate a tiny skb for headers and place the rest of the frame to frags (so-called copybreak). Mirror the condition to __netdev_alloc_skb() to handle this case too. Since v1 [0]: - fix "Fixes:" tag; - refine commit message (mention copybreak usecase). [0] https://lore.kernel.org/netdev/20210114235423.232737-1-alobakin@pm.me Fixes: a1c7fff7e18f ("net: netdev_alloc_skb() use build_skb()") Signed-off-by: Alexander Lobakin <alobakin@pm.me> Link: https://lore.kernel.org/r/20210115150354.85967-1-alobakin@pm.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-16netfilter: nft_dynset: dump expressions when set definition contains no ↵Pablo Neira Ayuso
expressions If the set definition provides no stateful expressions, then include the stateful expression in the ruleset listing. Without this fix, the dynset rule listing shows the stateful expressions provided by the set definition. Fixes: 65038428b2c6 ("netfilter: nf_tables: allow to specify stateful expression in set definition") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-01-16netfilter: nft_dynset: add timeout extension to templatePablo Neira Ayuso
Otherwise, the newly create element shows no timeout when listing the ruleset. If the set definition does not specify a default timeout, then the set element only shows the expiration time, but not the timeout. This is a problem when restoring a stateful ruleset listing since it skips the timeout policy entirely. Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set updates") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-01-16netfilter: nft_dynset: honor stateful expressions in set definitionPablo Neira Ayuso
If the set definition contains stateful expressions, allocate them for the newly added entries from the packet path. Fixes: 65038428b2c6 ("netfilter: nf_tables: allow to specify stateful expression in set definition") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-01-15tcp_cubic: use memset and offsetof initYejune Deng
In bictcp_reset(), use memset and offsetof instead of = 0. Signed-off-by: Yejune Deng <yejune.deng@gmail.com> Link: https://lore.kernel.org/r/1610597696-128610-1-git-send-email-yejune.deng@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15nfc: netlink: use &w->w in nfc_genl_rcv_nl_eventGeliang Tang
Use the struct member w of the struct urelease_work directly instead of casting it. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Link: https://lore.kernel.org/r/f0ed86d6d54ac0834bd2e161d172bf7bb5647cf7.1610683862.git.geliangtang@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15net: dsa: add ops for devlink-sbVladimir Oltean
Switches that care about QoS might have hardware support for reserving buffer pools for individual ports or traffic classes, and configuring their sizes and thresholds. Through devlink-sb (shared buffers), this is all configurable, as well as their occupancy being viewable. Add the plumbing in DSA for these operations. Individual drivers still need to call devlink_sb_register() with the shared buffers they want to expose. A helper was not created in DSA for this purpose (unlike, say, dsa_devlink_params_register), since in my opinion it does not bring any benefit over plainly calling devlink_sb_register() directly. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15net_sched: avoid shift-out-of-bounds in tcindex_set_parms()Eric Dumazet
tc_index being 16bit wide, we need to check that TCA_TCINDEX_SHIFT attribute is not silly. UBSAN: shift-out-of-bounds in net/sched/cls_tcindex.c:260:29 shift exponent 255 is too large for 32-bit type 'int' CPU: 0 PID: 8516 Comm: syz-executor228 Not tainted 5.10.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395 valid_perfect_hash net/sched/cls_tcindex.c:260 [inline] tcindex_set_parms.cold+0x1b/0x215 net/sched/cls_tcindex.c:425 tcindex_change+0x232/0x340 net/sched/cls_tcindex.c:546 tc_new_tfilter+0x13fb/0x21b0 net/sched/cls_api.c:2127 rtnetlink_rcv_msg+0x8b6/0xb80 net/core/rtnetlink.c:5555 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330 netlink_sendmsg+0x907/0xe40 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:672 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2336 ___sys_sendmsg+0xf3/0x170 net/socket.c:2390 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2423 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20210114185229.1742255-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15net_sched: gen_estimator: support large ewma logEric Dumazet
syzbot report reminded us that very big ewma_log were supported in the past, even if they made litle sense. tc qdisc replace dev xxx root est 1sec 131072sec ... While fixing the bug, also add boundary checks for ewma_log, in line with range supported by iproute2. UBSAN: shift-out-of-bounds in net/core/gen_estimator.c:83:38 shift exponent -1 is negative CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395 est_timer.cold+0xbb/0x12d net/core/gen_estimator.c:83 call_timer_fn+0x1a5/0x710 kernel/time/timer.c:1417 expire_timers kernel/time/timer.c:1462 [inline] __run_timers.part.0+0x692/0xa80 kernel/time/timer.c:1731 __run_timers kernel/time/timer.c:1712 [inline] run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1744 __do_softirq+0x2bc/0xa77 kernel/softirq.c:343 asm_call_irq_on_stack+0xf/0x20 </IRQ> __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline] run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline] do_softirq_own_stack+0xaa/0xd0 arch/x86/kernel/irq_64.c:77 invoke_softirq kernel/softirq.c:226 [inline] __irq_exit_rcu+0x17f/0x200 kernel/softirq.c:420 irq_exit_rcu+0x5/0x20 kernel/softirq.c:432 sysvec_apic_timer_interrupt+0x4d/0x100 arch/x86/kernel/apic/apic.c:1096 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:628 RIP: 0010:native_save_fl arch/x86/include/asm/irqflags.h:29 [inline] RIP: 0010:arch_local_save_flags arch/x86/include/asm/irqflags.h:79 [inline] RIP: 0010:arch_irqs_disabled arch/x86/include/asm/irqflags.h:169 [inline] RIP: 0010:acpi_safe_halt drivers/acpi/processor_idle.c:111 [inline] RIP: 0010:acpi_idle_do_entry+0x1c9/0x250 drivers/acpi/processor_idle.c:516 Fixes: 1c0d32fde5bd ("net_sched: gen_estimator: complete rewrite of rate estimators") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20210114181929.1717985-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15net_sched: reject silly cell_log in qdisc_get_rtab()Eric Dumazet
iproute2 probably never goes beyond 8 for the cell exponent, but stick to the max shift exponent for signed 32bit. UBSAN reported: UBSAN: shift-out-of-bounds in net/sched/sch_api.c:389:22 shift exponent 130 is too large for 32-bit type 'int' CPU: 1 PID: 8450 Comm: syz-executor586 Not tainted 5.11.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x183/0x22e lib/dump_stack.c:120 ubsan_epilogue lib/ubsan.c:148 [inline] __ubsan_handle_shift_out_of_bounds+0x432/0x4d0 lib/ubsan.c:395 __detect_linklayer+0x2a9/0x330 net/sched/sch_api.c:389 qdisc_get_rtab+0x2b5/0x410 net/sched/sch_api.c:435 cbq_init+0x28f/0x12c0 net/sched/sch_cbq.c:1180 qdisc_create+0x801/0x1470 net/sched/sch_api.c:1246 tc_modify_qdisc+0x9e3/0x1fc0 net/sched/sch_api.c:1662 rtnetlink_rcv_msg+0xb1d/0xe60 net/core/rtnetlink.c:5564 netlink_rcv_skb+0x1f0/0x460 net/netlink/af_netlink.c:2494 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline] netlink_unicast+0x7de/0x9b0 net/netlink/af_netlink.c:1330 netlink_sendmsg+0xaa6/0xe90 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg net/socket.c:672 [inline] ____sys_sendmsg+0x5a2/0x900 net/socket.c:2345 ___sys_sendmsg net/socket.c:2399 [inline] __sys_sendmsg+0x319/0x400 net/socket.c:2432 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Cong Wang <cong.wang@bytedance.com> Link: https://lore.kernel.org/r/20210114160637.1660597-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski
Daniel Borkmann says: ==================== pull-request: bpf-next 2021-01-16 1) Extend atomic operations to the BPF instruction set along with x86-64 JIT support, that is, atomic{,64}_{xchg,cmpxchg,fetch_{add,and,or,xor}}, from Brendan Jackman. 2) Add support for using kernel module global variables (__ksym externs in BPF programs) retrieved via module's BTF, from Andrii Nakryiko. 3) Generalize BPF stackmap's buildid retrieval and add support to have buildid stored in mmap2 event for perf, from Jiri Olsa. 4) Various fixes for cross-building BPF sefltests out-of-tree which then will unblock wider automated testing on ARM hardware, from Jean-Philippe Brucker. 5) Allow to retrieve SOL_SOCKET opts from sock_addr progs, from Daniel Borkmann. 6) Clean up driver's XDP buffer init and split into two helpers to init per- descriptor and non-changing fields during processing, from Lorenzo Bianconi. 7) Minor misc improvements to libbpf & bpftool, from Ian Rogers. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (41 commits) perf: Add build id data in mmap2 event bpf: Add size arg to build_id_parse function bpf: Move stack_map_get_build_id into lib bpf: Document new atomic instructions bpf: Add tests for new BPF atomic operations bpf: Add bitwise atomic instructions bpf: Pull out a macro for interpreting atomic ALU operations bpf: Add instructions for atomic_[cmp]xchg bpf: Add BPF_FETCH field / create atomic_fetch_add instruction bpf: Move BPF_STX reserved field check into BPF_STX verifier code bpf: Rename BPF_XADD and prepare to encode other atomics in .imm bpf: x86: Factor out a lookup table for some ALU opcodes bpf: x86: Factor out emission of REX byte bpf: x86: Factor out emission of ModR/M for *(reg + off) tools/bpftool: Add -Wall when building BPF programs bpf, libbpf: Avoid unused function warning on bpf_tail_call_static selftests/bpf: Install btf_dump test cases selftests/bpf: Fix installation of urandom_read selftests/bpf: Move generated test files to $(TEST_GEN_FILES) selftests/bpf: Fix out-of-tree build ... ==================== Link: https://lore.kernel.org/r/20210116012922.17823-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15neighbor: remove definition of DEBUGTom Rix
Defining DEBUG should only be done in development. So remove DEBUG. Signed-off-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20210114212917.48174-1-trix@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15net: dsa: set configure_vlan_while_not_filtering to true by defaultVladimir Oltean
As explained in commit 54a0ed0df496 ("net: dsa: provide an option for drivers to always receive bridge VLANs"), DSA has historically been skipping VLAN switchdev operations when the bridge wasn't in vlan_filtering mode, but the reason why it was doing that has never been clear. So the configure_vlan_while_not_filtering option is there merely to preserve functionality for existing drivers. It isn't some behavior that drivers should opt into. Ideally, when all drivers leave this flag set, we can delete the dsa_port_skip_vlan_configuration() function. New drivers always seem to omit setting this flag, for some reason. So let's reverse the logic: the DSA core sets it by default to true before the .setup() callback, and legacy drivers can turn it off. This way, new drivers get the new behavior by default, unless they explicitly set the flag to false, which is more obvious during review. Remove the assignment from drivers which were setting it to true, and add the assignment to false for the drivers that didn't previously have it. This way, it should be easier to see how many we have left. The following drivers: lan9303, mv88e6060 were skipped from setting this flag to false, because they didn't have any VLAN offload ops in the first place. The Broadcom Starfighter 2 driver calls the common b53_switch_alloc and therefore also inherits the configure_vlan_while_not_filtering=true behavior. Also, print a message through netlink extack every time a VLAN has been skipped. This is mildly annoying on purpose, so that (a) it is at least clear that VLANs are being skipped - the legacy behavior in itself is confusing, and the extack should be much more difficult to miss, unlike kernel logs - and (b) people have one more incentive to convert to the new behavior. No behavior change except for the added prints is intended at this time. $ ip link add br0 type bridge vlan_filtering 0 $ ip link set sw0p2 master br0 [ 60.315148] br0: port 1(sw0p2) entered blocking state [ 60.320350] br0: port 1(sw0p2) entered disabled state [ 60.327839] device sw0p2 entered promiscuous mode [ 60.334905] br0: port 1(sw0p2) entered blocking state [ 60.340142] br0: port 1(sw0p2) entered forwarding state Warning: dsa_core: skipping configuration of VLAN. # This was the pvid $ bridge vlan add dev sw0p2 vid 100 Warning: dsa_core: skipping configuration of VLAN. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20210115231919.43834-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski
Daniel Borkmann says: ==================== pull-request: bpf 2021-01-16 1) Fix a double bpf_prog_put() for BPF_PROG_{TYPE_EXT,TYPE_TRACING} types in link creation's error path causing a refcount underflow, from Jiri Olsa. 2) Fix BTF validation errors for the case where kernel modules don't declare any new types and end up with an empty BTF, from Andrii Nakryiko. 3) Fix BPF local storage helpers to first check their {task,inode} owners for being NULL before access, from KP Singh. 4) Fix a memory leak in BPF setsockopt handling for the case where optlen is zero and thus temporary optval buffer should be freed, from Stanislav Fomichev. 5) Fix a syzbot memory allocation splat in BPF_PROG_TEST_RUN infra for raw_tracepoint caused by too big ctx_size_in, from Song Liu. 6) Fix LLVM code generation issues with verifier where PTR_TO_MEM{,_OR_NULL} registers were spilled to stack but not recognized, from Gilad Reti. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: MAINTAINERS: Update my email address selftests/bpf: Add verifier test for PTR_TO_MEM spill bpf: Support PTR_TO_MEM{,_OR_NULL} register spilling bpf: Reject too big ctx_size_in for raw_tp test run libbpf: Allow loading empty BTFs bpf: Allow empty module BTFs bpf: Don't leak memory in bpf getsockopt when optlen == 0 bpf: Update local storage test to check handling of null ptrs bpf: Fix typo in bpf_inode_storage.c bpf: Local storage helpers should check nullness of owner ptr passed bpf: Prevent double bpf_prog_put call from bpf_tracing_prog_attach ==================== Link: https://lore.kernel.org/r/20210116002025.15706-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15cls_flower: call nla_ok() before nla_next()Cong Wang
fl_set_enc_opt() simply checks if there are still bytes left to parse, but this is not sufficent as syzbot seems to be able to generate malformatted netlink messages. nla_ok() is more strict so should be used to validate the next nlattr here. And nla_validate_nested_deprecated() has less strict check too, it is probably too late to switch to the strict version, but we can just call nla_ok() too after it. Reported-and-tested-by: syzbot+2624e3778b18fc497c92@syzkaller.appspotmail.com Fixes: 0a6e77784f49 ("net/sched: allow flower to match tunnel options") Fixes: 79b1011cb33d ("net: sched: allow flower to match erspan options") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Xin Long <lucien.xin@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Link: https://lore.kernel.org/r/20210115185024.72298-1-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15dsa: add support for Arrow XRS700x tag trailerGeorge McCollister
Add support for Arrow SpeedChips XRS700x single byte tag trailer. This is modeled on tag_trailer.c which works in a similar way. Signed-off-by: George McCollister <george.mccollister@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-14net: flow_dissector: Parse PTP L2 packet headerEran Ben Elisha
Add support for parsing PTP L2 packet header. Such packet consists of an L2 header (with ethertype of ETH_P_1588), PTP header, body and an optional suffix. Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-14net: vlan: Add parse protocol header opsEran Ben Elisha
Add parse protocol header ops for vlan device. Before this patch, vlan tagged packet transmitted by af_packet had skb->protocol unset. Some kernel methods (like __skb_flow_dissect()) rely on this missing information for its packet processing. Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-14net: dsa: tag_dsa: Support reception of packets from LAG devicesTobias Waldekranz
Packets ingressing on a LAG that egress on the CPU port, which are not classified as management, will have a FORWARD tag that does not contain the normal source device/port tuple. Instead the trunk bit will be set, and the port field holds the LAG id. Since the exact source port information is not available in the tag, frames are injected directly on the LAG interface and thus do never pass through any DSA port interface on ingress. Management frames (TO_CPU) are not affected and will pass through the DSA port interface as usual. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-14net: dsa: Link aggregation supportTobias Waldekranz
Monitor the following events and notify the driver when: - A DSA port joins/leaves a LAG. - A LAG, made up of DSA ports, joins/leaves a bridge. - A DSA port in a LAG is enabled/disabled (enabled meaning "distributing" in 802.3ad LACP terms). When a LAG joins a bridge, the DSA subsystem will treat that as each individual port joining the bridge. The driver may look at the port's LAG device pointer to see if it is associated with any LAG, if that is required. This is analogue to how switchdev events are replicated out to all lower devices when reaching e.g. a LAG. Drivers can optionally request that DSA maintain a linear mapping from a LAG ID to the corresponding netdev by setting ds->num_lag_ids to the desired size. In the event that the hardware is not capable of offloading a particular LAG for any reason (the typical case being use of exotic modes like broadcast), DSA will take a hands-off approach, allowing the LAG to be formed as a pure software construct. This is reported back through the extended ACK, but is otherwise transparent to the user. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-14net: dsa: Don't offload port attributes on standalone portsTobias Waldekranz
In a situation where a standalone port is indirectly attached to a bridge (e.g. via a LAG) which is not offloaded, do not offload any port attributes either. The port should behave as a standard NIC. Previously, on mv88e6xxx, this meant that in the following setup: br0 / team0 / \ swp0 swp1 If vlan filtering was enabled on br0, swp0's and swp1's QMode was set to "secure". This caused all untagged packets to be dropped, as their default VID (0) was not loaded into the VTU. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-14net: openvswitch: add log message for error caseEelco Chaudron
As requested by upstream OVS, added some error messages in the validate_and_copy_dec_ttl function. Includes a small cleanup, which removes an unnecessary parameter from the dec_ttl_exception_handler() function. Reported-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Link: https://lore.kernel.org/r/161054576573.26637.18396634650212670580.stgit@ebuild Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-14Merge tag 'net-5.11-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "We have a few fixes for long standing issues, in particular Eric's fix to not underestimate the skb sizes, and my fix for brokenness of register_netdevice() error path. They may uncover other bugs so we will keep an eye on them. Also included are Willem's fixes for kmap(_atomic). Looking at the "current release" fixes, it seems we are about one rc behind a normal cycle. We've previously seen an uptick of "people had run their test suites" / "humans actually tried to use new features" fixes between rc2 and rc3. Summary: Current release - regressions: - fix feature enforcement to allow NETIF_F_HW_TLS_TX if IP_CSUM && IPV6_CSUM - dcb: accept RTM_GETDCB messages carrying set-like DCB commands if user is admin for backward-compatibility - selftests/tls: fix selftests build after adding ChaCha20-Poly1305 Current release - always broken: - ppp: fix refcount underflow on channel unbridge - bnxt_en: clear DEFRAG flag in firmware message when retry flashing - smc: fix out of bound access in the new netlink interface Previous releases - regressions: - fix use-after-free with UDP GRO by frags - mptcp: better msk-level shutdown - rndis_host: set proper input size for OID_GEN_PHYSICAL_MEDIUM request - i40e: xsk: fix potential NULL pointer dereferencing Previous releases - always broken: - skb frag: kmap_atomic fixes - avoid 32 x truesize under-estimation for tiny skbs - fix issues around register_netdevice() failures - udp: prevent reuseport_select_sock from reading uninitialized socks - dsa: unbind all switches from tree when DSA master unbinds - dsa: clear devlink port type before unregistering slave netdevs - can: isotp: isotp_getname(): fix kernel information leak - mlxsw: core: Thermal control fixes - ipv6: validate GSO SKB against MTU before finish IPv6 processing - stmmac: use __napi_schedule() for PREEMPT_RT - net: mvpp2: remove Pause and Asym_Pause support Misc: - remove from MAINTAINERS folks who had been inactive for >5yrs" * tag 'net-5.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (58 commits) mptcp: fix locking in mptcp_disconnect() net: Allow NETIF_F_HW_TLS_TX if IP_CSUM && IPV6_CSUM MAINTAINERS: dccp: move Gerrit Renker to CREDITS MAINTAINERS: ipvs: move Wensong Zhang to CREDITS MAINTAINERS: tls: move Aviad to CREDITS MAINTAINERS: ena: remove Zorik Machulsky from reviewers MAINTAINERS: vrf: move Shrijeet to CREDITS MAINTAINERS: net: move Alexey Kuznetsov to CREDITS MAINTAINERS: altx: move Jay Cliburn to CREDITS net: avoid 32 x truesize under-estimation for tiny skbs nt: usb: USB_RTL8153_ECM should not default to y net: stmmac: fix taprio configuration when base_time is in the past net: stmmac: fix taprio schedule configuration net: tip: fix a couple kernel-doc markups net: sit: unregister_netdevice on newlink's error path net: stmmac: Fixed mtu channged by cache aligned cxgb4/chtls: Fix tid stuck due to wrong update of qid i40e: fix potential NULL pointer dereferencing net: stmmac: use __napi_schedule() for PREEMPT_RT can: mcp251xfd: mcp251xfd_handle_rxif_one(): fix wrong NULL pointer check ...
2021-01-14mac80211: check if atf has been disabled in __ieee80211_schedule_txqLorenzo Bianconi
Check if atf has been disabled in __ieee80211_schedule_txq() in order to avoid a given sta is always put to the beginning of the active_txqs list and never moved to the end since deficit is not decremented in ieee80211_sta_register_airtime() Fixes: b4809e9484da1 ("mac80211: Add airtime accounting and scheduling to TXQs") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://lore.kernel.org/r/93889406c50f1416214c079ca0b8c9faecc5143e.1608975195.git.lorenzo@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-01-14mac80211: do not drop tx nulldata packets on encrypted linksFelix Fietkau
ieee80211_tx_h_select_key drops any non-mgmt packets without a key when encryption is used. This is wrong for nulldata packets that can't be encrypted and are sent out for probing clients and indicating 4-address mode. Reported-by: Sebastian Gottschall <s.gottschall@dd-wrt.com> Fixes: a0761a301746 ("mac80211: drop data frames without key on encrypted links") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20201218191525.1168-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-01-14mac80211: fix encryption key selection for 802.3 xmitFelix Fietkau
When using WEP, the default unicast key needs to be selected, instead of the STA PTK. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20201218184718.93650-4-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-01-14mac80211: fix fast-rx encryption checkFelix Fietkau
When using WEP, the default unicast key needs to be selected, instead of the STA PTK. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20201218184718.93650-5-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-01-14mac80211: fix incorrect strlen of .write in debugfsShayne Chen
This fixes strlen mismatch problems happening in some .write callbacks of debugfs. When trying to configure airtime_flags in debugfs, an error appeared: ash: write error: Invalid argument The error is returned from kstrtou16() since a wrong length makes it miss the real end of input string. To fix this, use count as the string length, and set proper end of string for a char buffer. The debug print is shown - airtime_flags_write: count = 2, len = 8, where the actual length is 2, but "len = strlen(buf)" gets 8. Also cleanup the other similar cases for the sake of consistency. Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com> Signed-off-by: Ryder Lee <ryder.lee@mediatek.com> Signed-off-by: Shayne Chen <shayne.chen@mediatek.com> Link: https://lore.kernel.org/r/20210112032028.7482-1-shayne.chen@mediatek.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-01-14mptcp: fix locking in mptcp_disconnect()Paolo Abeni
tcp_disconnect() expects the caller acquires the sock lock, but mptcp_disconnect() is not doing that. Add the missing required lock. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Fixes: 76e2a55d1625 ("mptcp: better msk-level shutdown.") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/f818e82b58a556feeb71dcccc8bf1c87aafc6175.1610638176.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-14net: Allow NETIF_F_HW_TLS_TX if IP_CSUM && IPV6_CSUMTariq Toukan
Cited patch below blocked the TLS TX device offload unless HW_CSUM is set. This broke devices that use IP_CSUM && IP6_CSUM. Here we fix it. Note that the single HW_TLS_TX feature flag indicates support for both IPv4/6, hence it should still be disabled in case only one of (IP_CSUM | IPV6_CSUM) is set. Fixes: ae0b04b238e2 ("net: Disable NETIF_F_HW_TLS_TX when HW_CSUM is disabled") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reported-by: Rohit Maheshwari <rohitm@chelsio.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Link: https://lore.kernel.org/r/20210114151215.7061-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-14net: avoid 32 x truesize under-estimation for tiny skbsEric Dumazet
Both virtio net and napi_get_frags() allocate skbs with a very small skb->head While using page fragments instead of a kmalloc backed skb->head might give a small performance improvement in some cases, there is a huge risk of under estimating memory usage. For both GOOD_COPY_LEN and GRO_MAX_HEAD, we can fit at least 32 allocations per page (order-3 page in x86), or even 64 on PowerPC We have been tracking OOM issues on GKE hosts hitting tcp_mem limits but consuming far more memory for TCP buffers than instructed in tcp_mem[2] Even if we force napi_alloc_skb() to only use order-0 pages, the issue would still be there on arches with PAGE_SIZE >= 32768 This patch makes sure that small skb head are kmalloc backed, so that other objects in the slab page can be reused instead of being held as long as skbs are sitting in socket queues. Note that we might in the future use the sk_buff napi cache, instead of going through a more expensive __alloc_skb() Another idea would be to use separate page sizes depending on the allocated length (to never have more than 4 frags per page) I would like to thank Greg Thelen for his precious help on this matter, analysing crash dumps is always a time consuming task. Fixes: fd11a83dd363 ("net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Greg Thelen <gthelen@google.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20210113161819.1155526-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-14net: tip: fix a couple kernel-doc markupsMauro Carvalho Chehab
A function has a different name between their prototype and its kernel-doc markup: ../net/tipc/link.c:2551: warning: expecting prototype for link_reset_stats(). Prototype was for tipc_link_reset_stats() instead ../net/tipc/node.c:1678: warning: expecting prototype for is the general link level function for message sending(). Prototype was for tipc_node_xmit() instead Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-14net: sit: unregister_netdevice on newlink's error pathJakub Kicinski
We need to unregister the netdevice if config failed. .ndo_uninit takes care of most of the heavy lifting. This was uncovered by recent commit c269a24ce057 ("net: make free_netdev() more lenient with unregistering devices"). Previously the partially-initialized device would be left in the system. Reported-and-tested-by: syzbot+2393580080a2da190f04@syzkaller.appspotmail.com Fixes: e2f1f072db8d ("sit: allow to configure 6rd tunnels via netlink") Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://lore.kernel.org/r/20210114012947.2515313-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-13tcp: assign skb hash after tcp_event_data_sentYuchung Cheng
Move skb_set_hash_from_sk s.t. it's called after instead of before tcp_event_data_sent is called. This enables congestion control modules to change the socket hash right before restarting from idle (via the TX_START congestion event). Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Link: https://lore.kernel.org/r/20210111230552.2704579-1-ycheng@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-13bpf: Reject too big ctx_size_in for raw_tp test runSong Liu
syzbot reported a WARNING for allocating too big memory: WARNING: CPU: 1 PID: 8484 at mm/page_alloc.c:4976 __alloc_pages_nodemask+0x5f8/0x730 mm/page_alloc.c:5011 Modules linked in: CPU: 1 PID: 8484 Comm: syz-executor862 Not tainted 5.11.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__alloc_pages_nodemask+0x5f8/0x730 mm/page_alloc.c:4976 Code: 00 00 0c 00 0f 85 a7 00 00 00 8b 3c 24 4c 89 f2 44 89 e6 c6 44 24 70 00 48 89 6c 24 58 e8 d0 d7 ff ff 49 89 c5 e9 ea fc ff ff <0f> 0b e9 b5 fd ff ff 89 74 24 14 4c 89 4c 24 08 4c 89 74 24 18 e8 RSP: 0018:ffffc900012efb10 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 1ffff9200025df66 RCX: 0000000000000000 RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000140dc0 RBP: 0000000000140dc0 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff81b1f7e1 R11: 0000000000000000 R12: 0000000000000014 R13: 0000000000000014 R14: 0000000000000000 R15: 0000000000000000 FS: 000000000190c880(0000) GS:ffff8880b9e00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f08b7f316c0 CR3: 0000000012073000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: alloc_pages_current+0x18c/0x2a0 mm/mempolicy.c:2267 alloc_pages include/linux/gfp.h:547 [inline] kmalloc_order+0x2e/0xb0 mm/slab_common.c:837 kmalloc_order_trace+0x14/0x120 mm/slab_common.c:853 kmalloc include/linux/slab.h:557 [inline] kzalloc include/linux/slab.h:682 [inline] bpf_prog_test_run_raw_tp+0x4b5/0x670 net/bpf/test_run.c:282 bpf_prog_test_run kernel/bpf/syscall.c:3120 [inline] __do_sys_bpf+0x1ea9/0x4f10 kernel/bpf/syscall.c:4398 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x440499 Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffe1f3bfb18 EFLAGS: 00000246 ORIG_RAX: 0000000000000141 RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440499 RDX: 0000000000000048 RSI: 0000000020000600 RDI: 000000000000000a RBP: 00000000006ca018 R08: 0000000000000000 R09: 00000000004002c8 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401ca0 R13: 0000000000401d30 R14: 0000000000000000 R15: 0000000000000000 This is because we didn't filter out too big ctx_size_in. Fix it by rejecting ctx_size_in that are bigger than MAX_BPF_FUNC_ARGS (12) u64 numbers. Fixes: 1b4d60ec162f ("bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint") Reported-by: syzbot+4f98876664c7337a4ae6@syzkaller.appspotmail.com Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210112234254.1906829-1-songliubraving@fb.com
2021-01-13net: core: use eth_type_vlan in __netif_receive_skb_coreMenglong Dong
Replace the check for ETH_P_8021Q and ETH_P_8021AD in __netif_receive_skb_core with eth_type_vlan. Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn> Link: https://lore.kernel.org/r/20210111104221.3451-1-dong.menglong@zte.com.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-13can: isotp: isotp_getname(): fix kernel information leakOliver Hartkopp
Initialize the sockaddr_can structure to prevent a data leak to user space. Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Reported-by: syzbot+057884e2f453e8afebc8@syzkaller.appspotmail.com Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol") Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/r/20210112091643.11789-1-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>