summaryrefslogtreecommitdiff
path: root/drivers/net/virtio_net.c
AgeCommit message (Collapse)Author
2022-04-26virtio_net: fix wrong buf address calculation when using xdpNikolay Aleksandrov
We received a report[1] of kernel crashes when Cilium is used in XDP mode with virtio_net after updating to newer kernels. After investigating the reason it turned out that when using mergeable bufs with an XDP program which adjusts xdp.data or xdp.data_meta page_to_buf() calculates the build_skb address wrong because the offset can become less than the headroom so it gets the address of the previous page (-X bytes depending on how lower offset is): page_to_skb: page addr ffff9eb2923e2000 buf ffff9eb2923e1ffc offset 252 headroom 256 This is a pr_err() I added in the beginning of page_to_skb which clearly shows offset that is less than headroom by adding 4 bytes of metadata via an xdp prog. The calculations done are: receive_mergeable(): headroom = VIRTIO_XDP_HEADROOM; // VIRTIO_XDP_HEADROOM == 256 bytes offset = xdp.data - page_address(xdp_page) - vi->hdr_len - metasize; page_to_skb(): p = page_address(page) + offset; ... buf = p - headroom; Now buf goes -4 bytes from the page's starting address as can be seen above which is set as skb->head and skb->data by build_skb later. Depending on what's done with the skb (when it's freed most often) we get all kinds of corruptions and BUG_ON() triggers in mm[2]. We have to recalculate the new headroom after the xdp program has run, similar to how offset and len are recalculated. Headroom is directly related to data_hard_start, data and data_meta, so we use them to get the new size. The result is correct (similar pr_err() in page_to_skb, one case of xdp_page and one case of virtnet buf): a) Case with 4 bytes of metadata [ 115.949641] page_to_skb: page addr ffff8b4dcfad2000 offset 252 headroom 252 [ 121.084105] page_to_skb: page addr ffff8b4dcf018000 offset 20732 headroom 252 b) Case of pushing data +32 bytes [ 153.181401] page_to_skb: page addr ffff8b4dd0c4d000 offset 288 headroom 288 [ 158.480421] page_to_skb: page addr ffff8b4dd00b0000 offset 24864 headroom 288 c) Case of pushing data -33 bytes [ 835.906830] page_to_skb: page addr ffff8b4dd3270000 offset 223 headroom 223 [ 840.839910] page_to_skb: page addr ffff8b4dcdd68000 offset 12511 headroom 223 Offset and headroom are equal because offset points to the start of reserved bytes for the virtio_net header which are at buf start + headroom, while data points at buf start + vnet hdr size + headroom so when data or data_meta are adjusted by the xdp prog both the headroom size and the offset change equally. We can use data_hard_start to compute the new headroom after the xdp prog (linearized / page start case, the virtnet buf case is similar just with bigger base offset): xdp.data_hard_start = page_address + vnet_hdr xdp.data = page_address + vnet_hdr + headroom new headroom after xdp prog = xdp.data - xdp.data_hard_start - metasize An example reproducer xdp prog[3] is below. [1] https://github.com/cilium/cilium/issues/19453 [2] Two of the many traces: [ 40.437400] BUG: Bad page state in process swapper/0 pfn:14940 [ 40.916726] BUG: Bad page state in process systemd-resolve pfn:053b7 [ 41.300891] kernel BUG at include/linux/mm.h:720! [ 41.301801] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 41.302784] CPU: 1 PID: 1181 Comm: kubelet Kdump: loaded Tainted: G B W 5.18.0-rc1+ #37 [ 41.304458] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014 [ 41.306018] RIP: 0010:page_frag_free+0x79/0xe0 [ 41.306836] Code: 00 00 75 ea 48 8b 07 a9 00 00 01 00 74 e0 48 8b 47 48 48 8d 50 ff a8 01 48 0f 45 fa eb d0 48 c7 c6 18 b8 30 a6 e8 d7 f8 fc ff <0f> 0b 48 8d 78 ff eb bc 48 8b 07 a9 00 00 01 00 74 3a 66 90 0f b6 [ 41.310235] RSP: 0018:ffffac05c2a6bc78 EFLAGS: 00010292 [ 41.311201] RAX: 000000000000003e RBX: 0000000000000000 RCX: 0000000000000000 [ 41.312502] RDX: 0000000000000001 RSI: ffffffffa6423004 RDI: 00000000ffffffff [ 41.313794] RBP: ffff993c98823600 R08: 0000000000000000 R09: 00000000ffffdfff [ 41.315089] R10: ffffac05c2a6ba68 R11: ffffffffa698ca28 R12: ffff993c98823600 [ 41.316398] R13: ffff993c86311ebc R14: 0000000000000000 R15: 000000000000005c [ 41.317700] FS: 00007fe13fc56740(0000) GS:ffff993cdd900000(0000) knlGS:0000000000000000 [ 41.319150] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 41.320152] CR2: 000000c00008a000 CR3: 0000000014908000 CR4: 0000000000350ee0 [ 41.321387] Call Trace: [ 41.321819] <TASK> [ 41.322193] skb_release_data+0x13f/0x1c0 [ 41.322902] __kfree_skb+0x20/0x30 [ 41.343870] tcp_recvmsg_locked+0x671/0x880 [ 41.363764] tcp_recvmsg+0x5e/0x1c0 [ 41.384102] inet_recvmsg+0x42/0x100 [ 41.406783] ? sock_recvmsg+0x1d/0x70 [ 41.428201] sock_read_iter+0x84/0xd0 [ 41.445592] ? 0xffffffffa3000000 [ 41.462442] new_sync_read+0x148/0x160 [ 41.479314] ? 0xffffffffa3000000 [ 41.496937] vfs_read+0x138/0x190 [ 41.517198] ksys_read+0x87/0xc0 [ 41.535336] do_syscall_64+0x3b/0x90 [ 41.551637] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 41.568050] RIP: 0033:0x48765b [ 41.583955] Code: e8 4a 35 fe ff eb 88 cc cc cc cc cc cc cc cc e8 fb 7a fe ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30 [ 41.632818] RSP: 002b:000000c000a2f5b8 EFLAGS: 00000212 ORIG_RAX: 0000000000000000 [ 41.664588] RAX: ffffffffffffffda RBX: 000000c000062000 RCX: 000000000048765b [ 41.681205] RDX: 0000000000005e54 RSI: 000000c000e66000 RDI: 0000000000000016 [ 41.697164] RBP: 000000c000a2f608 R08: 0000000000000001 R09: 00000000000001b4 [ 41.713034] R10: 00000000000000b6 R11: 0000000000000212 R12: 00000000000000e9 [ 41.728755] R13: 0000000000000001 R14: 000000c000a92000 R15: ffffffffffffffff [ 41.744254] </TASK> [ 41.758585] Modules linked in: br_netfilter bridge veth netconsole virtio_net and [ 33.524802] BUG: Bad page state in process systemd-network pfn:11e60 [ 33.528617] page ffffe05dc0147b00 ffffe05dc04e7a00 ffff8ae9851ec000 (1) len 82 offset 252 metasize 4 hroom 0 hdr_len 12 data ffff8ae9851ec10c data_meta ffff8ae9851ec108 data_end ffff8ae9851ec14e [ 33.529764] page:000000003792b5ba refcount:0 mapcount:-512 mapping:0000000000000000 index:0x0 pfn:0x11e60 [ 33.532463] flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff) [ 33.532468] raw: 000fffffc0000000 0000000000000000 dead000000000122 0000000000000000 [ 33.532470] raw: 0000000000000000 0000000000000000 00000000fffffdff 0000000000000000 [ 33.532471] page dumped because: nonzero mapcount [ 33.532472] Modules linked in: br_netfilter bridge veth netconsole virtio_net [ 33.532479] CPU: 0 PID: 791 Comm: systemd-network Kdump: loaded Not tainted 5.18.0-rc1+ #37 [ 33.532482] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014 [ 33.532484] Call Trace: [ 33.532496] <TASK> [ 33.532500] dump_stack_lvl+0x45/0x5a [ 33.532506] bad_page.cold+0x63/0x94 [ 33.532510] free_pcp_prepare+0x290/0x420 [ 33.532515] free_unref_page+0x1b/0x100 [ 33.532518] skb_release_data+0x13f/0x1c0 [ 33.532524] kfree_skb_reason+0x3e/0xc0 [ 33.532527] ip6_mc_input+0x23c/0x2b0 [ 33.532531] ip6_sublist_rcv_finish+0x83/0x90 [ 33.532534] ip6_sublist_rcv+0x22b/0x2b0 [3] XDP program to reproduce(xdp_pass.c): #include <linux/bpf.h> #include <bpf/bpf_helpers.h> SEC("xdp_pass") int xdp_pkt_pass(struct xdp_md *ctx) { bpf_xdp_adjust_head(ctx, -(int)32); return XDP_PASS; } char _license[] SEC("license") = "GPL"; compile: clang -O2 -g -Wall -target bpf -c xdp_pass.c -o xdp_pass.o load on virtio_net: ip link set enp1s0 xdpdrv obj xdp_pass.o sec xdp_pass CC: stable@vger.kernel.org CC: Jason Wang <jasowang@redhat.com> CC: Xuan Zhuo <xuanzhuo@linux.alibaba.com> CC: Daniel Borkmann <daniel@iogearbox.net> CC: "Michael S. Tsirkin" <mst@redhat.com> CC: virtualization@lists.linux-foundation.org Fixes: 8fb7da9e9907 ("virtio_net: get build_skb() buf by data ptr") Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20220425103703.3067292-1-razor@blackwall.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-03-31Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: - vdpa generic device type support - more virtio hardening for broken devices (but on the same theme, revert some virtio hotplug hardening patches - they were misusing some interrupt flags and had to be reverted) - RSS support in virtio-net - max device MTU support in mlx5 vdpa - akcipher support in virtio-crypto - shared IRQ support in ifcvf vdpa - a minor performance improvement in vhost - enable virtio mem for ARM64 - beginnings of advance dma support - cleanups, fixes all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (33 commits) vdpa/mlx5: Avoid processing works if workqueue was destroyed vhost: handle error while adding split ranges to iotlb vdpa: support exposing the count of vqs to userspace vdpa: change the type of nvqs to u32 vdpa: support exposing the config size to userspace vdpa/mlx5: re-create forwarding rules after mac modified virtio: pci: check bar values read from virtio config space Revert "virtio_pci: harden MSI-X interrupts" Revert "virtio-pci: harden INTX interrupts" drivers/net/virtio_net: Added RSS hash report control. drivers/net/virtio_net: Added RSS hash report. drivers/net/virtio_net: Added basic RSS support. drivers/net/virtio_net: Fixed padded vheader to use v1 with hash. virtio: use virtio_device_ready() in virtio_device_restore() tools/virtio: compile with -pthread tools/virtio: fix after premapped buf support virtio_ring: remove flags check for unmap packed indirect desc virtio_ring: remove flags check for unmap split indirect desc virtio_ring: rename vring_unmap_state_packed() to vring_unmap_extra_packed() net/mlx5: Add support for configuring max device MTU ...
2022-03-28drivers/net/virtio_net: Added RSS hash report control.Andrew Melnychenko
Now it's possible to control supported hashflows. Added hashflow set/get callbacks. Also, disabling RXH_IP_SRC/DST for TCP would disable then for UDP. TCP and UDP supports only: ethtool -U eth0 rx-flow-hash tcp4 sd RXH_IP_SRC + RXH_IP_DST ethtool -U eth0 rx-flow-hash tcp4 sdfn RXH_IP_SRC + RXH_IP_DST + RXH_L4_B_0_1 + RXH_L4_B_2_3 Disabling happens because VirtioNET hashtype for IP doesn't check L4 proto, it works for all IP packets(TCP, UDP, ICMP, etc.). For TCP and UDP, it's possible to set IP+PORT hashes. But disabling IP hashes will disable them for TCP and UDP simultaneously. It's possible to set IP+PORT for TCP/UDP and disable/enable IP for everything else(UDP, ICMP, etc.). Signed-off-by: Andrew Melnychenko <andrew@daynix.com> Link: https://lore.kernel.org/r/20220328175336.10802-5-andrew@daynix.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-28drivers/net/virtio_net: Added RSS hash report.Andrew Melnychenko
Added features for RSS hash report. If hash is provided - it sets to skb. Added checks if rss and/or hash are enabled together. Signed-off-by: Andrew Melnychenko <andrew@daynix.com> Link: https://lore.kernel.org/r/20220328175336.10802-4-andrew@daynix.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-28drivers/net/virtio_net: Added basic RSS support.Andrew Melnychenko
Added features for RSS. Added initialization, RXHASH feature and ethtool ops. By default RSS/RXHASH is disabled. Virtio RSS "IPv6 extensions" hashes disabled. Added ethtools ops to set key and indirection table. Signed-off-by: Andrew Melnychenko <andrew@daynix.com> Link: https://lore.kernel.org/r/20220328175336.10802-3-andrew@daynix.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-28drivers/net/virtio_net: Fixed padded vheader to use v1 with hash.Andrew Melnychenko
The header v1 provides additional info about RSS. Added changes to computing proper header length. In the next patches, the header may contain RSS hash info for the hash population. Signed-off-by: Andrew Melnychenko <andrew@daynix.com> Link: https://lore.kernel.org/r/20220328175336.10802-2-andrew@daynix.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-02-15virtio_net: Fix code indent errorMichael Catanzaro
This patch fixes the checkpatch.pl warning: ERROR: code indent should use tabs where possible #3453: FILE: drivers/net/virtio_net.c:3453: ret = register_virtio_driver(&virtio_net_driver);$ Uneccessary newline was also removed making line 3453 now 3452. Signed-off-by: Michael Catanzaro <mcatanzaro.kernel@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-23Merge tag 'bitmap-5.17-rc1' of git://github.com/norov/linuxLinus Torvalds
Pull bitmap updates from Yury Norov: - introduce for_each_set_bitrange() - use find_first_*_bit() instead of find_next_*_bit() where possible - unify for_each_bit() macros * tag 'bitmap-5.17-rc1' of git://github.com/norov/linux: vsprintf: rework bitmap_list_string lib: bitmap: add performance test for bitmap_print_to_pagebuf bitmap: unify find_bit operations mm/percpu: micro-optimize pcpu_is_populated() Replace for_each_*_bit_from() with for_each_*_bit() where appropriate find: micro-optimize for_each_{set,clear}_bit() include/linux: move for_each_bit() macros from bitops.h to find.h cpumask: replace cpumask_next_* with cpumask_first_* where appropriate tools: sync tools/bitmap with mother linux all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate cpumask: use find_first_and_bit() lib: add find_first_and_bit() arch: remove GENERIC_FIND_FIRST_BIT entirely include: move find.h from asm_generic to linux bitops: move find_bit_*_le functions from le.h to find.h bitops: protect find_first_{,zero}_bit properly
2022-01-18Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: "virtio,vdpa,qemu_fw_cfg: features, cleanups, and fixes. - partial support for < MAX_ORDER - 1 granularity for virtio-mem - driver_override for vdpa - sysfs ABI documentation for vdpa - multiqueue config support for mlx5 vdpa - and misc fixes, cleanups" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (42 commits) vdpa/mlx5: Fix tracking of current number of VQs vdpa/mlx5: Fix is_index_valid() to refer to features vdpa: Protect vdpa reset with cf_mutex vdpa: Avoid taking cf_mutex lock on get status vdpa/vdpa_sim_net: Report max device capabilities vdpa: Use BIT_ULL for bit operations vdpa/vdpa_sim: Configure max supported virtqueues vdpa/mlx5: Report max device capabilities vdpa: Support reporting max device capabilities vdpa/mlx5: Restore cur_num_vqs in case of failure in change_num_qps() vdpa: Add support for returning device configuration information vdpa/mlx5: Support configuring max data virtqueue vdpa/mlx5: Fix config_attr_mask assignment vdpa: Allow to configure max data virtqueues vdpa: Read device configuration only if FEATURES_OK vdpa: Sync calls set/get config/status with cf_mutex vdpa/mlx5: Distribute RX virtqueues in RQT object vdpa: Provide interface to read driver features vdpa: clean up get_config_size ret value handling virtio_ring: mark ring unused on error ...
2022-01-15cpumask: replace cpumask_next_* with cpumask_first_* where appropriateYury Norov
cpumask_first() is a more effective analogue of 'next' version if n == -1 (which means start == 0). This patch replaces 'next' with 'first' where things look trivial. There's no cpumask_first_zero() function, so create it. Signed-off-by: Yury Norov <yury.norov@gmail.com> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2022-01-14virtio: wrap config->reset callsMichael S. Tsirkin
This will enable cleanups down the road. The idea is to disable cbs, then add "flush_queued_cbs" callback as a parameter, this way drivers can flush any work queued after callbacks have been disabled. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20211013105226.20225-1-mst@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-12-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-12-30 The following pull-request contains BPF updates for your *net-next* tree. We've added 72 non-merge commits during the last 20 day(s) which contain a total of 223 files changed, 3510 insertions(+), 1591 deletions(-). The main changes are: 1) Automatic setrlimit in libbpf when bpf is memcg's in the kernel, from Andrii. 2) Beautify and de-verbose verifier logs, from Christy. 3) Composable verifier types, from Hao. 4) bpf_strncmp helper, from Hou. 5) bpf.h header dependency cleanup, from Jakub. 6) get_func_[arg|ret|arg_cnt] helpers, from Jiri. 7) Sleepable local storage, from KP. 8) Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support, from Kumar. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16virtio_net: fix rx_drops stat for small pktsWenliang Wang
We found the stat of rx drops for small pkts does not increment when build_skb fail, it's not coherent with other mode's rx drops stat. Signed-off-by: Wenliang Wang <wangwenliang.1995@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-13bpf: Let bpf_warn_invalid_xdp_action() report more infoPaolo Abeni
In non trivial scenarios, the action id alone is not sufficient to identify the program causing the warning. Before the previous patch, the generated stack-trace pointed out at least the involved device driver. Let's additionally include the program name and id, and the relevant device name. If the user needs additional infos, he can fetch them via a kernel probe, leveraging the arguments added here. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/ddb96bb975cbfddb1546cf5da60e77d5100b533c.1638189075.git.pabeni@redhat.com
2021-12-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-24Revert "virtio-net: don't let virtio core to validate used length"Michael S. Tsirkin
This reverts commit 816625c13652cef5b2c49082d652875da6f2ad7a. Attempts to validate length in the core did not work out. We'll drop them, so revert the dependent changes in drivers. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-22ethtool: extend ringparam setting/getting API with rx_buf_lenHao Chen
Add two new parameters kernel_ringparam and extack for .get_ringparam and .set_ringparam to extend more ring params through netlink. Signed-off-by: Hao Chen <chenhao288@hisilicon.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-17net: annotate accesses to queue->trans_startEric Dumazet
In following patches, dev_watchdog() will no longer stop all queues. It will read queue->trans_start locklessly. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-03Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: "vhost and virtio fixes and features: - Hardening work by Jason - vdpa driver for Alibaba ENI - Performance tweaks for virtio blk - virtio rng rework using an internal buffer - mac/mtu programming for mlx5 vdpa - Misc fixes, cleanups" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (45 commits) vdpa/mlx5: Forward only packets with allowed MAC address vdpa/mlx5: Support configuration of MAC vdpa/mlx5: Fix clearing of VIRTIO_NET_F_MAC feature bit vdpa_sim_net: Enable user to set mac address and mtu vdpa: Enable user to set mac and mtu of vdpa device vdpa: Use kernel coding style for structure comments vdpa: Introduce query of device config layout vdpa: Introduce and use vdpa device get, set config helpers virtio-scsi: don't let virtio core to validate used buffer length virtio-blk: don't let virtio core to validate used length virtio-net: don't let virtio core to validate used length virtio_ring: validate used buffer length virtio_blk: correct types for status handling virtio_blk: allow 0 as num_request_queues i2c: virtio: Add support for zero-length requests virtio-blk: fixup coccinelle warnings virtio_ring: fix typos in vring_desc_extra virtio-pci: harden INTX interrupts virtio_pci: harden MSI-X interrupts virtio_config: introduce a new .enable_cbs method ...
2021-11-01virtio-net: don't let virtio core to validate used lengthJason Wang
For RX virtuqueue, the used length is validated in all the three paths (big, small and mergeable). For control vq, we never tries to use used length. So this patch forbids the core to validate the used length. Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20211027022107.14357-3-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01virtio_net: clarify tailroom logicMichael S. Tsirkin
Make tailroom math follow same logic as everything else, subtracing values in the order in which things are laid out in the buffer. Tested-by: Corentin Noël <corentin.noel@collabora.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-10-28net: virtio: use eth_hw_addr_set()Jakub Kicinski
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it go through appropriate helpers. Even though the current code uses dev->addr_len the we can switch to eth_hw_addr_set() instead of dev_addr_set(). The netdev is always allocated by alloc_etherdev_mq() and there are at least two places which assume Ethernet address: - the line below calling eth_hw_addr_random() - virtnet_set_mac_address() -> eth_commit_mac_addr_change() Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20211027152012.3393077-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
tools/testing/selftests/net/ioam6.sh 7b1700e009cc ("selftests: net: modify IOAM tests for undef bits") bf77b1400a56 ("selftests: net: Test for the IOAM encapsulation with IPv6") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-10virtio_net: skip RCU read lock by checking xdp_enabled of viLi RongQing
networking benchmark shows that __rcu_read_lock and __rcu_read_unlock takes some cpu cycles, and we can avoid calling them partially in virtio rx path by check xdp_enabled of vi, and xdp is disabled most of time Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-09virtio-net: fix for skb_over_panic inside big modeXuan Zhuo
commit 126285651b7f ("Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net") accidentally reverted the effect of commit 1a8024239da ("virtio-net: fix for skb_over_panic inside big mode") on drivers/net/virtio_net.c As a result, users of crosvm (which is using large packet mode) are experiencing crashes with 5.14-rc1 and above that do not occur with 5.13. Crash trace: [ 61.346677] skbuff: skb_over_panic: text:ffffffff881ae2c7 len:3762 put:3762 head:ffff8a5ec8c22000 data:ffff8a5ec8c22010 tail:0xec2 end:0xec0 dev:<NULL> [ 61.369192] kernel BUG at net/core/skbuff.c:111! [ 61.372840] invalid opcode: 0000 [#1] SMP PTI [ 61.374892] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.14.0-rc1 linux-v5.14-rc1-for-mesa-ci.tar.bz2 #1 [ 61.376450] Hardware name: ChromiumOS crosvm, BIOS 0 .. [ 61.393635] Call Trace: [ 61.394127] <IRQ> [ 61.394488] skb_put.cold+0x10/0x10 [ 61.395095] page_to_skb+0xf7/0x410 [ 61.395689] receive_buf+0x81/0x1660 [ 61.396228] ? netif_receive_skb_list_internal+0x1ad/0x2b0 [ 61.397180] ? napi_gro_flush+0x97/0xe0 [ 61.397896] ? detach_buf_split+0x67/0x120 [ 61.398573] virtnet_poll+0x2cf/0x420 [ 61.399197] __napi_poll+0x25/0x150 [ 61.399764] net_rx_action+0x22f/0x280 [ 61.400394] __do_softirq+0xba/0x257 [ 61.401012] irq_exit_rcu+0x8e/0xb0 [ 61.401618] common_interrupt+0x7b/0xa0 [ 61.402270] </IRQ> See https://lore.kernel.org/r/5edaa2b7c2fe4abd0347b8454b2ac032b6694e2c.camel%40collabora.com for the report. Apply the original 1a8024239da ("virtio-net: fix for skb_over_panic inside big mode") again, the original logic still holds: In virtio-net's large packet mode, there is a hole in the space behind buf. hdr_padded_len - hdr_len We must take this into account when calculating tailroom. Cc: Greg KH <gregkh@linuxfoundation.org> Fixes: fb32856b16ad ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom") Fixes: 126285651b7f ("Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Reported-by: Corentin Noël <corentin.noel@collabora.com> Tested-by: Corentin Noël <corentin.noel@collabora.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
net/mptcp/protocol.c 977d293e23b4 ("mptcp: ensure tx skbs always have the MPTCP ext") efe686ffce01 ("mptcp: ensure tx skbs always have the MPTCP ext") same patch merged in both trees, keep net-next. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-20virtio_net: introduce TX timeout watchdogTony Lu
This implements ndo_tx_timeout handler and put this into stats. When there is something wrong to send out packets, we could notice tx timeout events and total timeout counter. We have suffered send timeout issues due to the backends hung. With this, we can find the details, and collect the counters by monitor systems. Signed-off-by: Tony Lu <tony.ly@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-19virtio_net: use netdev_warn_once to output warn when without enough queuesXuan Zhuo
This warning is output when virtnet does not have enough queues, but it only needs to be printed once to inform the user of this situation. It is not necessary to print it every time. If the user loads xdp frequently, this log appears too much. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-19virtio-net: fix pages leaking when building skb in big modeJason Wang
We try to use build_skb() if we had sufficient tailroom. But we forget to release the unused pages chained via private in big mode which will leak pages. Fixing this by release the pages after building the skb in big mode. Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Fixes: fb32856b16ad ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom") Signed-off-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-29virtio_net: reduce raw_smp_processor_id() calling in virtnet_xdp_get_sqLi RongQing
smp_processor_id()/raw* will be called once each when not more queues in virtnet_xdp_get_sq() which is called in non-preemptible context, so it's safe to call the function smp_processor_id() once. Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-24ethtool: extend coalesce setting uAPI with CQE modeYufeng Mo
In order to support more coalesce parameters through netlink, add two new parameter kernel_coal and extack for .set_coalesce and .get_coalesce, then some extra info can return to user with the netlink API. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
drivers/ptp/Kconfig: 55c8fca1dae1 ("ptp_pch: Restore dependency on PCI") e5f31552674e ("ethernet: fix PTP_1588_CLOCK dependencies") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-17virtio-net: use NETIF_F_GRO_HW instead of NETIF_F_LROJason Wang
Commit a02e8964eaf92 ("virtio-net: ethtool configurable LRO") maps LRO to virtio guest offloading features and allows the administrator to enable and disable those features via ethtool. This leads to several issues: - For a device that doesn't support control guest offloads, the "LRO" can't be disabled triggering WARN in dev_disable_lro() when turning off LRO or when enabling forwarding bridging etc. - For a device that supports control guest offloads, the guest offloads are disabled in cases of bridging, forwarding etc slowing down the traffic. Fix this by using NETIF_F_GRO_HW instead. Though the spec does not guarantee packets to be re-segmented as the original ones, we can add that to the spec, possibly with a flag for devices to differentiate between GRO and LRO. Further, we never advertised LRO historically before a02e8964eaf92 ("virtio-net: ethtool configurable LRO") and so bridged/forwarded configs effectively always relied on virtio receive offloads behaving like GRO - thus even if this breaks any configs it is at least not a regression. Fixes: a02e8964eaf92 ("virtio-net: ethtool configurable LRO") Acked-by: Michael S. Tsirkin <mst@redhat.com> Reported-by: Ivan <ivan@prestigetransportation.com> Tested-by: Ivan <ivan@prestigetransportation.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-04virtio_net: Replace deprecated CPU-hotplug functions.Sebastian Andrzej Siewior
The functions get_online_cpus() and put_online_cpus() have been deprecated during the CPU hotplug rework. They map directly to cpus_read_lock() and cpus_read_unlock(). Replace deprecated CPU-hotplug functions with the official version. The behavior remains unchanged. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: virtualization@lists.linux-foundation.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-03virtio-net: realign page_to_skb() after mergesJakub Kicinski
We ended up merging two versions of the same patch set: commit 8fb7da9e9907 ("virtio_net: get build_skb() buf by data ptr") commit 5c37711d9f27 ("virtio-net: fix for unable to handle page fault for address") into net, and commit 7bf64460e3b2 ("virtio-net: get build_skb() buf by data ptr") commit 6c66c147b9a4 ("virtio-net: fix for unable to handle page fault for address") into net-next. Redo the merge from commit 126285651b7f ("Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net"), so that the most recent code remains. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-14Merge tag 'net-5.14-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski. "Including fixes from bpf and netfilter. Current release - regressions: - sock: fix parameter order in sock_setsockopt() Current release - new code bugs: - netfilter: nft_last: - fix incorrect arithmetic when restoring last used - honor NFTA_LAST_SET on restoration Previous releases - regressions: - udp: properly flush normal packet at GRO time - sfc: ensure correct number of XDP queues; don't allow enabling the feature if there isn't sufficient resources to Tx from any CPU - dsa: sja1105: fix address learning getting disabled on the CPU port - mptcp: addresses a rmem accounting issue that could keep packets in subflow receive buffers longer than necessary, delaying MPTCP-level ACKs - ip_tunnel: fix mtu calculation for ETHER tunnel devices - do not reuse skbs allocated from skbuff_fclone_cache in the napi skb cache, we'd try to return them to the wrong slab cache - tcp: consistently disable header prediction for mptcp Previous releases - always broken: - bpf: fix subprog poke descriptor tracking use-after-free - ipv6: - allocate enough headroom in ip6_finish_output2() in case iptables TEE is used - tcp: drop silly ICMPv6 packet too big messages to avoid expensive and pointless lookups (which may serve as a DDOS vector) - make sure fwmark is copied in SYNACK packets - fix 'disable_policy' for forwarded packets (align with IPv4) - netfilter: conntrack: - do not renew entry stuck in tcp SYN_SENT state - do not mark RST in the reply direction coming after SYN packet for an out-of-sync entry - mptcp: cleanly handle error conditions with MP_JOIN and syncookies - mptcp: fix double free when rejecting a join due to port mismatch - validate lwtstate->data before returning from skb_tunnel_info() - tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path - mt76: mt7921: continue to probe driver when fw already downloaded - bonding: fix multiple issues with offloading IPsec to (thru?) bond - stmmac: ptp: fix issues around Qbv support and setting time back - bcmgenet: always clear wake-up based on energy detection Misc: - sctp: move 198 addresses from unusable to private scope - ptp: support virtual clocks and timestamping - openvswitch: optimize operation for key comparison" * tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (158 commits) net: dsa: properly check for the bridge_leave methods in dsa_switch_bridge_leave() sfc: add logs explaining XDP_TX/REDIRECT is not available sfc: ensure correct number of XDP queues sfc: fix lack of XDP TX queues - error XDP TX failed (-22) net: fddi: fix UAF in fza_probe net: dsa: sja1105: fix address learning getting disabled on the CPU port net: ocelot: fix switchdev objects synced for wrong netdev with LAG offload net: Use nlmsg_unicast() instead of netlink_unicast() octeontx2-pf: Fix uninitialized boolean variable pps ipv6: allocate enough headroom in ip6_finish_output2() net: hdlc: rename 'mod_init' & 'mod_exit' functions to be module-specific net: bridge: multicast: fix MRD advertisement router port marking race net: bridge: multicast: fix PIM hello router port marking race net: phy: marvell10g: fix differentiation of 88X3310 from 88X3340 dsa: fix for_each_child.cocci warnings virtio_net: check virtqueue_add_sgs() return value mptcp: properly account bulk freed memory selftests: mptcp: fix case multiple subflows limited by server mptcp: avoid processing packet if a subflow reset mptcp: fix syncookie process if mptcp can not_accept new subflow ...
2021-07-10virtio_net: check virtqueue_add_sgs() return valueYunjian Wang
As virtqueue_add_sgs() can fail, we should check the return value. Addresses-Coverity-ID: 1464439 ("Unchecked return value") Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-08virtio_net: disable cb aggressivelyMichael S. Tsirkin
There are currently two cases where we poll TX vq not in response to a callback: start xmit and rx napi. We currently do this with callbacks enabled which can cause extra interrupts from the card. Used not to be a big issue as we run with interrupts disabled but that is no longer the case, and in some cases the rate of spurious interrupts is so high linux detects this and actually kills the interrupt. Fix up by disabling the callbacks before polling the tx vq. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03virtio_net: move txq wakeups under tx q lockMichael S. Tsirkin
We currently check num_free outside tx q lock which is unsafe: new packets can arrive meanwhile and there won't be space in the queue. Thus a spurious queue wakeup causing overhead and even packet drops. Move the check under the lock to fix that. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03virtio_net: move tx vq operation under tx queue lockMichael S. Tsirkin
It's unsafe to operate a vq from multiple threads. Unfortunately this is exactly what we do when invoking clean tx poll from rx napi. Same happens with napi-tx even without the opportunistic cleaning from the receive interrupt: that races with processing the vq in start_xmit. As a fix move everything that deals with the vq to under tx lock. Fixes: b92f1e6751a6 ("virtio-net: transmit napi") Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03virtio_net: Fix error handling in virtnet_restore()Xie Yongji
Do some cleanups in virtnet_restore() when virtnet_cpu_notif_add() failed. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Link: https://lore.kernel.org/r/20210517084516.332-1-xieyongji@bytedance.com Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-06-23virtio_net: Use virtio_find_vqs_ctx() helperXianting Tian
virtio_find_vqs_ctx() is defined but never be called currently, it is the right place to use it. Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-07virtio_net: Remove BUG() to avoid machine deadXianting Tian
We should not directly BUG() when there is hdr error, it is better to output a print when such error happens. Currently, the caller of xmit_skb() already did it. Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-07Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Bug fixes overlapping feature additions and refactoring, mostly. Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-03virtio-net: fix for skb_over_panic inside big modeXuan Zhuo
In virtio-net's large packet mode, there is a hole in the space behind buf. hdr_padded_len - hdr_len We must take this into account when calculating tailroom. [ 44.544385] skb_put.cold (net/core/skbuff.c:5254 (discriminator 1) net/core/skbuff.c:5252 (discriminator 1)) [ 44.544864] page_to_skb (drivers/net/virtio_net.c:485) [ 44.545361] receive_buf (drivers/net/virtio_net.c:849 drivers/net/virtio_net.c:1131) [ 44.545870] ? netif_receive_skb_list_internal (net/core/dev.c:5714) [ 44.546628] ? dev_gro_receive (net/core/dev.c:6103) [ 44.547135] ? napi_complete_done (./include/linux/list.h:35 net/core/dev.c:5867 net/core/dev.c:5862 net/core/dev.c:6565) [ 44.547672] virtnet_poll (drivers/net/virtio_net.c:1427 drivers/net/virtio_net.c:1525) [ 44.548251] __napi_poll (net/core/dev.c:6985) [ 44.548744] net_rx_action (net/core/dev.c:7054 net/core/dev.c:7139) [ 44.549264] __do_softirq (./arch/x86/include/asm/jump_label.h:19 ./include/linux/jump_label.h:200 ./include/trace/events/irq.h:142 kernel/softirq.c:560) [ 44.549762] irq_exit_rcu (kernel/softirq.c:433 kernel/softirq.c:637 kernel/softirq.c:649) [ 44.551384] common_interrupt (arch/x86/kernel/irq.c:240 (discriminator 13)) [ 44.551991] ? asm_common_interrupt (./arch/x86/include/asm/idtentry.h:638) [ 44.552654] asm_common_interrupt (./arch/x86/include/asm/idtentry.h:638) Fixes: fb32856b16ad ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Reported-by: Corentin Noël <corentin.noel@collabora.com> Tested-by: Corentin Noël <corentin.noel@collabora.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-01virtio_net: get build_skb() buf by data ptrXuan Zhuo
In the case of merge, the page passed into page_to_skb() may be a head page, not the page where the current data is located. So when trying to get the buf where the data is located, we should get buf based on headroom instead of offset. This patch solves this problem. But if you don't use this patch, the original code can also run, because if the page is not the page of the current data, the calculated tailroom will be less than 0, and will not enter the logic of build_skb() . The significance of this patch is to modify this logical problem, allowing more situations to use build_skb(). Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-01virtio-net: fix for unable to handle page fault for addressXuan Zhuo
In merge mode, when xdp is enabled, if the headroom of buf is smaller than virtnet_get_headroom(), xdp_linearize_page() will be called but the variable of "headroom" is still 0, which leads to wrong logic after entering page_to_skb(). [ 16.600944] BUG: unable to handle page fault for address: ffffecbfff7b43c8[ 16.602175] #PF: supervisor read access in kernel mode [ 16.603350] #PF: error_code(0x0000) - not-present page [ 16.604200] PGD 0 P4D 0 [ 16.604686] Oops: 0000 [#1] SMP PTI [ 16.605306] CPU: 4 PID: 715 Comm: sh Tainted: G B 5.12.0+ #312 [ 16.606429] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/04 [ 16.608217] RIP: 0010:unmap_page_range+0x947/0xde0 [ 16.609014] Code: 00 00 08 00 48 83 f8 01 45 19 e4 41 f7 d4 41 83 e4 03 e9 a4 fd ff ff e8 b7 63 ed ff 4c 89 e0 48 c1 e0 065 [ 16.611863] RSP: 0018:ffffc90002503c58 EFLAGS: 00010286 [ 16.612720] RAX: ffffecbfff7b43c0 RBX: 00007f19f7203000 RCX: ffffffff812ff359 [ 16.613853] RDX: ffff888107778000 RSI: 0000000000000000 RDI: 0000000000000005 [ 16.614976] RBP: ffffea000425e000 R08: 0000000000000000 R09: 3030303030303030 [ 16.616124] R10: ffffffff82ed7d94 R11: 6637303030302052 R12: 7c00000afffded0f [ 16.617276] R13: 0000000000000001 R14: ffff888119ee7010 R15: 00007f19f7202000 [ 16.618423] FS: 0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000 [ 16.619738] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 16.620670] CR2: ffffecbfff7b43c8 CR3: 0000000103220005 CR4: 0000000000370ee0 [ 16.621792] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 16.622920] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 16.624047] Call Trace: [ 16.624525] ? release_pages+0x24d/0x730 [ 16.625209] unmap_single_vma+0xa9/0x130 [ 16.625885] unmap_vmas+0x76/0xf0 [ 16.626480] exit_mmap+0xa0/0x210 [ 16.627129] mmput+0x67/0x180 [ 16.627673] do_exit+0x3d1/0xf10 [ 16.628259] ? do_user_addr_fault+0x231/0x840 [ 16.629000] do_group_exit+0x53/0xd0 [ 16.629631] __x64_sys_exit_group+0x1d/0x20 [ 16.630354] do_syscall_64+0x3c/0x80 [ 16.630988] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 16.631828] RIP: 0033:0x7f1a043d0191 [ 16.632464] Code: Unable to access opcode bytes at RIP 0x7f1a043d0167. [ 16.633502] RSP: 002b:00007ffe3d993308 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 [ 16.634737] RAX: ffffffffffffffda RBX: 00007f1a044c9490 RCX: 00007f1a043d0191 [ 16.635857] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000 [ 16.636986] RBP: 0000000000000000 R08: ffffffffffffff88 R09: 0000000000000001 [ 16.638120] R10: 0000000000000008 R11: 0000000000000246 R12: 00007f1a044c9490 [ 16.639245] R13: 0000000000000001 R14: 00007f1a044c9968 R15: 0000000000000000 [ 16.640408] Modules linked in: [ 16.640958] CR2: ffffecbfff7b43c8 [ 16.641557] ---[ end trace bc4891c6ce46354c ]--- [ 16.642335] RIP: 0010:unmap_page_range+0x947/0xde0 [ 16.643135] Code: 00 00 08 00 48 83 f8 01 45 19 e4 41 f7 d4 41 83 e4 03 e9 a4 fd ff ff e8 b7 63 ed ff 4c 89 e0 48 c1 e0 065 [ 16.645983] RSP: 0018:ffffc90002503c58 EFLAGS: 00010286 [ 16.646845] RAX: ffffecbfff7b43c0 RBX: 00007f19f7203000 RCX: ffffffff812ff359 [ 16.647970] RDX: ffff888107778000 RSI: 0000000000000000 RDI: 0000000000000005 [ 16.649091] RBP: ffffea000425e000 R08: 0000000000000000 R09: 3030303030303030 [ 16.650250] R10: ffffffff82ed7d94 R11: 6637303030302052 R12: 7c00000afffded0f [ 16.651394] R13: 0000000000000001 R14: ffff888119ee7010 R15: 00007f19f7202000 [ 16.652529] FS: 0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000 [ 16.653887] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 16.654841] CR2: ffffecbfff7b43c8 CR3: 0000000103220005 CR4: 0000000000370ee0 [ 16.655992] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 16.657150] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 16.658290] Kernel panic - not syncing: Fatal exception [ 16.659613] Kernel Offset: disabled [ 16.660234] ---[ end Kernel panic - not syncing: Fatal exception ]--- Fixes: fb32856b16ad ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-31virtio-net: Add validation for used lengthXie Yongji
This adds validation for used length (might come from an untrusted device) to avoid data corruption or loss. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210531135852.113-1-xieyongji@bytedance.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-13virtio-net: get build_skb() buf by data ptrXuan Zhuo
In the case of merge, the page passed into page_to_skb() may be a head page, not the page where the current data is located. So when trying to get the buf where the data is located, you should directly use the pointer(p) to get the address corresponding to the page. At the same time, the offset of the data in the page should also be obtained using offset_in_page(). This patch solves this problem. But if you don’t use this patch, the original code can also run, because if the page is not the page of the current data, the calculated tailroom will be less than 0, and will not enter the logic of build_skb() . The significance of this patch is to modify this logical problem, allowing more situations to use build_skb(). Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>