summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-06-21net: mana: Fix a memory leak in an error handling path in 'mana_create_txq()'Christophe JAILLET
If this test fails we must free some resources as in all the other error handling paths of this function. Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21Merge branch 'nnicstar-fixes'David S. Miller
Zheyu Ma says: ==================== atm: nicstar: fix two bugs about error handling Zheyu Ma (2): atm: nicstar: use 'dma_free_coherent' instead of 'kfree' atm: nicstar: register the interrupt handler in the right place ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21atm: nicstar: register the interrupt handler in the right placeZheyu Ma
Because the error handling is sequential, the application of resources should be carried out in the order of error handling, so the operation of registering the interrupt handler should be put in front, so as not to free the unregistered interrupt handler during error handling. This log reveals it: [ 3.438724] Trying to free already-free IRQ 23 [ 3.439060] WARNING: CPU: 5 PID: 1 at kernel/irq/manage.c:1825 free_irq+0xfb/0x480 [ 3.440039] Modules linked in: [ 3.440257] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.12.4-g70e7f0549188-dirty #142 [ 3.440793] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 [ 3.441561] RIP: 0010:free_irq+0xfb/0x480 [ 3.441845] Code: 6e 08 74 6f 4d 89 f4 e8 c3 78 09 00 4d 8b 74 24 18 4d 85 f6 75 e3 e8 b4 78 09 00 8b 75 c8 48 c7 c7 a0 ac d5 85 e8 95 d7 f5 ff <0f> 0b 48 8b 75 c0 4c 89 ff e8 87 c5 90 03 48 8b 43 40 4c 8b a0 80 [ 3.443121] RSP: 0000:ffffc90000017b50 EFLAGS: 00010086 [ 3.443483] RAX: 0000000000000000 RBX: ffff888107c6f000 RCX: 0000000000000000 [ 3.443972] RDX: 0000000000000000 RSI: ffffffff8123f301 RDI: 00000000ffffffff [ 3.444462] RBP: ffffc90000017b90 R08: 0000000000000001 R09: 0000000000000003 [ 3.444950] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 [ 3.444994] R13: ffff888107dc0000 R14: ffff888104f6bf00 R15: ffff888107c6f0a8 [ 3.444994] FS: 0000000000000000(0000) GS:ffff88817bd40000(0000) knlGS:0000000000000000 [ 3.444994] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3.444994] CR2: 0000000000000000 CR3: 000000000642e000 CR4: 00000000000006e0 [ 3.444994] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 3.444994] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 3.444994] Call Trace: [ 3.444994] ns_init_card_error+0x18e/0x250 [ 3.444994] nicstar_init_one+0x10d2/0x1130 [ 3.444994] local_pci_probe+0x4a/0xb0 [ 3.444994] pci_device_probe+0x126/0x1d0 [ 3.444994] ? pci_device_remove+0x100/0x100 [ 3.444994] really_probe+0x27e/0x650 [ 3.444994] driver_probe_device+0x84/0x1d0 [ 3.444994] ? mutex_lock_nested+0x16/0x20 [ 3.444994] device_driver_attach+0x63/0x70 [ 3.444994] __driver_attach+0x117/0x1a0 [ 3.444994] ? device_driver_attach+0x70/0x70 [ 3.444994] bus_for_each_dev+0xb6/0x110 [ 3.444994] ? rdinit_setup+0x40/0x40 [ 3.444994] driver_attach+0x22/0x30 [ 3.444994] bus_add_driver+0x1e6/0x2a0 [ 3.444994] driver_register+0xa4/0x180 [ 3.444994] __pci_register_driver+0x77/0x80 [ 3.444994] ? uPD98402_module_init+0xd/0xd [ 3.444994] nicstar_init+0x1f/0x75 [ 3.444994] do_one_initcall+0x7a/0x3d0 [ 3.444994] ? rdinit_setup+0x40/0x40 [ 3.444994] ? rcu_read_lock_sched_held+0x4a/0x70 [ 3.444994] kernel_init_freeable+0x2a7/0x2f9 [ 3.444994] ? rest_init+0x2c0/0x2c0 [ 3.444994] kernel_init+0x13/0x180 [ 3.444994] ? rest_init+0x2c0/0x2c0 [ 3.444994] ? rest_init+0x2c0/0x2c0 [ 3.444994] ret_from_fork+0x1f/0x30 [ 3.444994] Kernel panic - not syncing: panic_on_warn set ... [ 3.444994] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.12.4-g70e7f0549188-dirty #142 [ 3.444994] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 [ 3.444994] Call Trace: [ 3.444994] dump_stack+0xba/0xf5 [ 3.444994] ? free_irq+0xfb/0x480 [ 3.444994] panic+0x155/0x3ed [ 3.444994] ? __warn+0xed/0x150 [ 3.444994] ? free_irq+0xfb/0x480 [ 3.444994] __warn+0x103/0x150 [ 3.444994] ? free_irq+0xfb/0x480 [ 3.444994] report_bug+0x119/0x1c0 [ 3.444994] handle_bug+0x3b/0x80 [ 3.444994] exc_invalid_op+0x18/0x70 [ 3.444994] asm_exc_invalid_op+0x12/0x20 [ 3.444994] RIP: 0010:free_irq+0xfb/0x480 [ 3.444994] Code: 6e 08 74 6f 4d 89 f4 e8 c3 78 09 00 4d 8b 74 24 18 4d 85 f6 75 e3 e8 b4 78 09 00 8b 75 c8 48 c7 c7 a0 ac d5 85 e8 95 d7 f5 ff <0f> 0b 48 8b 75 c0 4c 89 ff e8 87 c5 90 03 48 8b 43 40 4c 8b a0 80 [ 3.444994] RSP: 0000:ffffc90000017b50 EFLAGS: 00010086 [ 3.444994] RAX: 0000000000000000 RBX: ffff888107c6f000 RCX: 0000000000000000 [ 3.444994] RDX: 0000000000000000 RSI: ffffffff8123f301 RDI: 00000000ffffffff [ 3.444994] RBP: ffffc90000017b90 R08: 0000000000000001 R09: 0000000000000003 [ 3.444994] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 [ 3.444994] R13: ffff888107dc0000 R14: ffff888104f6bf00 R15: ffff888107c6f0a8 [ 3.444994] ? vprintk_func+0x71/0x110 [ 3.444994] ns_init_card_error+0x18e/0x250 [ 3.444994] nicstar_init_one+0x10d2/0x1130 [ 3.444994] local_pci_probe+0x4a/0xb0 [ 3.444994] pci_device_probe+0x126/0x1d0 [ 3.444994] ? pci_device_remove+0x100/0x100 [ 3.444994] really_probe+0x27e/0x650 [ 3.444994] driver_probe_device+0x84/0x1d0 [ 3.444994] ? mutex_lock_nested+0x16/0x20 [ 3.444994] device_driver_attach+0x63/0x70 [ 3.444994] __driver_attach+0x117/0x1a0 [ 3.444994] ? device_driver_attach+0x70/0x70 [ 3.444994] bus_for_each_dev+0xb6/0x110 [ 3.444994] ? rdinit_setup+0x40/0x40 [ 3.444994] driver_attach+0x22/0x30 [ 3.444994] bus_add_driver+0x1e6/0x2a0 [ 3.444994] driver_register+0xa4/0x180 [ 3.444994] __pci_register_driver+0x77/0x80 [ 3.444994] ? uPD98402_module_init+0xd/0xd [ 3.444994] nicstar_init+0x1f/0x75 [ 3.444994] do_one_initcall+0x7a/0x3d0 [ 3.444994] ? rdinit_setup+0x40/0x40 [ 3.444994] ? rcu_read_lock_sched_held+0x4a/0x70 [ 3.444994] kernel_init_freeable+0x2a7/0x2f9 [ 3.444994] ? rest_init+0x2c0/0x2c0 [ 3.444994] kernel_init+0x13/0x180 [ 3.444994] ? rest_init+0x2c0/0x2c0 [ 3.444994] ? rest_init+0x2c0/0x2c0 [ 3.444994] ret_from_fork+0x1f/0x30 [ 3.444994] Dumping ftrace buffer: [ 3.444994] (ftrace buffer empty) [ 3.444994] Kernel Offset: disabled [ 3.444994] Rebooting in 1 seconds.. Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21atm: nicstar: use 'dma_free_coherent' instead of 'kfree'Zheyu Ma
When 'nicstar_init_one' fails, 'ns_init_card_error' will be executed for error handling, but the correct memory free function should be used, otherwise it will cause an error. Since 'card->rsq.org' and 'card->tsq.org' are allocated using 'dma_alloc_coherent' function, they should be freed using 'dma_free_coherent'. Fix this by using 'dma_free_coherent' instead of 'kfree' This log reveals it: [ 3.440294] kernel BUG at mm/slub.c:4206! [ 3.441059] invalid opcode: 0000 [#1] PREEMPT SMP PTI [ 3.441430] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.12.4-g70e7f0549188-dirty #141 [ 3.441986] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 [ 3.442780] RIP: 0010:kfree+0x26a/0x300 [ 3.443065] Code: e8 3a c3 b9 ff e9 d6 fd ff ff 49 8b 45 00 31 db a9 00 00 01 00 75 4d 49 8b 45 00 a9 00 00 01 00 75 0a 49 8b 45 08 a8 01 75 02 <0f> 0b 89 d9 b8 00 10 00 00 be 06 00 00 00 48 d3 e0 f7 d8 48 63 d0 [ 3.443396] RSP: 0000:ffffc90000017b70 EFLAGS: 00010246 [ 3.443396] RAX: dead000000000100 RBX: 0000000000000000 RCX: 0000000000000000 [ 3.443396] RDX: 0000000000000000 RSI: ffffffff85d3df94 RDI: ffffffff85df38e6 [ 3.443396] RBP: ffffc90000017b90 R08: 0000000000000001 R09: 0000000000000001 [ 3.443396] R10: 0000000000000000 R11: 0000000000000001 R12: ffff888107dc0000 [ 3.443396] R13: ffffea00001f0100 R14: ffff888101a8bf00 R15: ffff888107dc0160 [ 3.443396] FS: 0000000000000000(0000) GS:ffff88817bc80000(0000) knlGS:0000000000000000 [ 3.443396] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3.443396] CR2: 0000000000000000 CR3: 000000000642e000 CR4: 00000000000006e0 [ 3.443396] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 3.443396] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 3.443396] Call Trace: [ 3.443396] ns_init_card_error+0x12c/0x220 [ 3.443396] nicstar_init_one+0x10d2/0x1130 [ 3.443396] local_pci_probe+0x4a/0xb0 [ 3.443396] pci_device_probe+0x126/0x1d0 [ 3.443396] ? pci_device_remove+0x100/0x100 [ 3.443396] really_probe+0x27e/0x650 [ 3.443396] driver_probe_device+0x84/0x1d0 [ 3.443396] ? mutex_lock_nested+0x16/0x20 [ 3.443396] device_driver_attach+0x63/0x70 [ 3.443396] __driver_attach+0x117/0x1a0 [ 3.443396] ? device_driver_attach+0x70/0x70 [ 3.443396] bus_for_each_dev+0xb6/0x110 [ 3.443396] ? rdinit_setup+0x40/0x40 [ 3.443396] driver_attach+0x22/0x30 [ 3.443396] bus_add_driver+0x1e6/0x2a0 [ 3.443396] driver_register+0xa4/0x180 [ 3.443396] __pci_register_driver+0x77/0x80 [ 3.443396] ? uPD98402_module_init+0xd/0xd [ 3.443396] nicstar_init+0x1f/0x75 [ 3.443396] do_one_initcall+0x7a/0x3d0 [ 3.443396] ? rdinit_setup+0x40/0x40 [ 3.443396] ? rcu_read_lock_sched_held+0x4a/0x70 [ 3.443396] kernel_init_freeable+0x2a7/0x2f9 [ 3.443396] ? rest_init+0x2c0/0x2c0 [ 3.443396] kernel_init+0x13/0x180 [ 3.443396] ? rest_init+0x2c0/0x2c0 [ 3.443396] ? rest_init+0x2c0/0x2c0 [ 3.443396] ret_from_fork+0x1f/0x30 [ 3.443396] Modules linked in: [ 3.443396] Dumping ftrace buffer: [ 3.443396] (ftrace buffer empty) [ 3.458593] ---[ end trace 3c6f8f0d8ef59bcd ]--- [ 3.458922] RIP: 0010:kfree+0x26a/0x300 [ 3.459198] Code: e8 3a c3 b9 ff e9 d6 fd ff ff 49 8b 45 00 31 db a9 00 00 01 00 75 4d 49 8b 45 00 a9 00 00 01 00 75 0a 49 8b 45 08 a8 01 75 02 <0f> 0b 89 d9 b8 00 10 00 00 be 06 00 00 00 48 d3 e0 f7 d8 48 63 d0 [ 3.460499] RSP: 0000:ffffc90000017b70 EFLAGS: 00010246 [ 3.460870] RAX: dead000000000100 RBX: 0000000000000000 RCX: 0000000000000000 [ 3.461371] RDX: 0000000000000000 RSI: ffffffff85d3df94 RDI: ffffffff85df38e6 [ 3.461873] RBP: ffffc90000017b90 R08: 0000000000000001 R09: 0000000000000001 [ 3.462372] R10: 0000000000000000 R11: 0000000000000001 R12: ffff888107dc0000 [ 3.462871] R13: ffffea00001f0100 R14: ffff888101a8bf00 R15: ffff888107dc0160 [ 3.463368] FS: 0000000000000000(0000) GS:ffff88817bc80000(0000) knlGS:0000000000000000 [ 3.463949] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3.464356] CR2: 0000000000000000 CR3: 000000000642e000 CR4: 00000000000006e0 [ 3.464856] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 3.465356] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 3.465860] Kernel panic - not syncing: Fatal exception [ 3.466370] Dumping ftrace buffer: [ 3.466616] (ftrace buffer empty) [ 3.466871] Kernel Offset: disabled [ 3.467122] Rebooting in 1 seconds.. Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21Merge branch 'mptcp-sdeq-fixes'David S. Miller
Mat Martineau says: ==================== mptcp: 32-bit sequence number improvements MPTCP-level sequence numbers are 64 bits, but RFC 8684 allows use of 32-bit sequence numbers in the DSS option to save header space. Those 32-bit numbers are the least significant bits of the full 64-bit sequence number, so the receiver must infer the correct upper 32 bits. These two patches improve the logic for determining the full 64-bit sequence numbers when the 32-bit truncated version has wrapped around. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21mptcp: fix 32 bit DSN expansionPaolo Abeni
The current implementation of 32 bit DSN expansion is buggy. After the previous patch, we can simply reuse the newly introduced helper to do the expansion safely. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/120 Fixes: 648ef4b88673 ("mptcp: Implement MPTCP receive path") Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21mptcp: fix bad handling of 32 bit ack wrap-aroundPaolo Abeni
When receiving 32 bits DSS ack from the peer, the MPTCP need to expand them to 64 bits value. The current code is buggy WRT detecting 32 bits ack wrap-around: when the wrap-around happens the current unsigned 32 bit ack value is lower than the previous one. Additionally check for possible reverse wrap and make the helper visible, so that we could re-use it for the next patch. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/204 Fixes: cc9d25669866 ("mptcp: update per unacked sequence on pkt reception") Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21netfilter: nf_tables_offload: check FLOW_DISSECTOR_KEY_BASIC in VLAN ↵Pablo Neira Ayuso
transfer logic The VLAN transfer logic should actually check for FLOW_DISSECTOR_KEY_BASIC, not FLOW_DISSECTOR_KEY_CONTROL. Moreover, do not fallback to case 2) .n_proto is set to 802.1q or 802.1ad, if FLOW_DISSECTOR_KEY_BASIC is unset. Fixes: 783003f3bb8a ("netfilter: nftables_offload: special ethertype handling for VLAN") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-21netfilter: nf_tables: memleak in hw offload abort pathPablo Neira Ayuso
Release flow from the abort path, this is easy to reproduce since b72920f6e4a9 ("netfilter: nftables: counter hardware offload support"). If the preparation phase fails, then the abort path is exercised without releasing the flow rule object. unreferenced object 0xffff8881f0fa7700 (size 128): comm "nft", pid 1335, jiffies 4294931120 (age 4163.740s) hex dump (first 32 bytes): 08 e4 de 13 82 88 ff ff 98 e4 de 13 82 88 ff ff ................ 48 e4 de 13 82 88 ff ff 01 00 00 00 00 00 00 00 H............... backtrace: [<00000000634547e7>] flow_rule_alloc+0x26/0x80 [<00000000c8426156>] nft_flow_rule_create+0xc9/0x3f0 [nf_tables] [<0000000075ff8e46>] nf_tables_newrule+0xc79/0x10a0 [nf_tables] [<00000000ba65e40e>] nfnetlink_rcv_batch+0xaac/0xf90 [nfnetlink] [<00000000505c614a>] nfnetlink_rcv+0x1bb/0x1f0 [nfnetlink] [<00000000eb78e1fe>] netlink_unicast+0x34b/0x480 [<00000000a8f72c94>] netlink_sendmsg+0x3af/0x690 [<000000009cb1ddf4>] sock_sendmsg+0x96/0xa0 [<0000000039d06e44>] ____sys_sendmsg+0x3fe/0x440 [<00000000137e82ca>] ___sys_sendmsg+0xd8/0x140 [<000000000c6bf6a6>] __sys_sendmsg+0xb3/0x130 [<0000000043bd6268>] do_syscall_64+0x40/0xb0 [<00000000afdebc2d>] entry_SYSCALL_64_after_hwframe+0x44/0xae Remove flow rule release from the offload commit path, otherwise error from the offload commit phase might trigger a double-free due to the execution of the abort_offload -> abort. After this patch, the abort path takes care of releasing the flow rule. This fix also needs to move the nft_flow_rule_create() call before the transaction object is added otherwise the abort path might find a NULL pointer to the flow rule object for the NFT_CHAIN_HW_OFFLOAD case. While at it, rename BASIC-like goto tags to slightly more meaningful names rather than adding a new "err3" tag. Fixes: 63b48c73ff56 ("netfilter: nf_tables_offload: undo updates if transaction fails") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-21tls: prevent oversized sendfile() hangs by ignoring MSG_MOREJakub Kicinski
We got multiple reports that multi_chunk_sendfile test case from tls selftest fails. This was sort of expected, as the original fix was never applied (see it in the first Link:). The test in question uses sendfile() with count larger than the size of the underlying file. This will make splice set MSG_MORE on all sendpage calls, meaning TLS will never close and flush the last partial record. Eric seem to have addressed a similar problem in commit 35f9c09fe9c7 ("tcp: tcp_sendpages() should call tcp_push() once") by introducing MSG_SENDPAGE_NOTLAST. Unlike MSG_MORE MSG_SENDPAGE_NOTLAST is not set on the last call of a "pipefull" of data (PIPE_DEF_BUFFERS == 16, so every 16 pages or whenever we run out of data). Having a break every 16 pages should be fine, TLS can pack exactly 4 pages into a record, so for aligned reads there should be no difference, unaligned may see one extra record per sendpage(). Sticking to TCP semantics seems preferable to modifying splice, but we can revisit it if real life scenarios show a regression. Reported-by: Vadim Fedorenko <vfedorenko@novek.ru> Reported-by: Seth Forshee <seth.forshee@canonical.com> Link: https://lore.kernel.org/netdev/1591392508-14592-1-git-send-email-pooja.trivedi@stackpath.com/ Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21Merge tag 'linux-can-fixes-for-5.13-20210619' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2021-06-19 this is a pull request of 5 patches for net/master. The first patch is by Thadeu Lima de Souza Cascardo and fixes a potential use-after-free in the CAN broadcast manager socket, by delaying the release of struct bcm_op after synchronize_rcu(). Oliver Hartkopp's patch fixes a similar potential user-after-free in the CAN gateway socket by synchronizing RCU operations before removing gw job entry. Another patch by Oliver Hartkopp fixes a potential use-after-free in the ISOTP socket by omitting unintended hrtimer restarts on socket release. Oleksij Rempel's patch for the j1939 socket fixes a potential use-after-free by setting the SOCK_RCU_FREE flag on the socket. The last patch is by Pavel Skripkin and fixes a use-after-free in the ems_usb CAN driver. All patches are intended for stable and have stable@v.k.o on Cc. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21Merge tag 'wireless-drivers-2021-06-19' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for v5.13 Only one important fix for an mwifiex regression. mwifiex * fix deadlock during rmmod or firmware reset, regression from cfg80211 RTNL changes in v5.12-rc1 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21hv_netvsc: Set needed_headroom according to VFHaiyang Zhang
Set needed_headroom according to VF if VF needs a bigger headroom. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21net/netif_receive_skb_core: Use migrate_disable()Sebastian Andrzej Siewior
The preempt disable around do_xdp_generic() has been introduced in commit bbbe211c295ff ("net: rcu lock and preempt disable missing around generic xdp") For BPF it is enough to use migrate_disable() and the code was updated as it can be seen in commit 3c58482a382ba ("bpf: Provide bpf_prog_run_pin_on_cpu() helper") This is a leftover which was not converted. Use migrate_disable() before invoking do_xdp_generic(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21net: sched: add barrier to ensure correct ordering for lockless qdiscYunsheng Lin
The spin_trylock() was assumed to contain the implicit barrier needed to ensure the correct ordering between STATE_MISSED setting/clearing and STATE_MISSED checking in commit a90c57f2cedd ("net: sched: fix packet stuck problem for lockless qdisc"). But it turns out that spin_trylock() only has load-acquire semantic, for strongly-ordered system(like x86), the compiler barrier implicitly contained in spin_trylock() seems enough to ensure the correct ordering. But for weakly-orderly system (like arm64), the store-release semantic is needed to ensure the correct ordering as clear_bit() and test_bit() is store operation, see queued_spin_lock(). So add the explicit barrier to ensure the correct ordering for the above case. Fixes: a90c57f2cedd ("net: sched: fix packet stuck problem for lockless qdisc") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21vrf: do not push non-ND strict packets with a source LLA through packet taps ↵Antoine Tenart
again Non-ND strict packets with a source LLA go through the packet taps again, while non-ND strict packets with other source addresses do not, and we can see a clone of those packets on the vrf interface (we should not). This is due to a series of changes: Commit 6f12fa775530[1] made non-ND strict packets not being pushed again in the packet taps. This changed with commit 205704c618af[2] for those packets having a source LLA, as they need a lookup with the orig_iif. The issue now is those packets do not skip the 'vrf_ip6_rcv' function to the end (as the ones without a source LLA) and go through the check to call packet taps again. This check was changed by commit 6f12fa775530[1] and do not exclude non-strict packets anymore. Packets matching 'need_strict && !is_ndisc && is_ll_src' are now being sent through the packet taps again. This can be seen by dumping packets on the vrf interface. Fix this by having the same code path for all non-ND strict packets and selectively lookup with the orig_iif for those with a source LLA. This has the effect to revert to the pre-205704c618af[2] condition, which should also be easier to maintain. [1] 6f12fa775530 ("vrf: mark skb for multicast or link-local as enslaved to VRF") [2] 205704c618af ("vrf: packets with lladdr src needs dst at input with orig_iif when needs strict") Fixes: 205704c618af ("vrf: packets with lladdr src needs dst at input with orig_iif when needs strict") Cc: Stephen Suryaputra <ssuryaextr@gmail.com> Reported-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-21skmsg: Increase sk->sk_drops when dropping packetsCong Wang
It is hard to observe packet drops without increasing relevant drop counters, here we should increase sk->sk_drops which is a protocol-independent counter. Fortunately psock is always associated with a struct sock, we can just use psock->sk. Suggested-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210615021342.7416-9-xiyou.wangcong@gmail.com
2021-06-21skmsg: Pass source psock to sk_psock_skb_redirect()Cong Wang
sk_psock_skb_redirect() only takes skb as a parameter, we will need to know where this skb is from, so just pass the source psock to this function as a new parameter. This patch prepares for the next one. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210615021342.7416-8-xiyou.wangcong@gmail.com
2021-06-21skmsg: Teach sk_psock_verdict_apply() to return errorsCong Wang
Currently sk_psock_verdict_apply() is void, but it handles some error conditions too. Its caller is impossible to learn whether it succeeds or fails, especially sk_psock_verdict_recv(). Make it return int to indicate error cases and propagate errors to callers properly. Fixes: ef5659280eb1 ("bpf, sockmap: Allow skipping sk_skb parser program") Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210615021342.7416-7-xiyou.wangcong@gmail.com
2021-06-21skmsg: Fix a memory leak in sk_psock_verdict_apply()Cong Wang
If the dest psock does not set SK_PSOCK_TX_ENABLED, the skb can't be queued anywhere so must be dropped. This one is found during code review. Fixes: 799aa7f98d53 ("skmsg: Avoid lock_sock() in sk_psock_backlog()") Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210615021342.7416-6-xiyou.wangcong@gmail.com
2021-06-21skmsg: Clear skb redirect pointer before dropping itCong Wang
When we drop skb inside sk_psock_skb_redirect(), we have to clear its skb->_sk_redir pointer too, otherwise kfree_skb() would misinterpret it as a valid skb->_skb_refdst and dst_release() would eventually complain. Fixes: e3526bb92a20 ("skmsg: Move sk_redir from TCP_SKB_CB to skb") Reported-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210615021342.7416-5-xiyou.wangcong@gmail.com
2021-06-21udp: Fix a memory leak in udp_read_sock()Cong Wang
sk_psock_verdict_recv() clones the skb and uses the clone afterward, so udp_read_sock() should free the skb after using it, regardless of error or not. This fixes a real kmemleak. Fixes: d7f571188ecf ("udp: Implement ->read_sock() for sockmap") Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210615021342.7416-4-xiyou.wangcong@gmail.com
2021-06-21selftests/bpf: Retry for EAGAIN in udp_redir_to_connected()Cong Wang
We use non-blocking sockets for testing sockmap redirections, and got some random EAGAIN errors from UDP tests. There is no guarantee the packet would be immediately available to receive as soon as it is sent out, even on the local host. For UDP, this is especially true because it does not lock the sock during BH (unlike the TCP path). This is probably why we only saw this error in UDP cases. No matter how hard we try to make the queue empty check accurate, it is always possible for recvmsg() to beat ->sk_data_ready(). Therefore, we should just retry in case of EAGAIN. Fixes: d6378af615275 ("selftests/bpf: Add a test case for udp sockmap") Reported-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210615021342.7416-3-xiyou.wangcong@gmail.com
2021-06-21skmsg: Improve udp_bpf_recvmsg() accuracyCong Wang
I tried to reuse sk_msg_wait_data() for different protocols, but it turns out it can not be simply reused. For example, UDP actually uses two queues to receive skb: udp_sk(sk)->reader_queue and sk->sk_receive_queue. So we have to check both of them to know whether we have received any packet. Also, UDP does not lock the sock during BH Rx path, it makes no sense for its ->recvmsg() to lock the sock. It is always possible for ->recvmsg() to be called before packets actually arrive in the receive queue, we just use best effort to make it accurate here. Fixes: 1f5be6b3b063 ("udp: Implement udp_bpf_recvmsg() for sockmap") Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210615021342.7416-2-xiyou.wangcong@gmail.com
2021-06-19net: can: ems_usb: fix use-after-free in ems_usb_disconnect()Pavel Skripkin
In ems_usb_disconnect() dev pointer, which is netdev private data, is used after free_candev() call: | if (dev) { | unregister_netdev(dev->netdev); | free_candev(dev->netdev); | | unlink_all_urbs(dev); | | usb_free_urb(dev->intr_urb); | | kfree(dev->intr_in_buffer); | kfree(dev->tx_msg_buffer); | } Fix it by simply moving free_candev() at the end of the block. Fail log: | BUG: KASAN: use-after-free in ems_usb_disconnect | Read of size 8 at addr ffff88804e041008 by task kworker/1:2/2895 | | CPU: 1 PID: 2895 Comm: kworker/1:2 Not tainted 5.13.0-rc5+ #164 | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.4 | Workqueue: usb_hub_wq hub_event | Call Trace: | dump_stack (lib/dump_stack.c:122) | print_address_description.constprop.0.cold (mm/kasan/report.c:234) | kasan_report.cold (mm/kasan/report.c:420 mm/kasan/report.c:436) | ems_usb_disconnect (drivers/net/can/usb/ems_usb.c:683 drivers/net/can/usb/ems_usb.c:1058) Fixes: 702171adeed3 ("ems_usb: Added support for EMS CPC-USB/ARM7 CAN/USB interface") Link: https://lore.kernel.org/r/20210617185130.5834-1-paskripkin@gmail.com Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-06-19can: j1939: j1939_sk_init(): set SOCK_RCU_FREE to call sk_destruct() after ↵Oleksij Rempel
RCU is done Set SOCK_RCU_FREE to let RCU to call sk_destruct() on completion. Without this patch, we will run in to j1939_can_recv() after priv was freed by j1939_sk_release()->j1939_sk_sock_destruct() Fixes: 25fe97cb7620 ("can: j1939: move j1939_priv_put() into sk_destruct callback") Link: https://lore.kernel.org/r/20210617130623.12705-1-o.rempel@pengutronix.de Cc: linux-stable <stable@vger.kernel.org> Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Reported-by: syzbot+bdf710cfc41c186fdff3@syzkaller.appspotmail.com Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-06-19can: isotp: isotp_release(): omit unintended hrtimer restart on socket releaseOliver Hartkopp
When closing the isotp socket, the potentially running hrtimers are canceled before removing the subscription for CAN identifiers via can_rx_unregister(). This may lead to an unintended (re)start of a hrtimer in isotp_rcv_cf() and isotp_rcv_fc() in the case that a CAN frame is received by isotp_rcv() while the subscription removal is processed. However, isotp_rcv() is called under RCU protection, so after calling can_rx_unregister, we may call synchronize_rcu in order to wait for any RCU read-side critical sections to finish. This prevents the reception of CAN frames after hrtimer_cancel() and therefore the unintended (re)start of the hrtimers. Link: https://lore.kernel.org/r/20210618173713.2296-1-socketcan@hartkopp.net Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol") Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-06-19can: gw: synchronize rcu operations before removing gw job entryOliver Hartkopp
can_can_gw_rcv() is called under RCU protection, so after calling can_rx_unregister(), we have to call synchronize_rcu in order to wait for any RCU read-side critical sections to finish before removing the kmem_cache entry with the referenced gw job entry. Link: https://lore.kernel.org/r/20210618173645.2238-1-socketcan@hartkopp.net Fixes: c1aabdf379bc ("can-gw: add netlink based CAN routing") Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-06-19can: bcm: delay release of struct bcm_op after synchronize_rcu()Thadeu Lima de Souza Cascardo
can_rx_register() callbacks may be called concurrently to the call to can_rx_unregister(). The callbacks and callback data, though, are protected by RCU and the struct sock reference count. So the callback data is really attached to the life of sk, meaning that it should be released on sk_destruct. However, bcm_remove_op() calls tasklet_kill(), and RCU callbacks may be called under RCU softirq, so that cannot be used on kernels before the introduction of HRTIMER_MODE_SOFT. However, bcm_rx_handler() is called under RCU protection, so after calling can_rx_unregister(), we may call synchronize_rcu() in order to wait for any RCU read-side critical sections to finish. That is, bcm_rx_handler() won't be called anymore for those ops. So, we only free them, after we do that synchronize_rcu(). Fixes: ffd980f976e7 ("[CAN]: Add broadcast manager (bcm) protocol") Link: https://lore.kernel.org/r/20210619161813.2098382-1-cascardo@canonical.com Cc: linux-stable <stable@vger.kernel.org> Reported-by: syzbot+0f7e7e5e2f4f40fa89c0@syzkaller.appspotmail.com Reported-by: Norbert Slusarek <nslusarek@gmx.net> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-06-19Merge branch 'ezchip-fixes'David S. Miller
Pavel Skripkin says: ==================== net: ethernat: ezchip: bug fixing and code improvments While manual code reviewing, I found some error in ezchip driver. Two of them looks very dangerous: 1. use-after-free in nps_enet_remove Accessing netdev private data after free_netdev() 2. wrong error handling of platform_get_irq() It can cause passing negative irq to request_irq() Also, in 2nd patch I removed redundant check to increase execution speed and make code more straightforward. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-19net: ethernet: ezchip: fix error handlingPavel Skripkin
As documented at drivers/base/platform.c for platform_get_irq: * Gets an IRQ for a platform device and prints an error message if finding the * IRQ fails. Device drivers should check the return value for errors so as to * not pass a negative integer value to the request_irq() APIs. So, the driver should check that platform_get_irq() return value is _negative_, not that it's equal to zero, because -ENXIO (return value from request_irq() if irq was not found) will pass this check and it leads to passing negative irq to request_irq() Fixes: 0dd077093636 ("NET: Add ezchip ethernet driver") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-19net: ethernet: ezchip: remove redundant checkPavel Skripkin
err varibale will be set everytime, when code gets into this path. This check will just slowdown the execution and that's all. Fixes: 0dd077093636 ("NET: Add ezchip ethernet driver") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-19net: ethernet: ezchip: fix UAF in nps_enet_removePavel Skripkin
priv is netdev private data, but it is used after free_netdev(). It can cause use-after-free when accessing priv pointer. So, fix it by moving free_netdev() after netif_napi_del() call. Fixes: 0dd077093636 ("NET: Add ezchip ethernet driver") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-19net: ethernet: aeroflex: fix UAF in greth_of_removePavel Skripkin
static int greth_of_remove(struct platform_device *of_dev) { ... struct greth_private *greth = netdev_priv(ndev); ... unregister_netdev(ndev); free_netdev(ndev); of_iounmap(&of_dev->resource[0], greth->regs, resource_size(&of_dev->resource[0])); ... } greth is netdev private data, but it is used after free_netdev(). It can cause use-after-free when accessing greth pointer. So, fix it by moving free_netdev() after of_iounmap() call. Fixes: d4c41139df6e ("net: Add Aeroflex Gaisler 10/100/1G Ethernet MAC driver") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18Merge tag 'net-5.13-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.13-rc7, including fixes from wireless, bpf, bluetooth, netfilter and can. Current release - regressions: - mlxsw: spectrum_qdisc: Pass handle, not band number to find_class() to fix modifying offloaded qdiscs - lantiq: net: fix duplicated skb in rx descriptor ring - rtnetlink: fix regression in bridge VLAN configuration, empty info is not an error, bot-generated "fix" was not needed - libbpf: s/rx/tx/ typo on umem->rx_ring_setup_done to fix umem creation Current release - new code bugs: - ethtool: fix NULL pointer dereference during module EEPROM dump via the new netlink API - mlx5e: don't update netdev RQs with PTP-RQ, the special purpose queue should not be visible to the stack - mlx5e: select special PTP queue only for SKBTX_HW_TSTAMP skbs - mlx5e: verify dev is present in get devlink port ndo, avoid a panic Previous releases - regressions: - neighbour: allow NUD_NOARP entries to be force GCed - further fixes for fallout from reorg of WiFi locking (staging: rtl8723bs, mac80211, cfg80211) - skbuff: fix incorrect msg_zerocopy copy notifications - mac80211: fix NULL ptr deref for injected rate info - Revert "net/mlx5: Arm only EQs with EQEs" it may cause missed IRQs Previous releases - always broken: - bpf: more speculative execution fixes - netfilter: nft_fib_ipv6: skip ipv6 packets from any to link-local - udp: fix race between close() and udp_abort() resulting in a panic - fix out of bounds when parsing TCP options before packets are validated (in netfilter: synproxy, tc: sch_cake and mptcp) - mptcp: improve operation under memory pressure, add missing wake-ups - mptcp: fix double-lock/soft lookup in subflow_error_report() - bridge: fix races (null pointer deref and UAF) in vlan tunnel egress - ena: fix DMA mapping function issues in XDP - rds: fix memory leak in rds_recvmsg Misc: - vrf: allow larger MTUs - icmp: don't send out ICMP messages with a source address of 0.0.0.0 - cdc_ncm: switch to eth%d interface naming" * tag 'net-5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (139 commits) net: ethernet: fix potential use-after-free in ec_bhf_remove selftests/net: Add icmp.sh for testing ICMP dummy address responses icmp: don't send out ICMP messages with a source address of 0.0.0.0 net: ll_temac: Avoid ndo_start_xmit returning NETDEV_TX_BUSY net: ll_temac: Fix TX BD buffer overwrite net: ll_temac: Add memory-barriers for TX BD access net: ll_temac: Make sure to free skb when it is completely used MAINTAINERS: add Guvenc as SMC maintainer bnxt_en: Call bnxt_ethtool_free() in bnxt_init_one() error path bnxt_en: Fix TQM fastpath ring backing store computation bnxt_en: Rediscover PHY capabilities after firmware reset cxgb4: fix wrong shift. mac80211: handle various extensible elements correctly mac80211: reset profile_periodicity/ema_ap cfg80211: avoid double free of PMSR request cfg80211: make certificate generation more robust mac80211: minstrel_ht: fix sample time check net: qed: Fix memcpy() overflow of qed_dcbx_params() net: cdc_eem: fix tx fixup skb leak net: hamradio: fix memory leak in mkiss_close ...
2021-06-18Merge tag 'for-5.13-rc6-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "One more fix, for a space accounting bug in zoned mode. It happens when a block group is switched back rw->ro and unusable bytes (due to zoned constraints) are subtracted twice. It has user visible effects so I consider it important enough for late -rc inclusion and backport to stable" * tag 'for-5.13-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: zoned: fix negative space_info->bytes_readonly
2021-06-18Merge tag 'pci-v5.13-fixes-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: - Clear 64-bit flag for host bridge windows below 4GB to fix a resource allocation regression added in -rc1 (Punit Agrawal) - Fix tegra194 MCFG quirk build regressions added in -rc1 (Jon Hunter) - Avoid secondary bus resets on TI KeyStone C667X devices (Antti Järvinen) - Avoid secondary bus resets on some NVIDIA GPUs (Shanker Donthineni) - Work around FLR erratum on Huawei Intelligent NIC VF (Chiqijun) - Avoid broken ATS on AMD Navi14 GPU (Evan Quan) - Trust Broadcom BCM57414 NIC to isolate functions even though it doesn't advertise ACS support (Sriharsha Basavapatna) - Work around AMD RS690 BIOSes that don't configure DMA above 4GB (Mikel Rychliski) - Fix panic during PIO transfer on Aardvark controller (Pali Rohár) * tag 'pci-v5.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: aardvark: Fix kernel panic during PIO transfer PCI: Add AMD RS690 quirk to enable 64-bit DMA PCI: Add ACS quirk for Broadcom BCM57414 NIC PCI: Mark AMD Navi14 GPU ATS as broken PCI: Work around Huawei Intelligent NIC VF FLR erratum PCI: Mark some NVIDIA GPUs to avoid bus reset PCI: Mark TI C667X to avoid bus reset PCI: tegra194: Fix MCFG quirk build regressions PCI: of: Clear 64-bit flag for non-prefetchable memory below 4GB
2021-06-18afs: Re-enable freezing once a page fault is interruptedMatthew Wilcox (Oracle)
If a task is killed during a page fault, it does not currently call sb_end_pagefault(), which means that the filesystem cannot be frozen at any time thereafter. This may be reported by lockdep like this: ==================================== WARNING: fsstress/10757 still has locks held! 5.13.0-rc4-build4+ #91 Not tainted ------------------------------------ 1 lock held by fsstress/10757: #0: ffff888104eac530 ( sb_pagefaults as filesystem freezing is modelled as a lock. Fix this by removing all the direct returns from within the function, and using 'ret' to indicate whether we were interrupted or successful. Fixes: 1cf7a1518aef ("afs: Implement shared-writeable mmap") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/20210616154900.1958373-1-willy@infradead.org/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-18net: ethernet: fix potential use-after-free in ec_bhf_removePavel Skripkin
static void ec_bhf_remove(struct pci_dev *dev) { ... struct ec_bhf_priv *priv = netdev_priv(net_dev); unregister_netdev(net_dev); free_netdev(net_dev); pci_iounmap(dev, priv->dma_io); pci_iounmap(dev, priv->io); ... } priv is netdev private data, but it is used after free_netdev(). It can cause use-after-free when accessing priv pointer. So, fix it by moving free_netdev() after pci_iounmap() calls. Fixes: 6af55ff52b02 ("Driver for Beckhoff CX5020 EtherCAT master module.") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18Merge tag 'mac80211-for-net-2021-06-18' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== A couple of straggler fixes: * a minstrel HT sample check fix * peer measurement could double-free on races * certificate file generation at build time could sometimes hang * some parameters weren't reset between connections in mac80211 * some extensible elements were treated as non- extensible, possibly causuing bad connections (or failures) if the AP adds data ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18selftests/net: Add icmp.sh for testing ICMP dummy address responsesToke Høiland-Jørgensen
This adds a new icmp.sh selftest for testing that the kernel will respond correctly with an ICMP unreachable message with the dummy (192.0.0.8) source address when there are no IPv4 addresses configured to use as source addresses. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18icmp: don't send out ICMP messages with a source address of 0.0.0.0Toke Høiland-Jørgensen
When constructing ICMP response messages, the kernel will try to pick a suitable source address for the outgoing packet. However, if no IPv4 addresses are configured on the system at all, this will fail and we end up producing an ICMP message with a source address of 0.0.0.0. This can happen on a box routing IPv4 traffic via v6 nexthops, for instance. Since 0.0.0.0 is not generally routable on the internet, there's a good chance that such ICMP messages will never make it back to the sender of the original packet that the ICMP message was sent in response to. This, in turn, can create connectivity and PMTUd problems for senders. Fortunately, RFC7600 reserves a dummy address to be used as a source for ICMP messages (192.0.0.8/32), so let's teach the kernel to substitute that address as a last resort if the regular source address selection procedure fails. Below is a quick example reproducing this issue with network namespaces: ip netns add ns0 ip l add type veth peer netns ns0 ip l set dev veth0 up ip a add 10.0.0.1/24 dev veth0 ip a add fc00:dead:cafe:42::1/64 dev veth0 ip r add 10.1.0.0/24 via inet6 fc00:dead:cafe:42::2 ip -n ns0 l set dev veth0 up ip -n ns0 a add fc00:dead:cafe:42::2/64 dev veth0 ip -n ns0 r add 10.0.0.0/24 via inet6 fc00:dead:cafe:42::1 ip netns exec ns0 sysctl -w net.ipv4.icmp_ratelimit=0 ip netns exec ns0 sysctl -w net.ipv4.ip_forward=1 tcpdump -tpni veth0 -c 2 icmp & ping -w 1 10.1.0.1 > /dev/null tcpdump: verbose output suppressed, use -v[v]... for full protocol decode listening on veth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes IP 10.0.0.1 > 10.1.0.1: ICMP echo request, id 29, seq 1, length 64 IP 0.0.0.0 > 10.0.0.1: ICMP net 10.1.0.1 unreachable, length 92 2 packets captured 2 packets received by filter 0 packets dropped by kernel With this patch the above capture changes to: IP 10.0.0.1 > 10.1.0.1: ICMP echo request, id 31127, seq 1, length 64 IP 192.0.0.8 > 10.0.0.1: ICMP net 10.1.0.1 unreachable, length 92 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Juliusz Chroboczek <jch@irif.fr> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18net: ll_temac: Avoid ndo_start_xmit returning NETDEV_TX_BUSYEsben Haabendal
As documented in Documentation/networking/driver.rst, the ndo_start_xmit method must not return NETDEV_TX_BUSY under any normal circumstances, and as recommended, we simply stop the tx queue in advance, when there is a risk that the next xmit would cause a NETDEV_TX_BUSY return. Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18net: ll_temac: Fix TX BD buffer overwriteEsben Haabendal
Just as the initial check, we need to ensure num_frag+1 buffers available, as that is the number of buffers we are going to use. This fixes a buffer overflow, which might be seen during heavy network load. Complete lockup of TEMAC was reproducible within about 10 minutes of a particular load. Fixes: 84823ff80f74 ("net: ll_temac: Fix race condition causing TX hang") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18net: ll_temac: Add memory-barriers for TX BD accessEsben Haabendal
Add a couple of memory-barriers to ensure correct ordering of read/write access to TX BDs. In xmit_done, we should ensure that reading the additional BD fields are only done after STS_CTRL_APP0_CMPLT bit is set. When xmit_done marks the BD as free by setting APP0=0, we need to ensure that the other BD fields are reset first, so we avoid racing with the xmit path, which writes to the same fields. Finally, making sure to read APP0 of next BD after the current BD, ensures that we see all available buffers. Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18net: ll_temac: Make sure to free skb when it is completely usedEsben Haabendal
With the skb pointer piggy-backed on the TX BD, we have a simple and efficient way to free the skb buffer when the frame has been transmitted. But in order to avoid freeing the skb while there are still fragments from the skb in use, we need to piggy-back on the TX BD of the skb, not the first. Without this, we are doing use-after-free on the DMA side, when the first BD of a multi TX BD packet is seen as completed in xmit_done, and the remaining BDs are still being processed. Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18MAINTAINERS: add Guvenc as SMC maintainerKarsten Graul
Add Guvenc as maintainer for Shared Memory Communications (SMC) Sockets. Cc: Julian Wiedmann <jwi@linux.ibm.com> Acked-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18Merge branch 'bnxt_en-fixes'David S. Miller
Michael Chan says: ==================== bnxt_en: Bug fixes This patchset includes 3 small bug fixes to reinitialize PHY capabilities after firmware reset, setup the chip's internal TQM fastpath ring backing memory properly for RoCE traffic, and to free ethtool related memory if driver probe fails. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18bnxt_en: Call bnxt_ethtool_free() in bnxt_init_one() error pathSomnath Kotur
bnxt_ethtool_init() may have allocated some memory and we need to call bnxt_ethtool_free() to properly unwind if bnxt_init_one() fails. Fixes: 7c3809181468 ("bnxt_en: Refactor bnxt_init_one() and turn on TPA support on 57500 chips.") Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18bnxt_en: Fix TQM fastpath ring backing store computationRukhsana Ansari
TQM fastpath ring needs to be sized to store both the requester and responder side of RoCE QPs in TQM for supporting bi-directional tests. Fix bnxt_alloc_ctx_mem() to multiply the RoCE QPs by a factor of 2 when computing the number of entries for TQM fastpath ring. This fixes an RX pipeline stall issue when running bi-directional max RoCE QP tests. Fixes: c7dd7ab4b204 ("bnxt_en: Improve TQM ring context memory sizing formulas.") Signed-off-by: Rukhsana Ansari <rukhsana.ansari@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>