summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-05-28x86/ioperm: Prevent a memory leak when fork failsJay Lang
In the copy_process() routine called by _do_fork(), failure to allocate a PID (or further along in the function) will trigger an invocation to exit_thread(). This is done to clean up from an earlier call to copy_thread_tls(). Naturally, the child task is passed into exit_thread(), however during the process, io_bitmap_exit() nullifies the parent's io_bitmap rather than the child's. As copy_thread_tls() has been called ahead of the failure, the reference count on the calling thread's io_bitmap is incremented as we would expect. However, io_bitmap_exit() doesn't accept any arguments, and thus assumes it should trash the current thread's io_bitmap reference rather than the child's. This is pretty sneaky in practice, because in all instances but this one, exit_thread() is called with respect to the current task and everything works out. A determined attacker can issue an appropriate ioctl (i.e. KDENABIO) to get a bitmap allocated, and force a clone3() syscall to fail by passing in a zeroed clone_args structure. The kernel handles the erroneous struct and the buggy code path is followed, and even though the parent's reference to the io_bitmap is trashed, the child still holds a reference and thus the structure will never be freed. Fix this by tweaking io_bitmap_exit() and its subroutines to accept a task_struct argument which to operate on. Fixes: ea5f1cd7ab49 ("x86/ioperm: Remove bitmap if all permissions dropped") Signed-off-by: Jay Lang <jaytlang@mit.edu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable#@vger.kernel.org Link: https://lkml.kernel.org/r/20200524162742.253727-1-jaytlang@mit.edu
2020-05-28Merge tag 'csky-for-linus-5.7-rc8' of git://github.com/c-sky/csky-linuxLinus Torvalds
Pull csky fixes from Guo Ren: "Another four fixes for csky: - fix req_syscall debug - fix abiv2 syscall_trace - fix preempt enable - clean up regs usage in entry.S" * tag 'csky-for-linus-5.7-rc8' of git://github.com/c-sky/csky-linux: csky: Fixup CONFIG_DEBUG_RSEQ csky: Coding convention in entry.S csky: Fixup abiv2 syscall_trace break a4 & a5 csky: Fixup CONFIG_PREEMPT panic
2020-05-28Revert "block: end bio with BLK_STS_AGAIN in case of non-mq devs and REQ_NOWAIT"Jens Axboe
This reverts commit c58c1f83436b501d45d4050fd1296d71a9760bcb. io_uring does do the right thing for this case, and we're still returning -EAGAIN to userspace for the cases we don't support. Revert this change to avoid doing endless spins of resubmits. Cc: stable@vger.kernel.org # v5.6 Reported-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-28include/asm-generic/topology.h: guard cpumask_of_node() macro argumentArnd Bergmann
drivers/hwmon/amd_energy.c:195:15: error: invalid operands to binary expression ('void' and 'int') (channel - data->nr_cpus)); ~~~~~~~~~^~~~~~~~~~~~~~~~~ include/asm-generic/topology.h:51:42: note: expanded from macro 'cpumask_of_node' #define cpumask_of_node(node) ((void)node, cpu_online_mask) ^~~~ include/linux/cpumask.h:618:72: note: expanded from macro 'cpumask_first_and' #define cpumask_first_and(src1p, src2p) cpumask_next_and(-1, (src1p), (src2p)) ^~~~~ Fixes: f0b848ce6fe9 ("cpumask: Introduce cpumask_of_{node,pcibus} to replace {node,pcibus}_to_cpumask") Fixes: 8abee9566b7e ("hwmon: Add amd_energy driver to report energy counters") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Guenter Roeck <linux@roeck-us.net> Link: http://lkml.kernel.org/r/20200527134623.930247-1-arnd@arndb.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-28fs/binfmt_elf.c: allocate initialized memory in fill_thread_core_info()Alexander Potapenko
KMSAN reported uninitialized data being written to disk when dumping core. As a result, several kilobytes of kmalloc memory may be written to the core file and then read by a non-privileged user. Reported-by: sam <sunhaoyl@outlook.com> Signed-off-by: Alexander Potapenko <glider@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200419100848.63472-1-glider@google.com Link: https://github.com/google/kmsan/issues/76 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-28mm: remove VM_BUG_ON(PageSlab()) from page_mapcount()Konstantin Khlebnikov
Replace superfluous VM_BUG_ON() with comment about correct usage. Technically reverts commit 1d148e218a0d ("mm: add VM_BUG_ON_PAGE() to page_mapcount()"), but context lines have changed. Function isolate_migratepages_block() runs some checks out of lru_lock when choose pages for migration. After checking PageLRU() it checks extra page references by comparing page_count() and page_mapcount(). Between these two checks page could be removed from lru, freed and taken by slab. As a result this race triggers VM_BUG_ON(PageSlab()) in page_mapcount(). Race window is tiny. For certain workload this happens around once a year. page:ffffea0105ca9380 count:1 mapcount:0 mapping:ffff88ff7712c180 index:0x0 compound_mapcount: 0 flags: 0x500000000008100(slab|head) raw: 0500000000008100 dead000000000100 dead000000000200 ffff88ff7712c180 raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000 page dumped because: VM_BUG_ON_PAGE(PageSlab(page)) ------------[ cut here ]------------ kernel BUG at ./include/linux/mm.h:628! invalid opcode: 0000 [#1] SMP NOPTI CPU: 77 PID: 504 Comm: kcompactd1 Tainted: G W 4.19.109-27 #1 Hardware name: Yandex T175-N41-Y3N/MY81-EX0-Y3N, BIOS R05 06/20/2019 RIP: 0010:isolate_migratepages_block+0x986/0x9b0 The code in isolate_migratepages_block() was added in commit 119d6d59dcc0 ("mm, compaction: avoid isolating pinned pages") before adding VM_BUG_ON into page_mapcount(). This race has been predicted in 2015 by Vlastimil Babka (see link below). [akpm@linux-foundation.org: comment tweaks, per Hugh] Fixes: 1d148e218a0d ("mm: add VM_BUG_ON_PAGE() to page_mapcount()") Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/159032779896.957378.7852761411265662220.stgit@buzz Link: https://lore.kernel.org/lkml/557710E1.6060103@suse.cz/ Link: https://lore.kernel.org/linux-mm/158937872515.474360.5066096871639561424.stgit@buzz/T/ (v1) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-28mm,thp: stop leaking unreleased file pagesHugh Dickins
When collapse_file() calls try_to_release_page(), it has already isolated the page: so if releasing buffers happens to fail (as it sometimes does), remember to putback_lru_page(): otherwise that page is left unreclaimable and unfreeable, and the file extent uncollapsible. Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS") Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@surriel.com> Cc: <stable@vger.kernel.org> [5.4+] Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2005231837500.1766@eggly.anvils Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-28mm/z3fold: silence kmemleak false positives of slotsQian Cai
Kmemleak reported many leaks while under memory pressue in, slots = alloc_slots(pool, gfp); which is referenced by "zhdr" in init_z3fold_page(), zhdr->slots = slots; However, "zhdr" could be gone without freeing slots as the later will be freed separately when the last "handle" off of "handles" array is freed. It will be within "slots" which is always aligned. unreferenced object 0xc000000fdadc1040 (size 104): comm "oom04", pid 140476, jiffies 4295359280 (age 3454.970s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: z3fold_zpool_malloc+0x7b0/0xe10 alloc_slots at mm/z3fold.c:214 (inlined by) init_z3fold_page at mm/z3fold.c:412 (inlined by) z3fold_alloc at mm/z3fold.c:1161 (inlined by) z3fold_zpool_malloc at mm/z3fold.c:1735 zpool_malloc+0x34/0x50 zswap_frontswap_store+0x60c/0xda0 zswap_frontswap_store at mm/zswap.c:1093 __frontswap_store+0x128/0x330 swap_writepage+0x58/0x110 pageout+0x16c/0xa40 shrink_page_list+0x1ac8/0x25c0 shrink_inactive_list+0x270/0x730 shrink_lruvec+0x444/0xf30 shrink_node+0x2a4/0x9c0 do_try_to_free_pages+0x158/0x640 try_to_free_pages+0x1bc/0x5f0 __alloc_pages_slowpath.constprop.60+0x4dc/0x15a0 __alloc_pages_nodemask+0x520/0x650 alloc_pages_vma+0xc0/0x420 handle_mm_fault+0x1174/0x1bf0 Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vitaly Wool <vitaly.wool@konsulko.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: http://lkml.kernel.org/r/20200522220052.2225-1-cai@lca.pw Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-28x86/dma: Fix max PFN arithmetic overflow on 32 bit systemsAlexander Dahl
The intermediate result of the old term (4UL * 1024 * 1024 * 1024) is 4 294 967 296 or 0x100000000 which is no problem on 64 bit systems. The patch does not change the later overall result of 0x100000 for MAX_DMA32_PFN (after it has been shifted by PAGE_SHIFT). The new calculation yields the same result, but does not require 64 bit arithmetic. On 32 bit systems the old calculation suffers from an arithmetic overflow in that intermediate term in braces: 4UL aka unsigned long int is 4 byte wide and an arithmetic overflow happens (the 0x100000000 does not fit in 4 bytes), the in braces result is truncated to zero, the following right shift does not alter that, so MAX_DMA32_PFN evaluates to 0 on 32 bit systems. That wrong value is a problem in a comparision against MAX_DMA32_PFN in the init code for swiotlb in pci_swiotlb_detect_4gb() to decide if swiotlb should be active. That comparison yields the opposite result, when compiling on 32 bit systems. This was not possible before 1b7e03ef7570 ("x86, NUMA: Enable emulation on 32bit too") when that MAX_DMA32_PFN was first made visible to x86_32 (and which landed in v3.0). In practice this wasn't a problem, unless CONFIG_SWIOTLB is active on x86-32. However if one has set CONFIG_IOMMU_INTEL, since c5a5dc4cbbf4 ("iommu/vt-d: Don't switch off swiotlb if bounce page is used") there's a dependency on CONFIG_SWIOTLB, which was not necessarily active before. That landed in v5.4, where we noticed it in the fli4l Linux distribution. We have CONFIG_IOMMU_INTEL active on both 32 and 64 bit kernel configs there (I could not find out why, so let's just say historical reasons). The effect is at boot time 64 MiB (default size) were allocated for bounce buffers now, which is a noticeable amount of memory on small systems like pcengines ALIX 2D3 with 256 MiB memory, which are still frequently used as home routers. We noticed this effect when migrating from kernel v4.19 (LTS) to v5.4 (LTS) in fli4l and got that kernel messages for example: Linux version 5.4.22 (buildroot@buildroot) (gcc version 7.3.0 (Buildroot 2018.02.8)) #1 SMP Mon Nov 26 23:40:00 CET 2018 … Memory: 183484K/261756K available (4594K kernel code, 393K rwdata, 1660K rodata, 536K init, 456K bss , 78272K reserved, 0K cma-reserved, 0K highmem) … PCI-DMA: Using software bounce buffering for IO (SWIOTLB) software IO TLB: mapped [mem 0x0bb78000-0x0fb78000] (64MB) The initial analysis and the suggested fix was done by user 'sourcejedi' at stackoverflow and explicitly marked as GPLv2 for inclusion in the Linux kernel: https://unix.stackexchange.com/a/520525/50007 The new calculation, which does not suffer from that overflow, is the same as for arch/mips now as suggested by Robin Murphy. The fix was tested by fli4l users on round about two dozen different systems, including both 32 and 64 bit archs, bare metal and virtualized machines. [ bp: Massage commit message. ] Fixes: 1b7e03ef7570 ("x86, NUMA: Enable emulation on 32bit too") Reported-by: Alan Jenkins <alan.christopher.jenkins@gmail.com> Suggested-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Alexander Dahl <post@lespocky.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org Link: https://unix.stackexchange.com/q/520065/50007 Link: https://web.nettworks.org/bugs/browse/FFL-2560 Link: https://lkml.kernel.org/r/20200526175749.20742-1-post@lespocky.de
2020-05-28Merge branch '100GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 100GbE Intel Wired LAN Driver Updates 2020-05-27 This series contains updates to the ice driver only. Jesse fixes a number of issues, starting with fixing the remaining signed versus unsigned comparison issues. Cleaned up an unused code define. Fixed the implementation of the manage MAC write command, to simplify it by using a simple array to represent the MAC address when writing it. Paul fixes the setting of the VF default LAN address, by removing a check that assumed that the address had been deleted and zeroed. Surabhi prevents a memory leak on filter management initialization failures and during queue initialization and buffer allocation failures. Brett adds additional receive error counters that are reported by ethtool. Fixed the enabling and disabling of VLAN stripping when the PVID has been set. Evan fixes a race condition between the firmware and software, which can occur between the admin queue setup and the first command sent. Marta fixes the driver when XDP transmit rings are destroyed, also make sure the XDP transmit queues are also destroyed. Update the statistics when XDP transmit programs are loaded and packets are sent. Changed the number of XDP transmit queues to match the number of receive queues, instead of matching the number of transmit queues. Bruce avoids undefined behavior by not writing the 8-bit element init_q_state with the associated internal-to-hardware field which is 122-bits. Anirudh (Ani) refactors the receive checksum checks. Krzysztof notifies the user if the fill queue is not long enough to prepare all buffers before packet processing starts and allocates the buffers during the NAPI poll. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28Merge branch 'remove-most-callers-of-kernel_setsockopt-v3'David S. Miller
Christoph Hellwig says: ==================== remove most callers of kernel_setsockopt v3 this series removes most callers of the kernel_setsockopt functions, and instead switches their users to small functions that implement setting a sockopt directly using a normal kernel function call with type safety and all the other benefits of not having a function call. In some cases these functions seem pretty heavy handed as they do a lock_sock even for just setting a single variable, but this mirrors the real setsockopt implementation unlike a few drivers that just set set the fields directly. Changes since v2: - drop the separately merged kernel_getopt_removal - drop the sctp patches, as there is conflicting cleanup going on - add an additional ACK for the rxrpc changes Changes since v1: - use ->getname for sctp sockets in dlm - add a new ->bind_add struct proto method for dlm/sctp - switch the ipv6 and remaining sctp helpers to inline function so that the ipv6 and sctp modules are not pulled in by any module that could potentially use ipv6 or sctp connections - remove arguments to various sock_* helpers that are always used with the same constant arguments ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28tipc: call tsk_set_importance from tipc_topsrv_create_listenerChristoph Hellwig
Avoid using kernel_setsockopt for the TIPC_IMPORTANCE option when we can just use the internal helper. The only change needed is to pass a struct sock instead of tipc_sock, which is private to socket.c Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28rxrpc: add rxrpc_sock_set_min_security_levelChristoph Hellwig
Add a helper to directly set the RXRPC_MIN_SECURITY_LEVEL sockopt from kernel space without going through a fake uaccess. Thanks to David Howells for the documentation updates. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28ipv6: add ip6_sock_set_recvpktinfoChristoph Hellwig
Add a helper to directly set the IPV6_RECVPKTINFO sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28ipv6: add ip6_sock_set_addr_preferencesChristoph Hellwig
Add a helper to directly set the IPV6_ADD_PREFERENCES sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28ipv6: add ip6_sock_set_recverrChristoph Hellwig
Add a helper to directly set the IPV6_RECVERR sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28ipv6: add ip6_sock_set_v6onlyChristoph Hellwig
Add a helper to directly set the IPV6_V6ONLY sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28ipv4: add ip_sock_set_pktinfoChristoph Hellwig
Add a helper to directly set the IP_PKTINFO sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28ipv4: add ip_sock_set_mtu_discoverChristoph Hellwig
Add a helper to directly set the IP_MTU_DISCOVER sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Howells <dhowells@redhat.com> [rxrpc bits] Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28ipv4: add ip_sock_set_recverrChristoph Hellwig
Add a helper to directly set the IP_RECVERR sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28ipv4: add ip_sock_set_freebindChristoph Hellwig
Add a helper to directly set the IP_FREEBIND sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28ipv4: add ip_sock_set_tosChristoph Hellwig
Add a helper to directly set the IP_TOS sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28tcp: add tcp_sock_set_keepcntChristoph Hellwig
Add a helper to directly set the TCP_KEEPCNT sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28tcp: add tcp_sock_set_keepintvlChristoph Hellwig
Add a helper to directly set the TCP_KEEPINTVL sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28tcp: add tcp_sock_set_keepidleChristoph Hellwig
Add a helper to directly set the TCP_KEEP_IDLE sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28tcp: add tcp_sock_set_user_timeoutChristoph Hellwig
Add a helper to directly set the TCP_USER_TIMEOUT sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28tcp: add tcp_sock_set_syncntChristoph Hellwig
Add a helper to directly set the TCP_SYNCNT sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28tcp: add tcp_sock_set_quickackChristoph Hellwig
Add a helper to directly set the TCP_QUICKACK sockopt from kernel space without going through a fake uaccess. Cleanup the callers to avoid pointless wrappers now that this is a simple function call. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28tcp: add tcp_sock_set_nodelayChristoph Hellwig
Add a helper to directly set the TCP_NODELAY sockopt from kernel space without going through a fake uaccess. Cleanup the callers to avoid pointless wrappers now that this is a simple function call. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28tcp: add tcp_sock_set_corkChristoph Hellwig
Add a helper to directly set the TCP_CORK sockopt from kernel space without going through a fake uaccess. Cleanup the callers to avoid pointless wrappers now that this is a simple function call. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28net: add sock_set_reuseportChristoph Hellwig
Add a helper to directly set the SO_REUSEPORT sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28net: add sock_set_rcvbufChristoph Hellwig
Add a helper to directly set the SO_RCVBUFFORCE sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28net: add sock_set_keepaliveChristoph Hellwig
Add a helper to directly set the SO_KEEPALIVE sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28net: add sock_enable_timestampsChristoph Hellwig
Add a helper to directly enable timestamps instead of setting the SO_TIMESTAMP* sockopts from kernel space and going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28net: add sock_bindtoindexChristoph Hellwig
Add a helper to directly set the SO_BINDTOIFINDEX sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28net: add sock_set_sndtimeoChristoph Hellwig
Add a helper to directly set the SO_SNDTIMEO_NEW sockopt from kernel space without going through a fake uaccess. The interface is simplified to only pass the seconds value, as that is the only thing needed at the moment. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28net: add sock_set_priorityChristoph Hellwig
Add a helper to directly set the SO_PRIORITY sockopt from kernel space without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28net: add sock_no_lingerChristoph Hellwig
Add a helper to directly set the SO_LINGER sockopt from kernel space with onoff set to true and a linger time of 0 without going through a fake uaccess. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28net: add sock_set_reuseaddrChristoph Hellwig
Add a helper to directly set the SO_REUSEADDR sockopt from kernel space without going through a fake uaccess. For this the iscsi target now has to formally depend on inet to avoid a mostly theoretical compile failure. For actual operation it already did depend on having ipv4 or ipv6 support. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28bonding: Fix reference count leak in bond_sysfs_slave_add.Qiushi Wu
kobject_init_and_add() takes reference even when it fails. If this function returns an error, kobject_put() must be called to properly clean up the memory associated with the object. Previous commit "b8eb718348b8" fixed a similar problem. Fixes: 07699f9a7c8d ("bonding: add sysfs /slave dir for bond slave devices.") Signed-off-by: Qiushi Wu <wu000273@umn.edu> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28Merge tag 'mlx5-updates-2020-05-26' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2020-05-26 Updates highlights: 1) From Vu Pham (8): Support VM traffics failover with bonded VF representors and e-switch egress/ingress ACLs This series introduce the support for Virtual Machine running I/O traffic over direct/fast VF path and failing over to slower paravirtualized path using the following features: __________________________________ | VM _________________ | | |FAILOVER device | | | |________________| | | | | | ____|_____ | | | | | | ______ |___ ____|_______ | | | VF PT | |VIRTIO-NET | | | | device | | device | | | |_________| |___________| | |___________|______________|________| | | | HYPERVISOR | | ____|______ | | macvtap | | |virtio BE | | |___________| | | | ____|_____ | |host VF | | |_________| | | _____|______ _____|_____ | PT VF | | host VF | |representor| |representor| |___________| |___________| \ / \ / \ / \ / _________________ \_______/ | | _______|________ | V-SWITCH | |VF representors |________________| (OVS) | | bond | |________________| |________________| | ________|________ | Uplink | | representor | |_________________| Summary: -------- Problem statement: ------------------ Currently in above topology, when netfailover device is configured using VFs and eswitch VF representors, and when traffic fails over to stand-by VF which is exposed using macvtap device to guest VM, eswitch fails to switch the traffic to the stand-by VF representor. This occurs because there is no knowledge at eswitch level of the stand-by representor device. Solution: --------- Using standard bonding driver, a bond netdevice is created over VF representor device which is used for offloading tc rules. Two VF representors are bonded together, one for the passthrough VF device and another one for the stand-by VF device. With this solution, mlx5 driver listens to the failover events occuring at the bond device level to failover traffic to either of the active VF representor of the bond. a. VM with netfailover device of VF pass-thru (PT) device and virtio-net paravirtualized device with same MAC-address to handle failover traffics at VM level. b. Host bond is active-standby mode, with the lower devices being the VM VF PT representor, and the representor of the 2nd VF to handle failover traffics at Hypervisor/V-Switch OVS level. - During the steady state (fast datapath): set the bond active device to be the VM PT VF representor. - During failover: apply bond failover to the second VF representor device which connects to the VM non-accelerated path. c. E-Switch ingress/egress ACL tables to support failover traffics at E-Switch level I. E-Switch egress ACL with forward-to-vport rule: - By default, eswitch vport egress acl forward packets to its counterpart NIC vport. - During port failover, the egress acl forward-to-vport rule will be added to e-switch vport of passive/in-active slave VF representor to forward packets to other e-switch vport ie. the active slave representor's e-switch vport to handle egress "failover" traffics. - Using lower change netdev event to detect a representor is a lower dev (slave) of bond and becomes active, adding egress acl forward-to-vport rule of all other slave netdevs to forward to this representor's vport. - Using upper change netdev event to detect a representor unslaving from bond device to delete its vport's egress acl forward-to-vport rule. II. E-Switch ingress ACL metadata reg_c for match - Bonded representors' vorts sharing tc block have the same root ingress acl table and a unique metadata for match. - Traffics from both representors's vports will be tagged with same unique metadata reg_c. - Using upper change netdev event to detect a representor enslaving/unslaving from bond device to setup shared root ingress acl and unique metadata. 2) From Alex Vesker (2): Slpit RX and TX lock for parallel rule insertion in software steering 3) Eli Britstein (2): Optimize performance for IPv4/IPv6 ethertype use the HW ip_version register rather than parsing eth frames for ethertype. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28tcp: ipv6: support RFC 6069 (TCP-LD)Eric Dumazet
Make tcp_ld_RTO_revert() helper available to IPv6, and implement RFC 6069 : Quoting this RFC : 3. Connectivity Disruption Indication For Internet Protocol version 6 (IPv6) [RFC2460], the counterpart of the ICMP destination unreachable message of code 0 (net unreachable) and of code 1 (host unreachable) is the ICMPv6 destination unreachable message of code 0 (no route to destination) [RFC4443]. As with IPv4, a router should generate an ICMPv6 destination unreachable message of code 0 in response to a packet that cannot be delivered to its destination address because it lacks a matching entry in its routing table. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28net: dsa: sja1105: offload the Credit-Based Shaper qdiscVladimir Oltean
SJA1105, being AVB/TSN switches, provide hardware assist for the Credit-Based Shaper as described in the IEEE 8021Q-2018 document. First generation has 10 shapers, freely assignable to any of the 4 external ports and 8 traffic classes, and second generation has 16 shapers. The Credit-Based Shaper tables are accessed through the dynamic reconfiguration interface, so we have to restore them manually after a switch reset. The tables are backed up by the static config only on P/Q/R/S, and we don't want to add custom code only for that family, since the procedure that is in place now works for both. Tested with the following commands: data_rate_kbps=67000 port_transmit_rate_kbps=1000000 idleslope=$data_rate_kbps sendslope=$(($idleslope - $port_transmit_rate_kbps)) locredit=$((-0x80000000)) hicredit=$((0x7fffffff)) tc qdisc add dev swp2 root handle 1: mqprio hw 0 num_tc 8 \ map 0 1 2 3 4 5 6 7 \ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 tc qdisc replace dev swp2 parent 1:1 cbs \ idleslope $idleslope \ sendslope $sendslope \ hicredit $hicredit \ locredit $locredit \ offload 1 Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28selftests: Add torture tests to nexthop testsDavid Ahern
Add Nik's torture tests as a new set to stress the replace and cleanup paths. Torture test created by Nikolay Aleksandrov and then I adapted to selftest and added IPv6 version. Signed-off-by: David Ahern <dsahern@kernel.org> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Uninitialized when used in __nf_conntrack_update(), from Nathan Chancellor. 2) Comparison of unsigned expression in nf_confirm_cthelper(). 3) Remove 'const' type qualifier with no effect. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28mt76: mt7915: remove set but not used variable 'msta'YueHaibing
Cc: linux-wireless@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org, netdev@vger.kernel.org, kernel-janitors@vger.kernel.org Fixes gcc '-Wunused-but-set-variable' warning: drivers/net/wireless/mediatek/mt76/mt7915/mcu.c: In function 'mt7915_mcu_sta_txbf_type': drivers/net/wireless/mediatek/mt76/mt7915/mcu.c:1805:21: warning: variable 'msta' set but not used [-Wunused-but-set-variable] It is never used, so can be removed. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Felix Fietkau <nbd@nbd.name>
2020-05-28mt76: mt7615: Use kmemdup in mt7615_queue_key_update()YueHaibing
Use kmemdup rather than duplicating its implementation Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Felix Fietkau <nbd@nbd.name>
2020-05-28mt76: only iterate over initialized rx queuesFelix Fietkau
Fixes the following reported crash: [ 2.361127] BUG: spinlock bad magic on CPU#0, modprobe/456 [ 2.361583] lock: 0xffffa1287525b3b8, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0 [ 2.362250] CPU: 0 PID: 456 Comm: modprobe Not tainted 4.14.177 #5 [ 2.362751] Hardware name: HP Meep/Meep, BIOS Google_Meep.11297.75.0 06/17/2019 [ 2.363343] Call Trace: [ 2.363552] dump_stack+0x97/0xdb [ 2.363826] ? spin_bug+0xa6/0xb3 [ 2.364096] do_raw_spin_lock+0x6a/0x9a [ 2.364417] mt76_dma_rx_fill+0x44/0x1de [mt76] [ 2.364787] ? mt76_dma_kick_queue+0x18/0x18 [mt76] [ 2.365184] mt76_dma_init+0x53/0x85 [mt76] [ 2.365532] mt7615_dma_init+0x3d7/0x546 [mt7615e] [ 2.365928] mt7615_register_device+0xe6/0x1a0 [mt7615e] [ 2.366364] mt7615_mmio_probe+0x14b/0x171 [mt7615e] [ 2.366771] mt7615_pci_probe+0x118/0x13b [mt7615e] [ 2.367169] pci_device_probe+0xaf/0x13d [ 2.367491] driver_probe_device+0x284/0x2ca [ 2.367840] __driver_attach+0x7a/0x9e [ 2.368146] ? driver_attach+0x1f/0x1f [ 2.368451] bus_for_each_dev+0xa0/0xdb [ 2.368765] bus_add_driver+0x132/0x204 [ 2.369078] driver_register+0x8e/0xcd [ 2.369384] do_one_initcall+0x160/0x257 [ 2.369706] ? 0xffffffffc0240000 [ 2.369980] do_init_module+0x60/0x1bb [ 2.370286] load_module+0x18c2/0x1a2b [ 2.370596] ? kernel_read_file+0x141/0x1b9 [ 2.370937] ? kernel_read_file_from_fd+0x46/0x71 [ 2.371320] SyS_finit_module+0xcc/0xf0 [ 2.371636] do_syscall_64+0x6b/0xf7 [ 2.371930] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 [ 2.372344] RIP: 0033:0x7da218ae4199 [ 2.372637] RSP: 002b:00007fffd0608398 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 2.373252] RAX: ffffffffffffffda RBX: 00005a705449df90 RCX: 00007da218ae4199 [ 2.373833] RDX: 0000000000000000 RSI: 00005a7052e73bd8 RDI: 0000000000000006 [ 2.374411] RBP: 00007fffd06083e0 R08: 0000000000000000 R09: 00005a705449d540 [ 2.374989] R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000000 [ 2.375569] R13: 00005a705449def0 R14: 00005a7052e73bd8 R15: 0000000000000000 Reported-by: Sean Wang <sean.wang@mediatek.com> Fixes: d3377b78cec6 ("mt76: add HE phy modes and hardware queue") Signed-off-by: Felix Fietkau <nbd@nbd.name>
2020-05-28mt76: mt7615: add support for MT7611NDENG Qingfang
MT7611N is basically the same as MT7615N, except it only supports 5GHz It is used by some TP-Link and Mercury wireless routers Signed-off-by: DENG Qingfang <dqfext@gmail.com> Signed-off-by: Felix Fietkau <nbd@nbd.name>
2020-05-28mt76: fix wcid allocation issuesFelix Fietkau
mt76 core uses ffs() to find the next free bit. This works well for 32 bit architectures where BITS_PER_LONG is 32. ffs only checks 32 bit values, so allocation fails on 64 bit architectures. Additionally, the wcid mask array was too small in cases where the array was not a multiple of BITS_PER_LONG. Fix this by making the wcid mask array u32 instead and use DIV_ROUND_UP for the size, just in case we ever bump it to a value that's not a multiple of 32. Reported-by: Ryder Lee <ryder.lee@mediatek.com> Signed-off-by: Felix Fietkau <nbd@nbd.name>