summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-03-29mm: fork: fix kernel_stack memcg stats for various stack implementationsRoman Gushchin
Depending on CONFIG_VMAP_STACK and the THREAD_SIZE / PAGE_SIZE ratio the space for task stacks can be allocated using __vmalloc_node_range(), alloc_pages_node() and kmem_cache_alloc_node(). In the first and the second cases page->mem_cgroup pointer is set, but in the third it's not: memcg membership of a slab page should be determined using the memcg_from_slab_page() function, which looks at page->slab_cache->memcg_params.memcg . In this case, using mod_memcg_page_state() (as in account_kernel_stack()) is incorrect: page->mem_cgroup pointer is NULL even for pages charged to a non-root memory cgroup. It can lead to kernel_stack per-memcg counters permanently showing 0 on some architectures (depending on the configuration). In order to fix it, let's introduce a mod_memcg_obj_state() helper, which takes a pointer to a kernel object as a first argument, uses mem_cgroup_from_obj() to get a RCU-protected memcg pointer and calls mod_memcg_state(). It allows to handle all possible configurations (CONFIG_VMAP_STACK and various THREAD_SIZE/PAGE_SIZE values) without spilling any memcg/kmem specifics into fork.c . Note: This is a special version of the patch created for stable backports. It contains code from the following two patches: - mm: memcg/slab: introduce mem_cgroup_from_obj() - mm: fork: fix kernel_stack memcg stats for various stack implementations [guro@fb.com: introduce mem_cgroup_from_obj()] Link: http://lkml.kernel.org/r/20200324004221.GA36662@carbon.dhcp.thefacebook.com Fixes: 4d96ba353075 ("mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages") Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Bharata B Rao <bharata@linux.ibm.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200303233550.251375-1-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-29hugetlb_cgroup: fix illegal access to memoryMina Almasry
This appears to be a mistake in commit faced7e0806cf ("mm: hugetlb controller for cgroups v2"). Essentially that commit does a hugetlb_cgroup_from_counter assuming that page_counter_try_charge has initialized counter. But if that has failed then it seems will not initialize counter, so hugetlb_cgroup_from_counter(counter) ends up pointing to random memory, causing kasan to complain. The solution is to simply use 'h_cg', instead of hugetlb_cgroup_from_counter(counter), since that is a reference to the hugetlb_cgroup anyway. After this change kasan ceases to complain. Fixes: faced7e0806cf ("mm: hugetlb controller for cgroups v2") Reported-by: syzbot+cac0c4e204952cf449b1@syzkaller.appspotmail.com Signed-off-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Giuseppe Scrivano <gscrivan@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/20200313223920.124230-1-almasrymina@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-29drivers/base/memory.c: indicate all memory blocks as removableDavid Hildenbrand
We see multiple issues with the implementation/interface to compute whether a memory block can be offlined (exposed via /sys/devices/system/memory/memoryX/removable) and would like to simplify it (remove the implementation). 1. It runs basically lockless. While this might be good for performance, we see possible races with memory offlining that will require at least some sort of locking to fix. 2. Nowadays, more false positives are possible. No arch-specific checks are performed that validate if memory offlining will not be denied right away (and such check will require locking). For example, arm64 won't allow to offline any memory block that was added during boot - which will imply a very high error rate. Other archs have other constraints. 3. The interface is inherently racy. E.g., if a memory block is detected to be removable (and was not a false positive at that time), there is still no guarantee that offlining will actually succeed. So any caller already has to deal with false positives. 4. It is unclear which performance benefit this interface actually provides. The introducing commit 5c755e9fd813 ("memory-hotplug: add sysfs removable attribute for hotplug memory remove") mentioned "A user-level agent must be able to identify which sections of memory are likely to be removable before attempting the potentially expensive operation." However, no actual performance comparison was included. Known users: - lsmem: Will group memory blocks based on the "removable" property. [1] - chmem: Indirect user. It has a RANGE mode where one can specify removable ranges identified via lsmem to be offlined. However, it also has a "SIZE" mode, which allows a sysadmin to skip the manual "identify removable blocks" step. [2] - powerpc-utils: Uses the "removable" attribute to skip some memory blocks right away when trying to find some to offline+remove. However, with ballooning enabled, it already skips this information completely (because it once resulted in many false negatives). Therefore, the implementation can deal with false positives properly already. [3] According to Nathan Fontenot, DLPAR on powerpc is nowadays no longer driven from userspace via the drmgr command (powerpc-utils). Nowadays it's managed in the kernel - including onlining/offlining of memory blocks - triggered by drmgr writing to /sys/kernel/dlpar. So the affected legacy userspace handling is only active on old kernels. Only very old versions of drmgr on a new kernel (unlikely) might execute slower - totally acceptable. With CONFIG_MEMORY_HOTREMOVE, always indicating "removable" should not break any user space tool. We implement a very bad heuristic now. Without CONFIG_MEMORY_HOTREMOVE we cannot offline anything, so report "not removable" as before. Original discussion can be found in [4] ("[PATCH RFC v1] mm: is_mem_section_removable() overhaul"). Other users of is_mem_section_removable() will be removed next, so that we can remove is_mem_section_removable() completely. [1] http://man7.org/linux/man-pages/man1/lsmem.1.html [2] http://man7.org/linux/man-pages/man8/chmem.8.html [3] https://github.com/ibm-power-utilities/powerpc-utils [4] https://lkml.kernel.org/r/20200117105759.27905-1-david@redhat.com Also, this patch probably fixes a crash reported by Steve. http://lkml.kernel.org/r/CAPcyv4jpdaNvJ67SkjyUJLBnBnXXQv686BiVW042g03FUmWLXw@mail.gmail.com Reported-by: "Scargall, Steve" <steve.scargall@intel.com> Suggested-by: Michal Hocko <mhocko@kernel.org> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Nathan Fontenot <ndfont@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Robert Jennings <rcj@linux.vnet.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Karel Zak <kzak@redhat.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200128093542.6908-1-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-29mm/swapfile.c: move inode_lock out of claim_swapfileNaohiro Aota
claim_swapfile() currently keeps the inode locked when it is successful, or the file is already swapfile (with -EBUSY). And, on the other error cases, it does not lock the inode. This inconsistency of the lock state and return value is quite confusing and actually causing a bad unlock balance as below in the "bad_swap" section of __do_sys_swapon(). This commit fixes this issue by moving the inode_lock() and IS_SWAPFILE check out of claim_swapfile(). The inode is unlocked in "bad_swap_unlock_inode" section, so that the inode is ensured to be unlocked at "bad_swap". Thus, error handling codes after the locking now jumps to "bad_swap_unlock_inode" instead of "bad_swap". ===================================== WARNING: bad unlock balance detected! 5.5.0-rc7+ #176 Not tainted ------------------------------------- swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at: __do_sys_swapon+0x94b/0x3550 but there are no more locks to release! other info that might help us debug this: no locks held by swapon/4294. stack backtrace: CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176 Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014 Call Trace: dump_stack+0xa1/0xea print_unlock_imbalance_bug.cold+0x114/0x123 lock_release+0x562/0xed0 up_write+0x2d/0x490 __do_sys_swapon+0x94b/0x3550 __x64_sys_swapon+0x54/0x80 do_syscall_64+0xa4/0x4b0 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f15da0a0dc7 Fixes: 1638045c3677 ("mm: set S_SWAPFILE on blockdev swap devices") Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Qais Youef <qais.yousef@arm.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200206090132.154869-1-naohiro.aota@wdc.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-29block: return NULL in blk_alloc_queue() on errorChaitanya Kulkarni
This patch fixes follwoing warning: block/blk-core.c: In function ‘blk_alloc_queue’: block/blk-core.c:558:10: warning: returning ‘int’ from a function with return type ‘struct request_queue *’ makes pointer from integer without a cast [-Wint-conversion] return -EINVAL; Fixes: 3d745ea5b095a ("block: simplify queue allocation") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-29netfilter: nf_queue: prefer nf_queue_entry_freeFlorian Westphal
Instead of dropping refs+kfree, use the helper added in previous patch. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-29netfilter: nf_queue: do not release refcouts until nf_reinject is doneFlorian Westphal
nf_queue is problematic when another NF_QUEUE invocation happens from nf_reinject(). 1. nf_queue is invoked, increments state->sk refcount. 2. skb is queued, waiting for verdict. 3. sk is closed/released. 3. verdict comes back, nf_reinject is called. 4. nf_reinject drops the reference -- refcount can now drop to 0 Instead of get_ref/release_ref pattern, we need to nest the get_ref calls: get_ref get_ref release_ref release_ref So that when we invoke the next processing stage (another netfilter or the okfn()), we hold at least one reference count on the devices/socket. After previous patch, it is now safe to put the entry even after okfn() has potentially free'd the skb. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-29netfilter: nf_queue: place bridge physports into queue_entry structFlorian Westphal
The refcount is done via entry->skb, which does work fine. Major problem: When putting the refcount of the bridge ports, we must always put the references while the skb is still around. However, we will need to put the references after okfn() to avoid a possible 1 -> 0 -> 1 refcount transition, so we cannot use the skb pointer anymore. Place the physports in the queue entry structure instead to allow for refcounting changes in the next patch. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-29netfilter: nf_queue: make nf_queue_entry_release_refs staticFlorian Westphal
This is a preparation patch, no logical changes. Move free_entry into core and rename it to something more sensible. Will ease followup patches which will complicate the refcount handling. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-29IB/qib: Delete struct qib_ivdev.qp_rndGeorge Spelvin
I was checking the field to see if it needed the full get_random_bytes() and discovered it's unused. Only compile-tested, as I don't have the hardware, but I'm still pretty confident. Link: https://lore.kernel.org/r/202003281643.02SGh6eG002694@sdf.org Signed-off-by: George Spelvin <lkml@sdf.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-29RDMA/hns: Fix uninitialized variable bugGustavo A. R. Silva
There is a potential execution path in which variable *ret* is returned without being properly initialized, previously. Fix this by initializing variable *ret* to 0. Link: https://lore.kernel.org/r/20200328023539.GA32016@embeddedor Addresses-Coverity-ID: 1491917 ("Uninitialized scalar variable") Fixes: 2f49de21f3e9 ("RDMA/hns: Optimize mhop get flow for multi-hop addressing") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-29RDMA/hns: Modify the mask of QP number for CQE of hip08Lang Cheng
The hip08 supports up to 1M QPs, so the qpn mask of cqe should be modified. Link: https://lore.kernel.org/r/1585194018-4381-4-git-send-email-liweihang@huawei.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-29RDMA/hns: Reduce the maximum number of extend SGE per WQELang Cheng
Just reduce the default number to 64 for backward compatibility, the driver can still get this configuration from the firmware. Link: https://lore.kernel.org/r/1585194018-4381-3-git-send-email-liweihang@huawei.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-29RDMA/hns: Reduce PFC frames in congestion scenariosJihua Tao
The original value means sending 16 packets at a time, and it should be configured to 0 which means sending 1 packet instead. It is modified to reduce the number of PFC frames to make sure the performance meets expectations when flow control is enabled on hip08. Link: https://lore.kernel.org/r/1585194018-4381-2-git-send-email-liweihang@huawei.com Signed-off-by: Jihua Tao <taojihua4@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-29kbuild: add outputmakefile to no-dot-config-targetsDavid Engraf
The target outputmakefile is used to generate a Makefile for out-of-tree builds and does not depend on the kernel configuration. Signed-off-by: David Engraf <david.engraf@sysgo.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-03-29kbuild: remove AS variableMasahiro Yamada
As commit 5ef872636ca7 ("kbuild: get rid of misleading $(AS) from documents") noted, we rarely use $(AS) directly in the kernel build. Now that the only/last user of $(AS) in drivers/net/wan/Makefile was converted to $(CC), $(AS) is no longer used in the build process. You can still pass in AS=clang, which is just a switch to turn on the LLVM integrated assembler. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
2020-03-29net: wan: wanxl: refactor the firmware rebuild ruleMasahiro Yamada
Split the big recipe into 3 stages: compile, link, and hexdump. After this commit, the build log with CONFIG_WANXL_BUILD_FIRMWARE will look like this: M68KAS drivers/net/wan/wanxlfw.o M68KLD drivers/net/wan/wanxlfw.bin BLDFW drivers/net/wan/wanxlfw.inc CC [M] drivers/net/wan/wanxl.o Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-03-29net: wan: wanxl: use $(M68KCC) instead of $(M68KAS) for rebuilding firmwareMasahiro Yamada
The firmware source, wanxlfw.S, is currently compiled by the combo of $(CPP) and $(M68KAS). This is not what we usually do for compiling *.S files. In fact, this Makefile is the only user of $(AS) in the kernel build. Instead of combining $(CPP) and (AS) from different tool sets, using $(M68KCC) as an assembler driver is simpler, and saner. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-03-29net: wan: wanxl: use allow to pass CROSS_COMPILE_M68k for rebuilding firmwareMasahiro Yamada
As far as I understood from the Kconfig help text, this build rule is used to rebuild the driver firmware, which runs on an old m68k-based chip. So, you need m68k tools for the firmware rebuild. wanxl.c is a PCI driver, but CONFIG_M68K does not select CONFIG_HAVE_PCI. So, you cannot enable CONFIG_WANXL_BUILD_FIRMWARE for ARCH=m68k. In other words, ifeq ($(ARCH),m68k) is false here. I am keeping the dead code for now, but rebuilding the firmware requires 'as68k' and 'ld68k', which I do not have in hand. Instead, the kernel.org m68k GCC [1] successfully built it. Allowing a user to pass in CROSS_COMPILE_M68K= is handier. [1] https://mirrors.edge.kernel.org/pub/tools/crosstool/files/bin/x86_64/9.2.0/x86_64-gcc-9.2.0-nolibc-m68k-linux.tar.xz Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-03-29kbuild: add comment about grouped targetMasahiro Yamada
GNU Make commit 8c888d95f618 ("[SV 8297] Implement "grouped targets" for explicit rules.") added the '&:' syntax. I think '&:' is a perfect fit here, but we cannot use it any time soon. Just add a TODO comment. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-03-29kbuild: add -Wall to KBUILD_HOSTCXXFLAGSMasahiro Yamada
Add -Wall to catch more warnings for C++ host programs. When I submitted the previous version, the 0-day bot reported -Wc++11-compat warnings for old GCC: HOSTCXX -fPIC scripts/gcc-plugins/latent_entropy_plugin.o In file included from /usr/lib/gcc/x86_64-linux-gnu/4.8/plugin/include/tm.h:28:0, from scripts/gcc-plugins/gcc-common.h:15, from scripts/gcc-plugins/latent_entropy_plugin.c:78: /usr/lib/gcc/x86_64-linux-gnu/4.8/plugin/include/config/elfos.h:102:21: warning: C++11 requires a space between string literal and macro [-Wc++11-compat] fprintf ((FILE), "%s"HOST_WIDE_INT_PRINT_UNSIGNED"\n",\ ^ /usr/lib/gcc/x86_64-linux-gnu/4.8/plugin/include/config/elfos.h:170:24: warning: C++11 requires a space between string literal and macro [-Wc++11-compat] fprintf ((FILE), ","HOST_WIDE_INT_PRINT_UNSIGNED",%u\n", \ ^ In file included from /usr/lib/gcc/x86_64-linux-gnu/4.8/plugin/include/tm.h:42:0, from scripts/gcc-plugins/gcc-common.h:15, from scripts/gcc-plugins/latent_entropy_plugin.c:78: /usr/lib/gcc/x86_64-linux-gnu/4.8/plugin/include/defaults.h:126:24: warning: C++11 requires a space between string literal and macro [-Wc++11-compat] fprintf ((FILE), ","HOST_WIDE_INT_PRINT_UNSIGNED",%u\n", \ ^ The source of the warnings is in the plugin headers, so we have no control of it. I just suppressed them by adding -Wno-c++11-compat to scripts/gcc-plugins/Makefile. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Kees Cook <keescook@chromium.org>
2020-03-29kconfig: remove unused variable in qconf.ccMasahiro Yamada
If this file were compiled with -Wall, the following warning would be reported: scripts/kconfig/qconf.cc:312:6: warning: unused variable ‘i’ [-Wunused-variable] int i; ^ The commit prepares to turn on -Wall for C++ host programs. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org>
2020-03-29KEYS: Avoid false positive ENOMEM error on key readWaiman Long
By allocating a kernel buffer with a user-supplied buffer length, it is possible that a false positive ENOMEM error may be returned because the user-supplied length is just too large even if the system do have enough memory to hold the actual key data. Moreover, if the buffer length is larger than the maximum amount of memory that can be returned by kmalloc() (2^(MAX_ORDER-1) number of pages), a warning message will also be printed. To reduce this possibility, we set a threshold (PAGE_SIZE) over which we do check the actual key length first before allocating a buffer of the right size to hold it. The threshold is arbitrary, it is just used to trigger a buffer length check. It does not limit the actual key length as long as there is enough memory to satisfy the memory request. To further avoid large buffer allocation failure due to page fragmentation, kvmalloc() is used to allocate the buffer so that vmapped pages can be used when there is not a large enough contiguous set of pages available for allocation. In the extremely unlikely scenario that the key keeps on being changed and made longer (still <= buflen) in between 2 __keyctl_read_key() calls, the __keyctl_read_key() calling loop in keyctl_read_key() may have to be iterated a large number of times, but definitely not infinite. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com>
2020-03-29KEYS: Don't write out to userspace while holding key semaphoreWaiman Long
A lockdep circular locking dependency report was seen when running a keyutils test: [12537.027242] ====================================================== [12537.059309] WARNING: possible circular locking dependency detected [12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - - [12537.125253] ------------------------------------------------------ [12537.153189] keyctl/25598 is trying to acquire lock: [12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0 [12537.208365] [12537.208365] but task is already holding lock: [12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220 [12537.270476] [12537.270476] which lock already depends on the new lock. [12537.270476] [12537.307209] [12537.307209] the existing dependency chain (in reverse order) is: [12537.340754] [12537.340754] -> #3 (&type->lock_class){++++}: [12537.367434] down_write+0x4d/0x110 [12537.385202] __key_link_begin+0x87/0x280 [12537.405232] request_key_and_link+0x483/0xf70 [12537.427221] request_key+0x3c/0x80 [12537.444839] dns_query+0x1db/0x5a5 [dns_resolver] [12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs] [12537.496731] cifs_reconnect+0xe04/0x2500 [cifs] [12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs] [12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs] [12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs] [12537.601045] kthread+0x30c/0x3d0 [12537.617906] ret_from_fork+0x3a/0x50 [12537.636225] [12537.636225] -> #2 (root_key_user.cons_lock){+.+.}: [12537.664525] __mutex_lock+0x105/0x11f0 [12537.683734] request_key_and_link+0x35a/0xf70 [12537.705640] request_key+0x3c/0x80 [12537.723304] dns_query+0x1db/0x5a5 [dns_resolver] [12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs] [12537.775607] cifs_reconnect+0xe04/0x2500 [cifs] [12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs] [12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs] [12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs] [12537.873477] kthread+0x30c/0x3d0 [12537.890281] ret_from_fork+0x3a/0x50 [12537.908649] [12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}: [12537.935225] __mutex_lock+0x105/0x11f0 [12537.954450] cifs_call_async+0x102/0x7f0 [cifs] [12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs] [12538.000659] cifs_readpages+0x120a/0x1e50 [cifs] [12538.023920] read_pages+0xf5/0x560 [12538.041583] __do_page_cache_readahead+0x41d/0x4b0 [12538.067047] ondemand_readahead+0x44c/0xc10 [12538.092069] filemap_fault+0xec1/0x1830 [12538.111637] __do_fault+0x82/0x260 [12538.129216] do_fault+0x419/0xfb0 [12538.146390] __handle_mm_fault+0x862/0xdf0 [12538.167408] handle_mm_fault+0x154/0x550 [12538.187401] __do_page_fault+0x42f/0xa60 [12538.207395] do_page_fault+0x38/0x5e0 [12538.225777] page_fault+0x1e/0x30 [12538.243010] [12538.243010] -> #0 (&mm->mmap_sem){++++}: [12538.267875] lock_acquire+0x14c/0x420 [12538.286848] __might_fault+0x119/0x1b0 [12538.306006] keyring_read_iterator+0x7e/0x170 [12538.327936] assoc_array_subtree_iterate+0x97/0x280 [12538.352154] keyring_read+0xe9/0x110 [12538.370558] keyctl_read_key+0x1b9/0x220 [12538.391470] do_syscall_64+0xa5/0x4b0 [12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf [12538.435535] [12538.435535] other info that might help us debug this: [12538.435535] [12538.472829] Chain exists of: [12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class [12538.472829] [12538.524820] Possible unsafe locking scenario: [12538.524820] [12538.551431] CPU0 CPU1 [12538.572654] ---- ---- [12538.595865] lock(&type->lock_class); [12538.613737] lock(root_key_user.cons_lock); [12538.644234] lock(&type->lock_class); [12538.672410] lock(&mm->mmap_sem); [12538.687758] [12538.687758] *** DEADLOCK *** [12538.687758] [12538.714455] 1 lock held by keyctl/25598: [12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220 [12538.770573] [12538.770573] stack backtrace: [12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G [12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015 [12538.881963] Call Trace: [12538.892897] dump_stack+0x9a/0xf0 [12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279 [12538.932891] ? save_trace+0xd6/0x250 [12538.948979] check_prev_add.constprop.32+0xc36/0x14f0 [12538.971643] ? keyring_compare_object+0x104/0x190 [12538.992738] ? check_usage+0x550/0x550 [12539.009845] ? sched_clock+0x5/0x10 [12539.025484] ? sched_clock_cpu+0x18/0x1e0 [12539.043555] __lock_acquire+0x1f12/0x38d0 [12539.061551] ? trace_hardirqs_on+0x10/0x10 [12539.080554] lock_acquire+0x14c/0x420 [12539.100330] ? __might_fault+0xc4/0x1b0 [12539.119079] __might_fault+0x119/0x1b0 [12539.135869] ? __might_fault+0xc4/0x1b0 [12539.153234] keyring_read_iterator+0x7e/0x170 [12539.172787] ? keyring_read+0x110/0x110 [12539.190059] assoc_array_subtree_iterate+0x97/0x280 [12539.211526] keyring_read+0xe9/0x110 [12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0 [12539.249076] keyctl_read_key+0x1b9/0x220 [12539.266660] do_syscall_64+0xa5/0x4b0 [12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf One way to prevent this deadlock scenario from happening is to not allow writing to userspace while holding the key semaphore. Instead, an internal buffer is allocated for getting the keys out from the read method first before copying them out to userspace without holding the lock. That requires taking out the __user modifier from all the relevant read methods as well as additional changes to not use any userspace write helpers. That is, 1) The put_user() call is replaced by a direct copy. 2) The copy_to_user() call is replaced by memcpy(). 3) All the fault handling code is removed. Compiling on a x86-64 system, the size of the rxrpc_read() function is reduced from 3795 bytes to 2384 bytes with this patch. Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2") Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com>
2020-03-29efi/libstub/arm: Fix spurious message that an initrd was loadedArd Biesheuvel
Commit: ec93fc371f014a6f ("efi/libstub: Add support for loading the initrd from a device path") added a diagnostic print to the ARM version of the EFI stub that reports whether an initrd has been loaded that was passed via the command line using initrd=. However, it failed to take into account that, for historical reasons, the file loading routines return EFI_SUCCESS when no file was found, and the only way to decide whether a file was loaded is to inspect the 'size' argument that is passed by reference. So let's inspect this returned size, to prevent the print from being emitted even if no initrd was loaded at all. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-efi@vger.kernel.org
2020-03-29efi/libstub/arm64: Avoid image_base value from efi_loaded_imageArd Biesheuvel
Commit: 9f9223778ef3 ("efi/libstub/arm: Make efi_entry() an ordinary PE/COFF entrypoint") did some code refactoring to get rid of the EFI entry point assembler code, and in the process, it got rid of the assignment of image_addr to the value of _text. Instead, it switched to using the image_base field of the efi_loaded_image struct provided by UEFI, which should contain the same value. However, Michael reports that this is not the case: older GRUB builds corrupt this value in some way, and since we can easily switch back to referring to _text to discover this value, let's simply do that. While at it, fix another issue in commit 9f9223778ef3, which may result in the unassigned image_addr to be misidentified as the preferred load offset of the kernel, which is unlikely but will cause a boot crash if it does occur. Finally, let's add a warning if the _text vs. image_base discrepancy is detected, so we can tell more easily how widespread this issue actually is. Reported-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-efi@vger.kernel.org
2020-03-29ALSA: hda/realtek - a fake key event is triggered by running shutupHui Wang
On the Lenovo X1C7 machines, after we plug the headset, the rt_resume() and rt_suspend() of the codec driver will be called periodically, the driver can't stay in the rt_suspend state even users doen't use the sound card. Through debugging, I found when running rt_suspend(), it will call alc225_shutup(), in this function, it will change 3k pull down control by alc_update_coef_idx(codec, 0x4a, 0, 3 << 10), this will trigger a fake key event and that event will resume the codec, when codec suspend agin, it will trigger the fake key event one more time, this process will repeat. If disable the key event before changing the pull down control, it will not trigger fake key event. It also needs to restore the pull down control and re-enable the key event, otherwise the system can't get key event when codec is in rt_suspend state. Also move some functions ahead of alc225_shutup(), this can save the function declaration. Fixes: 76f7dec08fd6 (ALSA: hda/realtek - Add Headset Button supported for ThinkPad X1) Cc: Kailang Yang <kailang@realtek.com> Cc: <stable@vger.kernel.org> Signed-off-by: Hui Wang <hui.wang@canonical.com> Link: https://lore.kernel.org/r/20200329082018.20486-1-hui.wang@canonical.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-03-29i3c: convert to use i2c_new_client_device()Wolfram Sang
Move away from the deprecated API. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Link: https://lore.kernel.org/linux-i3c/20200326211002.13241-2-wsa+renesas@sang-engineering.com
2020-03-29ALSA: hda: default enable CA0132 DSP supportRouven Czerwinski
If SND_HDA_CODEC_CA0132 is enabled, the DSP support should be enabled as well. Disabled DSP support leads to a hanging alsa system and no sound output on the card otherwise. Tested on: 06:00.0 Audio device: Creative Labs Sound Core3D [Sound Blaster Recon3D / Z-Series] (rev 01) Signed-off-by: Rouven Czerwinski <rouven@czerwinskis.de> Link: https://lore.kernel.org/r/20200329053710.4276-1-r.czerwinski@pengutronix.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-03-28ext4: fix incorrect group count in ext4_fill_super error messageJosh Triplett
ext4_fill_super doublechecks the number of groups before mounting; if that check fails, the resulting error message prints the group count from the ext4_sb_info sbi, which hasn't been set yet. Print the freshly computed group count instead (which at that point has just been computed in "blocks_count"). Signed-off-by: Josh Triplett <josh@joshtriplett.org> Fixes: 4ec1102813798 ("ext4: Add sanity checks for the superblock before mounting the filesystem") Link: https://lore.kernel.org/r/8b957cd1513fcc4550fe675c10bcce2175c33a49.1585431964.git.josh@joshtriplett.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-03-28ext4: fix incorrect inodes per group in error messageJosh Triplett
If ext4_fill_super detects an invalid number of inodes per group, the resulting error message printed the number of blocks per group, rather than the number of inodes per group. Fix it to print the correct value. Fixes: cd6bb35bf7f6d ("ext4: use more strict checks for inodes_per_block on mount") Link: https://lore.kernel.org/r/8be03355983a08e5d4eed480944613454d7e2550.1585434649.git.josh@joshtriplett.org Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-03-28ext4: don't set dioread_nolock by default for blocksize < pagesizeRitesh Harjani
Currently on calling echo 3 > drop_caches on host machine, we see FS corruption in the guest. This happens on Power machine where blocksize < pagesize. So as a temporary workaound don't enable dioread_nolock by default for blocksize < pagesize until we identify the root cause. Also emit a warning msg in case if this mount option is manually enabled for blocksize < pagesize. Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20200327200744.12473-1-riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-03-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix memory leak in vti6, from Torsten Hilbrich. 2) Fix double free in xfrm_policy_timer, from YueHaibing. 3) NL80211_ATTR_CHANNEL_WIDTH attribute is put with wrong type, from Johannes Berg. 4) Wrong allocation failure check in qlcnic driver, from Xu Wang. 5) Get ks8851-ml IO operations right, for real this time, from Marek Vasut. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (22 commits) r8169: fix PHY driver check on platforms w/o module softdeps net: ks8851-ml: Fix IO operations, again mlxsw: spectrum_mr: Fix list iteration in error path qlcnic: Fix bad kzalloc null test mac80211: set IEEE80211_TX_CTRL_PORT_CTRL_PROTO for nl80211 TX mac80211: mark station unauthorized before key removal mac80211: Check port authorization in the ieee80211_tx_dequeue() case cfg80211: Do not warn on same channel at the end of CSA mac80211: drop data frames without key on encrypted links ieee80211: fix HE SPR size calculation nl80211: fix NL80211_ATTR_CHANNEL_WIDTH attribute type xfrm: policy: Fix doulbe free in xfrm_policy_timer bpf: Explicitly memset some bpf info structures declared on the stack bpf: Explicitly memset the bpf_attr structure bpf: Sanitize the bpf_struct_ops tcp-cc name vti6: Fix memory leak of skb if input policy check fails esp: remove the skb from the chain when it's enqueued in cryptd_wq ipv6: xfrm6_tunnel.c: Use built-in RCU list checking xfrm: add the missing verify_sec_ctx_len check in xfrm_add_acquire xfrm: fix uctx len check in verify_sec_ctx_len ...
2020-03-28Merge branch 'ifla_xdp_expected_fd'Alexei Starovoitov
Toke Høiland-Jørgensen says: ==================== This series adds support for atomically replacing the XDP program loaded on an interface. This is achieved by means of a new netlink attribute that can specify the expected previous program to replace on the interface. If set, the kernel will compare this "expected fd" attribute with the program currently loaded on the interface, and reject the operation if it does not match. With this primitive, userspace applications can avoid stepping on each other's toes when simultaneously updating the loaded XDP program. Changelog: v4: - Switch back to passing FD instead of ID (Andrii) - Rename flag to XDP_FLAGS_REPLACE (for consistency with other similar uses) v3: - Pass existing ID instead of FD (Jakub) - Use opts struct for new libbpf function (Andrii) v2: - Fix checkpatch nits and add .strict_start_type to netlink policy (Jakub) ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-03-28PCI: Add ACS quirk for Zhaoxin Root/Downstream PortsRaymond Pang
Many Zhaoxin Root Ports and Switch Downstream Ports do provide ACS-like capability but have no ACS Capability Structure. Peer-to-Peer transactions could be blocked between these ports, so add quirk so devices behind them could be assigned to different IOMMU group. Link: https://lore.kernel.org/r/20200327091148.5190-4-RaymondPang-oc@zhaoxin.com Signed-off-by: Raymond Pang <RaymondPang-oc@zhaoxin.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-03-28PCI: Add ACS quirk for Zhaoxin multi-function devicesRaymond Pang
Some Zhaoxin endpoints are implemented as multi-function devices without an ACS capability, but they actually don't support peer-to-peer transactions. Add ACS quirks to declare DMA isolation. Link: https://lore.kernel.org/r/20200327091148.5190-3-RaymondPang-oc@zhaoxin.com Signed-off-by: Raymond Pang <RaymondPang-oc@zhaoxin.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-03-28PCI: Add Zhaoxin Vendor IDRaymond Pang
Add Zhaoxin Vendor ID to pci_ids.h Link: https://lore.kernel.org/r/20200327091148.5190-2-RaymondPang-oc@zhaoxin.com Signed-off-by: Raymond Pang <RaymondPang-oc@zhaoxin.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-03-28selftests/bpf: Add tests for attaching XDP programsToke Høiland-Jørgensen
This adds tests for the various replacement operations using IFLA_XDP_EXPECTED_FD. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/158515700967.92963.15098921624731968356.stgit@toke.dk
2020-03-28libbpf: Add function to set link XDP fd while specifying old programToke Høiland-Jørgensen
This adds a new function to set the XDP fd while specifying the FD of the program to replace, using the newly added IFLA_XDP_EXPECTED_FD netlink parameter. The new function uses the opts struct mechanism to be extendable in the future. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/158515700857.92963.7052131201257841700.stgit@toke.dk
2020-03-28tools: Add EXPECTED_FD-related definitions in if_link.hToke Høiland-Jørgensen
This adds the IFLA_XDP_EXPECTED_FD netlink attribute definition and the XDP_FLAGS_REPLACE flag to if_link.h in tools/include. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/158515700747.92963.8615391897417388586.stgit@toke.dk
2020-03-28xdp: Support specifying expected existing program when attaching XDPToke Høiland-Jørgensen
While it is currently possible for userspace to specify that an existing XDP program should not be replaced when attaching to an interface, there is no mechanism to safely replace a specific XDP program with another. This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be set along with IFLA_XDP_FD. If set, the kernel will check that the program currently loaded on the interface matches the expected one, and fail the operation if it does not. This corresponds to a 'cmpxchg' memory operation. Setting the new attribute with a negative value means that no program is expected to be attached, which corresponds to setting the UPDATE_IF_NOEXIST flag. A new companion flag, XDP_FLAGS_REPLACE, is also added to explicitly request checking of the EXPECTED_FD attribute. This is needed for userspace to discover whether the kernel supports the new attribute. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/bpf/158515700640.92963.3551295145441017022.stgit@toke.dk
2020-03-28platform/x86: surface3_power: Fix Kconfig section orderingAndy Shevchenko
Kconfig section is misplaced. Put it in the same order as it is done in Makefile for this driver. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-03-28platform/x86: surface3_power: Add missed headersAndy Shevchenko
We obviously are users of bits.h and types.h. Add them to the list. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-03-28platform/x86: surface3_power: Reformat GUID assignmentAndy Shevchenko
For better readability reformat GUID assignment. While here, add the comment how this GUID looks in a string representation. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-03-28platform/x86: surface3_power: Drop useless macro ACPI_PTR()Andy Shevchenko
Driver depends to ACPI, this marco always is evaluated to the parameter, thus useless. Drop it for good. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-03-28platform/x86: surface3_power: Prefix POLL_INTERVAL with SURFACE_3Andy Shevchenko
For better namespace maintenance prefix POLL_INTERVAL macro with SURFACE_3. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-03-28platform/x86: surface3_power: Simplify mshw0011_adp_psr() to one linerAndy Shevchenko
Refactor mshw0011_adp_psr() to be one liner. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-03-28platform/x86: surface3_power: Use dev_err() instead of pr_err()Andy Shevchenko
We have device and we may use it to print messages. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-03-28platform/x86: surface3_power: Drop unused structure definitionAndy Shevchenko
As reported by kbuild bot the struct mshw0011_lookup in never used. Drop its definition for good. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2020-03-28Merge branch 'i2c/for-current' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Three more driver bugfixes, and two doc improvements fixing build warnings while we are here" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: pca-platform: Use platform_irq_get_optional i2c: st: fix missing struct parameter description i2c: nvidia-gpu: Handle timeout correctly in gpu_i2c_check_status() i2c: fix a doc warning i2c: hix5hd2: add missed clk_disable_unprepare in remove