linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2020-09-26	lib/memregion.c: include memregion.h	Jason Yan
	This addresses the following sparse warning: lib/memregion.c:8:5: warning: symbol 'memregion_alloc' was not declared. Should it be static? lib/memregion.c:14:6: warning: symbol 'memregion_free' was not declared. Should it be static? Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/20200921142852.875312-1-yanaijie@huawei.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-26	lib/string.c: implement stpcpy	Nick Desaulniers
	LLVM implemented a recent "libcall optimization" that lowers calls to `sprintf(dest, "%s", str)` where the return value is used to `stpcpy(dest, str) - dest`. This generally avoids the machinery involved in parsing format strings. `stpcpy` is just like `strcpy` except it returns the pointer to the new tail of `dest`. This optimization was introduced into clang-12. Implement this so that we don't observe linkage failures due to missing symbol definitions for `stpcpy`. Similar to last year's fire drill with: commit 5f074f3e192f ("lib/string.c: implement a basic bcmp") The kernel is somewhere between a "freestanding" environment (no full libc) and "hosted" environment (many symbols from libc exist with the same type, function signature, and semantics). As Peter Anvin notes, there's not really a great way to inform the compiler that you're targeting a freestanding environment but would like to opt-in to some libcall optimizations (see pr/47280 below), rather than opt-out. Arvind notes, -fno-builtin-* behaves slightly differently between GCC and Clang, and Clang is missing many __builtin_* definitions, which I consider a bug in Clang and am working on fixing. Masahiro summarizes the subtle distinction between compilers justly: To prevent transformation from foo() into bar(), there are two ways in Clang to do that; -fno-builtin-foo, and -fno-builtin-bar. There is only one in GCC; -fno-buitin-foo. (Any difference in that behavior in Clang is likely a bug from a missing __builtin_* definition.) Masahiro also notes: We want to disable optimization from foo() to bar(), but we may still benefit from the optimization from foo() into something else. If GCC implements the same transform, we would run into a problem because it is not -fno-builtin-bar, but -fno-builtin-foo that disables that optimization. In this regard, -fno-builtin-foo would be more future-proof than -fno-built-bar, but -fno-builtin-foo is still potentially overkill. We may want to prevent calls from foo() being optimized into calls to bar(), but we still may want other optimization on calls to foo(). It seems that compilers today don't quite provide the fine grain control over which libcall optimizations pseudo-freestanding environments would prefer. Finally, Kees notes that this interface is unsafe, so we should not encourage its use. As such, I've removed the declaration from any header, but it still needs to be exported to avoid linkage errors in modules. Reported-by: Sami Tolvanen <samitolvanen@google.com> Suggested-by: Andy Lavr <andy.lavr@gmail.com> Suggested-by: Arvind Sankar <nivedita@alum.mit.edu> Suggested-by: Joe Perches <joe@perches.com> Suggested-by: Kees Cook <keescook@chromium.org> Suggested-by: Masahiro Yamada <masahiroy@kernel.org> Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200914161643.938408-1-ndesaulniers@google.com Link: https://bugs.llvm.org/show_bug.cgi?id=47162 Link: https://bugs.llvm.org/show_bug.cgi?id=47280 Link: https://github.com/ClangBuiltLinux/linux/issues/1126 Link: https://man7.org/linux/man-pages/man3/stpcpy.3.html Link: https://pubs.opengroup.org/onlinepubs/9699919799/functions/stpcpy.html Link: https://reviews.llvm.org/D85963 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-26	mm/migrate: correct thp migration stats	Zi Yan
	PageTransHuge returns true for both thp and hugetlb, so thp stats was counting both thp and hugetlb migrations. Exclude hugetlb migration by setting is_thp variable right. Clean up thp handling code too when we are there. Fixes: 1a5bae25e3cf ("mm/vmstat: add events for THP migration without split") Signed-off-by: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lkml.kernel.org/r/20200917210413.1462975-1-zi.yan@sent.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-26	mm/gup: fix gup_fast with dynamic page table folding	Vasily Gorbik
	Currently to make sure that every page table entry is read just once gup_fast walks perform READ_ONCE and pass pXd value down to the next gup_pXd_range function by value e.g.: static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end, unsigned int flags, struct page *pages, int nr) ... pudp = pud_offset(&p4d, addr); This function passes a reference on that local value copy to pXd_offset, and might get the very same pointer in return. This happens when the level is folded (on most arches), and that pointer should not be iterated. On s390 due to the fact that each task might have different 5,4 or 3-level address translation and hence different levels folded the logic is more complex and non-iteratable pointer to a local copy leads to severe problems. Here is an example of what happens with gup_fast on s390, for a task with 3-level paging, crossing a 2 GB pud boundary: // addr = 0x1007ffff000, end = 0x10080001000 static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end, unsigned int flags, struct page *pages, int nr) { unsigned long next; pud_t pudp; // pud_offset returns &p4d itself (a pointer to a value on stack) pudp = pud_offset(&p4d, addr); do { // on second iteratation reading "random" stack value pud_t pud = READ_ONCE(pudp); // next = 0x10080000000, due to PUD_SIZE/MASK != PGDIR_SIZE/MASK on s390 next = pud_addr_end(addr, end); ... } while (pudp++, addr = next, addr != end); // pudp++ iterating over stack return 1; } This happens since s390 moved to common gup code with commit d1874a0c2805 ("s390/mm: make the pxd_offset functions more robust") and commit 1a42010cdc26 ("s390/mm: convert to the generic get_user_pages_fast code"). s390 tried to mimic static level folding by changing pXd_offset primitives to always calculate top level page table offset in pgd_offset and just return the value passed when pXd_offset has to act as folded. What is crucial for gup_fast and what has been overlooked is that PxD_SIZE/MASK and thus pXd_addr_end should also change correspondingly. And the latter is not possible with dynamic folding. To fix the issue in addition to pXd values pass original pXdp pointers down to gup_pXd_range functions. And introduce pXd_offset_lockless helpers, which take an additional pXd entry value parameter. This has already been discussed in https://lkml.kernel.org/r/20190418100218.0a4afd51@mschwideX1 Fixes: 1a42010cdc26 ("s390/mm: convert to the generic get_user_pages_fast code") Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: <stable@vger.kernel.org> [5.2+] Link: https://lkml.kernel.org/r/patch.git-943f1e5dcff2.your-ad-here.call-01599856292-ext-8676@work.hours Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-26	mm: memcontrol: fix missing suffix of workingset_restore	Muchun Song
	We forget to add the suffix to the workingset_restore string, so fix it. And also update the documentation of cgroup-v2.rst. Fixes: 170b04b7ae49 ("mm/workingset: prepare the workingset detection infrastructure for anon LRU") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Zefan Li <lizefan@huawei.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Roman Gushchin <guro@fb.com> Cc: Randy Dunlap <rdunlap@infradead.org> Link: https://lkml.kernel.org/r/20200916100030.71698-1-songmuchun@bytedance.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-26	mm, THP, swap: fix allocating cluster for swapfile by mistake	Gao Xiang
	SWP_FS is used to make swap_{read,write}page() go through the filesystem, and it's only used for swap files over NFS. So, !SWP_FS means non NFS for now, it could be either file backed or device backed. Something similar goes with legacy SWP_FILE. So in order to achieve the goal of the original patch, SWP_BLKDEV should be used instead. FS corruption can be observed with SSD device + XFS + fragmented swapfile due to CONFIG_THP_SWAP=y. I reproduced the issue with the following details: Environment: QEMU + upstream kernel + buildroot + NVMe (2 GB) Kernel config: CONFIG_BLK_DEV_NVME=y CONFIG_THP_SWAP=y Some reproducible steps: mkfs.xfs -f /dev/nvme0n1 mkdir /tmp/mnt mount /dev/nvme0n1 /tmp/mnt bs="32k" sz="1024m" # doesn't matter too much, I also tried 16m xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw xfs_io -f -c "pwrite -F -S 0 -b $bs 0 $sz" -c "fdatasync" /tmp/mnt/sw xfs_io -f -c "pwrite -R -b $bs 0 $sz" -c "fsync" /tmp/mnt/sw mkswap /tmp/mnt/sw swapon /tmp/mnt/sw stress --vm 2 --vm-bytes 600M # doesn't matter too much as well Symptoms: - FS corruption (e.g. checksum failure) - memory corruption at: 0xd2808010 - segfault Fixes: f0eea189e8e9 ("mm, THP, swap: Don't allocate huge cluster for file backed swap device") Fixes: 38d8b4e6bdc8 ("mm, THP, swap: delay splitting THP during swap out") Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Acked-by: Rafael Aquini <aquini@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Eric Sandeen <esandeen@redhat.com> Cc: Dave Chinner <david@fromorbit.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200820045323.7809-1-hsiangkao@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-26	mm: slab: fix potential double free in ___cache_free	Shakeel Butt
	With the commit 10befea91b61 ("mm: memcg/slab: use a single set of kmem_caches for all allocations"), it becomes possible to call kfree() from the slabs_destroy(). The functions cache_flusharray() and do_drain() calls slabs_destroy() on array_cache of the local CPU without updating the size of the array_cache. This enables the kfree() call from the slabs_destroy() to recursively call cache_flusharray() which can potentially call free_block() on the same elements of the array_cache of the local CPU and causing double free and memory corruption. To fix the issue, simply update the local CPU array_cache cache before calling slabs_destroy(). Fixes: 10befea91b61 ("mm: memcg/slab: use a single set of kmem_caches for all allocations") Signed-off-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Roman Gushchin <guro@fb.com> Tested-by: Ming Lei <ming.lei@redhat.com> Reported-by: kernel test robot <rong.a.chen@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ted Ts'o <tytso@mit.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-26	Documentation/llvm: Fix clang target examples	Florian Fainelli
	clang --target=<triple> is how we can specify a particular toolchain triple to be use, fix the two occurences in the documentation. Fixes: fcf1b6a35c16 ("Documentation/llvm: add documentation on building w/ Clang/LLVM") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-09-25	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds
	Pull more kvm fixes from Paolo Bonzini: "Five small fixes. The nested migration bug will be fixed with a better API in 5.10 or 5.11, for now this is a fix that works with existing userspace but keeps the current ugly API" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: SVM: Add a dedicated INVD intercept routine KVM: x86: Reset MMU context if guest toggles CR4.SMAP or CR4.PKE KVM: x86: fix MSR_IA32_TSC read for nested migration selftests: kvm: Fix assert failure in single-step test KVM: x86: VMX: Make smaller physical guest address space support user-configurable
2020-09-25	Merge tag 'mips_fixes_5.9_3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fixes from Thomas Bogendoerfer: - fixed FP register access on Loongsoon-3 - added missing 1074 cpu handling - fixed Loongson2ef build error * tag 'mips_fixes_5.9_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: BCM47XX: Remove the needless check with the 1074K MIPS: Add the missing 'CPU_1074K' into __get_cpu_type() MIPS: Loongson2ef: Disable Loongson MMI instructions MIPS: Loongson-3: Fix fp register access if MSA enabled
2020-09-25	Merge tag 'spi-fix-v5.9-rc6' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A small collection of driver specific fixes, the fsl-espi and bcm-qspi changes in particular have been causing breakage for users" * tag 'spi-fix-v5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: bcm-qspi: Fix probe regression on iProc platforms spi: fsl-dspi: fix use-after-free in remove path spi: fsl-espi: Only process interrupts for expected events spi: bcm2835: Make polling_limit_us static spi: spi-fsl-dspi: use XSPI mode instead of DMA for DPAA2 SoCs
2020-09-25	Merge tag 'regulator-fix-v5.9-rc6' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fix from Mark Brown: "A single fix for incorrect specification of some of the register fields on axp20x devices which would break voltage setting on affected systems" * tag 'regulator-fix-v5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: axp20x: fix LDO2/4 description
2020-09-25	Merge tag 'regmap-fix-v5.9-rc6' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap fixes from Mark Brown: "Two issues here - one is a fix for use after free issues in the case where a regmap overrides its name using something dynamically generated, the other is that we weren't handling access checks non-incrementing I/O on registers within paged register regions correctly resulting in spurious errors. Both of these are quite rare but serious if they occur" * tag 'regmap-fix-v5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: fix page selection for noinc writes regmap: fix page selection for noinc reads regmap: debugfs: Add back in erroneously removed initialisation of ret regmap: debugfs: Fix handling of name string for debugfs init delays
2020-09-25	io_uring: ensure async buffered read-retry is setup properly	Jens Axboe
	A previous commit for fixing up short reads botched the async retry path, so we ended up going to worker threads more often than we should. Fix this up, so retries work the way they originally were intended to. Fixes: 227c0c9673d8 ("io_uring: internally retry short reads") Reported-by: Hao_Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-25	Merge tag 'nfsd-5.9-2' of git://git.linux-nfs.org/projects/cel/cel-2.6	Linus Torvalds
	Pull NFS server fix from Chuck Lever: "Fix incorrect calculation on platforms that implement flush_dcache_page()" * tag 'nfsd-5.9-2' of git://git.linux-nfs.org/projects/cel/cel-2.6: SUNRPC: Fix svc_flush_dcache()
2020-09-25	Merge tag 'pm-5.9-rc7' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix more fallout of recent RCU-lockdep changes in CPU idle code and two devfreq issues. Specifics: - Export rcu_idle_{enter,exit} to modules to fix build issues introduced by recent RCU-lockdep fixes (Borislav Petkov) - Add missing return statement to a stub function in the ACPI processor driver to fix a build issue introduced by recent RCU-lockdep fixes (Rafael Wysocki) - Fix recently introduced suspicious RCU usage warnings in the PSCI cpuidle driver and drop stale comments regarding RCU_NONIDLE() usage from enter_s2idle_proper() (Ulf Hansson) - Fix error code path in the tegra30 devfreq driver (Dan Carpenter) - Add missing information to devfreq_summary debugfs (Chanwoo Choi)" * tag 'pm-5.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: processor: Fix build for ARCH_APICTIMER_STOPS_ON_C3 unset PM / devfreq: tegra30: Disable clock on error in probe PM / devfreq: Add timer type to devfreq_summary debugfs cpuidle: Drop misleading comments about RCU usage cpuidle: psci: Fix suspicious RCU usage rcu/tree: Export rcu_idle_{enter,exit} to modules
2020-09-25	KVM: SVM: Add a dedicated INVD intercept routine	Tom Lendacky
	The INVD instruction intercept performs emulation. Emulation can't be done on an SEV guest because the guest memory is encrypted. Provide a dedicated intercept routine for the INVD intercept. And since the instruction is emulated as a NOP, just skip it instead. Fixes: 1654efcbc431 ("KVM: SVM: Add KVM_SEV_INIT command") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <a0b9a19ffa7fef86a3cc700c7ea01cb2731e04e5.1600972918.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-25	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma	Linus Torvalds
	Pull rdma fix from Jason Gunthorpe: "One fix for a bug that blktests hits when using rxe: tear down the CQ pool before waiting for all references to go away" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/core: Fix ordering of CQ pool destruction
2020-09-25	Merge tag 'drm-fixes-2020-09-25' of git://anongit.freedesktop.org/drm/drm	Linus Torvalds
	Pull drm fixes from Dave Airlie: "Fairly quiet, a couple of i915 fixes, one dma-buf fix, one vc4 and two sun4i changes dma-buf: - Single null pointer deref fix i915: - Fix selftest reference to stack data out of scope - Fix GVT null pointer dereference vc4: - fill asoc card owner sun4i: - program secondary CSC correctly" * tag 'drm-fixes-2020-09-25' of git://anongit.freedesktop.org/drm/drm: drm/i915/selftests: Push the fake iommu device from the stack to data dmabuf: fix NULL pointer dereference in dma_buf_release() drm/i915/gvt: Fix port number for BDW on EDID region setup drm/sun4i: mixer: Extend regmap max_register drm/sun4i: sun8i-csc: Secondary CSC register correction drm/vc4/vc4_hdmi: fill ASoC card owner
2020-09-25	Merge branch 'pm-cpuidle'	Rafael J. Wysocki
	* pm-cpuidle: ACPI: processor: Fix build for ARCH_APICTIMER_STOPS_ON_C3 unset cpuidle: Drop misleading comments about RCU usage cpuidle: psci: Fix suspicious RCU usage rcu/tree: Export rcu_idle_{enter,exit} to modules
2020-09-25	io_uring: don't unconditionally set plug->nowait = true	Jens Axboe
	This causes all the bios to be submitted with REQ_NOWAIT, which can be problematic on either btrfs or on file systems that otherwise use a mix of block devices where only some of them support it. For now, just remove the setting of plug->nowait = true. Reported-by: Dan Melnic <dmm@fb.com> Reported-by: Brian Foster <bfoster@redhat.com> Fixes: b63534c41e20 ("io_uring: re-issue block requests that failed because of resources") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-25	usbcore/driver: Accommodate usbip	M. Vefa Bicakci
	Commit 88b7381a939d ("USB: Select better matching USB drivers when available") inadvertently broke usbip functionality. The commit in question allows USB device drivers to be explicitly matched with USB devices via the use of driver-provided identifier tables and match functions, which is useful for a specialised device driver to be chosen for a device that can also be handled by another, more generic, device driver. Prior, the USB device section of usb_device_match() had an unconditional "return 1" statement, which allowed user-space to bind USB devices to the usbip_host device driver, if desired. However, the aforementioned commit changed the default/fallback return value to zero. This breaks device drivers such as usbip_host, so this commit restores the legacy behaviour, but only if a device driver does not have an id_table and a match() function. In addition, if usb_device_match is called for a device driver and device pair where the device does not match the id_table of the device driver in question, then the device driver will be disqualified for the device. This allows avoiding the default case of "return 1", which prevents undesirable probe() calls to a driver even though its id_table did not match the device. Finally, this commit changes the specialised-driver-to-generic-driver transition code so that when a device driver returns -ENODEV, a more generic device driver is only considered if the current device driver does not have an id_table and a match() function. This ensures that "generic" drivers such as usbip_host will not be considered specialised device drivers and will not cause the device to be locked in to the generic device driver, when a more specialised device driver could be tried. All of these changes restore usbip functionality without regressions, ensure that the specialised/generic device driver selection logic works as expected with the usb and apple-mfi-fastcharge drivers, and do not negatively affect the use of devices provided by dummy_hcd. Fixes: 88b7381a939d ("USB: Select better matching USB drivers when available") Cc: <stable@vger.kernel.org> # 5.8 Cc: Bastien Nocera <hadess@hadess.net> Cc: Valentina Manea <valentina.manea.m@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: <syzkaller@googlegroups.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com> Link: https://lore.kernel.org/r/20200922110703.720960-5-m.v.b@runbox.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-25	usbcore/driver: Fix incorrect downcast	M. Vefa Bicakci
	This commit resolves a minor bug in the selection/discovery of more specific USB device drivers for devices that are currently bound to generic USB device drivers. The bug is related to the way a candidate USB device driver is compared against the generic USB device driver. The code in is_dev_usb_generic_driver() assumes that the device driver in question is a USB device driver by calling to_usb_device_driver(dev->driver) to downcast; however I have observed that this assumption is not always true, through code instrumentation. This commit avoids the incorrect downcast altogether by comparing the USB device's driver (i.e., dev->driver) to the generic USB device driver directly. This method was suggested by Alan Stern. This bug was found while investigating Andrey Konovalov's report indicating usbip device driver misbehaviour with the recently merged generic USB device driver selection feature. The report is linked below. Fixes: d5643d2249b2 ("USB: Fix device driver race") Cc: <stable@vger.kernel.org> # 5.8 Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Bastien Nocera <hadess@hadess.net> Cc: Shuah Khan <shuah@kernel.org> Cc: Valentina Manea <valentina.manea.m@gmail.com> Cc: <syzkaller@googlegroups.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com> Link: https://lore.kernel.org/r/20200922110703.720960-4-m.v.b@runbox.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-25	usbcore/driver: Fix specific driver selection	M. Vefa Bicakci
	This commit resolves a bug in the selection/discovery of more specific USB device drivers for devices that are currently bound to generic USB device drivers. The bug is in the logic that determines whether a device currently bound to a generic USB device driver should be re-probed by a more specific USB device driver or not. The code in __usb_bus_reprobe_drivers() used to have the following lines: if (usb_device_match_id(udev, new_udriver->id_table) == NULL && (!new_udriver->match \|\| new_udriver->match(udev) != 0)) return 0; ret = device_reprobe(dev); As the reader will notice, the code checks whether the USB device in consideration matches the identifier table (id_table) of a specific USB device_driver (new_udriver), followed by a similar check, but this time with the USB device driver's match function. However, the match function's return value is not checked correctly. When match() returns zero, it means that the specific USB device driver is not applicable to the USB device in question, but the code then goes on to reprobe the device with the new USB device driver under consideration. All this to say, the logic is inverted. This bug was found by code inspection and instrumentation while investigating the root cause of the issue reported by Andrey Konovalov, where usbip took over syzkaller's virtual USB devices in an undesired manner. The report is linked below. Fixes: d5643d2249b2 ("USB: Fix device driver race") Cc: <stable@vger.kernel.org> # 5.8 Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Bastien Nocera <hadess@hadess.net> Cc: Shuah Khan <shuah@kernel.org> Cc: Valentina Manea <valentina.manea.m@gmail.com> Cc: <syzkaller@googlegroups.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com> Link: https://lore.kernel.org/r/20200922110703.720960-3-m.v.b@runbox.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-25	Revert "usbip: Implement a match function to fix usbip"	M. Vefa Bicakci
	This commit reverts commit 7a2f2974f265 ("usbip: Implement a match function to fix usbip"). In summary, commit d5643d2249b2 ("USB: Fix device driver race") inadvertently broke usbip functionality, which I resolved in an incorrect manner by introducing a match function to usbip, usbip_match(), that unconditionally returns true. However, the usbip_match function, as is, causes usbip to take over virtual devices used by syzkaller for USB fuzzing, which is a regression reported by Andrey Konovalov. Furthermore, in conjunction with the fix of another bug, handled by another patch titled "usbcore/driver: Fix specific driver selection" in this patch set, the usbip_match function causes unexpected USB subsystem behaviour when the usbip_host driver is loaded. The unexpected behaviour can be qualified as follows: - If commit 41160802ab8e ("USB: Simplify USB ID table match") is included in the kernel, then all USB devices are bound to the usbip_host driver, which appears to the user as if all USB devices were disconnected. - If the same commit (41160802ab8e) is not in the kernel (as is the case with v5.8.10) then all USB devices are re-probed and re-bound to their original device drivers, which appears to the user as a disconnection and re-connection of USB devices. Please note that this commit will make usbip non-operational again, until yet another patch in this patch set is merged, titled "usbcore/driver: Accommodate usbip". Cc: <stable@vger.kernel.org> # 5.8: 41160802ab8e: USB: Simplify USB ID table match Cc: <stable@vger.kernel.org> # 5.8 Cc: Bastien Nocera <hadess@hadess.net> Cc: Valentina Manea <valentina.manea.m@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: <syzkaller@googlegroups.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com> Link: https://lore.kernel.org/r/20200922110703.720960-2-m.v.b@runbox.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-25	btrfs: move btrfs_scratch_superblocks into btrfs_dev_replace_finishing	Josef Bacik
	We need to move the closing of the src_device out of all the device replace locking, but we definitely want to zero out the superblock before we commit the last time to make sure the device is properly removed. Handle this by pushing btrfs_scratch_superblocks into btrfs_dev_replace_finishing, and then later on we'll move the src_device closing and freeing stuff where we need it to be. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-09-25	Merge tag 'devfreq-fixes-for-5.9-rc7' of ↵	Rafael J. Wysocki
	git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux Pull devfreq updates for 5.9-rc7 from Chanwoo Choi: "1. Update devfreq core - Add missing timer type to devfreq_summary debugfs node. 2. Fix devfreq device driver - Fix the exception handling about clock on tegra30-devfreq.c" * tag 'devfreq-fixes-for-5.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux: PM / devfreq: tegra30: Disable clock on error in probe PM / devfreq: Add timer type to devfreq_summary debugfs
2020-09-25	block: remove unused BLK_QC_T_EAGAIN flag	Jeffle Xu
	commit 7b6620d7db56 ("block: remove REQ_NOWAIT_INLINE") removed the REQ_NOWAIT_INLINE related code, but the diff wasn't applied to blk_types.h somehow. Then commit 2771cefeac49 ("block: remove the REQ_NOWAIT_INLINE flag") removed the REQ_NOWAIT_INLINE flag while the BLK_QC_T_EAGAIN flag still remains. Fixes: 7b6620d7db56 ("block: remove REQ_NOWAIT_INLINE") Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-25	io_uring: ensure open/openat2 name is cleaned on cancelation	Jens Axboe
	If we cancel these requests, we'll leak the memory associated with the filename. Add them to the table of ops that need cleaning, if REQ_F_NEED_CLEANUP is set. Cc: stable@vger.kernel.org Fixes: e62753e4e292 ("io_uring: call statx directly") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-25	KVM: x86: Reset MMU context if guest toggles CR4.SMAP or CR4.PKE	Sean Christopherson
	Reset the MMU context during kvm_set_cr4() if SMAP or PKE is toggled. Recent commits to (correctly) not reload PDPTRs when SMAP/PKE are toggled inadvertantly skipped the MMU context reset due to the mask of bits that triggers PDPTR loads also being used to trigger MMU context resets. Fixes: 427890aff855 ("kvm: x86: Toggling CR4.SMAP does not load PDPTEs in PAE mode") Fixes: cb957adb4ea4 ("kvm: x86: Toggling CR4.PKE does not load PDPTEs in PAE mode") Cc: Jim Mattson <jmattson@google.com> Cc: Peter Shier <pshier@google.com> Cc: Oliver Upton <oupton@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923215352.17756-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-25	RDMA/mlx5: Fix type warning of sizeof in __mlx5_ib_alloc_counters()	Liu Shixin
	sizeof() when applied to a pointer typed expression should give the size of the pointed data, even if the data is a pointer. Fixes: e1f24a79f424 ("IB/mlx5: Support congestion related counters") Link: https://lore.kernel.org/r/20200917081354.2083293-1-liushixin2@huawei.com Signed-off-by: Liu Shixin <liushixin2@huawei.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-25	Merge tag 'drm-misc-fixes-2020-09-24' of ↵	Dave Airlie
	git://anongit.freedesktop.org/drm/drm-misc into drm-fixes drm-misc-fixes for v5.9: - Single null pointer deref fix for dma-buf. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/4106c21e-f52c-4c05-6cdb-daa743bb8617@linux.intel.com
2020-09-25	Merge tag 'drm-intel-fixes-2020-09-24' of ↵	Dave Airlie
	git://anongit.freedesktop.org/drm/drm-intel into drm-fixes drm/i915 fixes for v5.9-rc7: - Fix selftest reference to stack data out of scope - Fix GVT null pointer dereference - Backmerge from Linus' master to fix build Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/87zh5fpmha.fsf@intel.com
2020-09-25	BackMerge commit '98477740630f270aecf648f1d6a9dbc6027d4ff1' into drm-fixes	Dave Airlie
	The dax mess had some fallout, and i915 used a later base to fix their CI. Signed-off-by: Dave Airlie <airlied@redhat.com>
2020-09-24	ep_create_wakeup_source(): dentry name can change under you...	Al Viro
	or get freed, for that matter, if it's a long (separately stored) name. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-09-24	Merge tag 'nvme-5.9-2020-09-24' of git://git.infradead.org/nvme into block-5.9	Jens Axboe
	Pull NVMe fixes from Christoph: "nvme fixes for 5.9 - fix error during controller probe that cause double free irqs (Keith Busch) - FC connection establishment fix (James Smart) - properly handle completions for invalid tags (Xianting Tian) - pass the correct nsid to the command effects and supported log (Chaitanya Kulkarni)" * tag 'nvme-5.9-2020-09-24' of git://git.infradead.org/nvme: nvme-core: don't use NVME_NSID_ALL for command effects and supported log nvme-fc: fail new connections to a deleted host or remote port nvme-pci: fix NULL req in completion handler nvme: return errors for hwmon init
2020-09-24	RDMA/hns: Support inline data in extented sge space for RC	Weihang Li
	HIP08 supports RC inline up to size of 32 Bytes, and all data should be put into SQWQE. For HIP09, this capability is extended to 1024 Bytes, if length of data is longer than 32 Bytes, they will be filled into extended sge space. Link: https://lore.kernel.org/r/1599744069-9968-1-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-24	RDMA/hns: Fix missing sq_sig_type when querying QP	Weihang Li
	The sq_sig_type field should be filled when querying QP, or the users may get a wrong value. Fixes: 926a01dc000d ("RDMA/hns: Add QP operations support for hip08 SoC") Link: https://lore.kernel.org/r/1600509802-44382-9-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-24	RDMA/hns: Fix configuration of ack_req_freq in QPC	Weihang Li
	The hardware will add AckReq flag in BTH header according to the value of ack_req_freq to request ACK from responder for the packets with this flag. It should be greater than or equal to lp_pktn_ini instead of using a fixed value. Fixes: 7b9bd73ed13d ("RDMA/hns: Fix wrong assignment of lp_pktn_ini in QPC") Link: https://lore.kernel.org/r/1600509802-44382-8-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-24	RDMA/hns: Fix the wrong value of rnr_retry when querying qp	Wenpeng Liang
	The rnr_retry returned to the user is not correct, it should be got from another fields in QPC. Fixes: bfe860351e31 ("RDMA/hns: Fix cast from or to restricted __le32 for driver") Link: https://lore.kernel.org/r/1600509802-44382-7-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-24	RDMA/hns: Solve the overflow of the calc_pg_sz()	Jiaran Zhang
	calc_pg_sz() may gets a data calculation overflow if the PAGE_SIZE is 64 KB and hop_num is 2. It is because that all variables involved in calculation are defined in type of int. So change the type of bt_chunk_size, buf_chunk_size and obj_per_chunk_default to u64. Fixes: ba6bb7e97421 ("RDMA/hns: Add interfaces to get pf capabilities from firmware") Link: https://lore.kernel.org/r/1600509802-44382-6-git-send-email-liweihang@huawei.com Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-24	RDMA/hns: Add check for the validity of sl configuration	Jiaran Zhang
	According to the RoCE v1 specification, the sl (service level) 0-7 are mapped directly to priorities 0-7 respectively, sl 8-15 are reserved. The driver should verify whether the the value of sl is larger than 7, if so, an exception should be returned. Fixes: 926a01dc000d ("RDMA/hns: Add QP operations support for hip08 SoC") Link: https://lore.kernel.org/r/1600509802-44382-5-git-send-email-liweihang@huawei.com Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-24	RDMA/hns: Correct typo of hns_roce_create_cq()	Lang Cheng
	Change "initialze" to "initialize". Fixes: 8f3e9f3ea087 ("IB/hns: Add code for refreshing CQ CI using TPTR") Link: https://lore.kernel.org/r/1600509802-44382-4-git-send-email-liweihang@huawei.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-24	RDMA/hns: Add interception for resizing SRQs	Yangyang Li
	HIP08 doesn't support modifying the maximum number of outstanding WR in an SRQ. However, the driver does not return a failure message, and users may mistakenly think that the resizing is executed successfully. So the driver needs to intercept this operation. Fixes: ffb1308b88b6 ("RDMA/hns: Move SRQ code to the reasonable place") Link: https://lore.kernel.org/r/1600509802-44382-3-git-send-email-liweihang@huawei.com Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-24	RDMA/hns: Refactor process about opcode in post_send()	Weihang Li
	According to the IB specifications, the verbs should return an immediate error when the users set an unsupported opcode. Furthermore, refactor codes about opcode in process of post_send to make the difference between opcodes clearer. Link: https://lore.kernel.org/r/1600509802-44382-2-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-24	RDMA/hns: Add support for SCCC in size of 64 Bytes	Yangyang Li
	For HIP09, size of SCCC (Soft Congestion Control Context) is increased to 64 Bytes from 32 Bytes. The hardware will get the configuration of SCCC from driver instead of using a fixed value. Link: https://lore.kernel.org/r/1600245806-56321-5-git-send-email-liweihang@huawei.com Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-24	RDMA/hns: Add support for QPC in size of 512 Bytes	Wenpeng Liang
	The new version of RoCEE supports using QPC in size of 256B or 512B, so that HIP09 can supports new congestion control algorithms by using QPC in larger size. Link: https://lore.kernel.org/r/1600245806-56321-4-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-24	RDMA/hns: Add support for CQE in size of 64 Bytes	Wenpeng Liang
	The new version of RoCEE supports using CQE in size of 32B or 64B. The performance of bus can be improved by using larger size of CQE. Link: https://lore.kernel.org/r/1600245806-56321-3-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-24	RDMA/hns: Add support for EQE in size of 64 Bytes	Wenpeng Liang
	The new version of RoCEE supports using CEQE in size of 4B or 64B, AEQE in size of 16B or 64B. The performance of bus can be improved by using larger size of EQE. Link: https://lore.kernel.org/r/1600245806-56321-2-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-24	KVM: x86: fix MSR_IA32_TSC read for nested migration	Maxim Levitsky
	MSR reads/writes should always access the L1 state, since the (nested) hypervisor should intercept all the msrs it wants to adjust, and these that it doesn't should be read by the guest as if the host had read it. However IA32_TSC is an exception. Even when not intercepted, guest still reads the value + TSC offset. The write however does not take any TSC offset into account. This is documented in Intel's SDM and seems also to happen on AMD as well. This creates a problem when userspace wants to read the IA32_TSC value and then write it. (e.g for migration) In this case it reads L2 value but write is interpreted as an L1 value. To fix this make the userspace initiated reads of IA32_TSC return L1 value as well. Huge thanks to Dave Gilbert for helping me understand this very confusing semantic of MSR writes. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200921103805.9102-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>