linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2021-02-22	cxl/mem: Fix potential memory leak	Ben Widawsky
	When submitting a command for userspace, input and output payload bounce buffers are allocated. For a given command, both input and output buffers may exist and so when allocation of the input buffer fails, the output buffer must be freed too. As far as I can tell, userspace can't easily exploit the leak to OOM a machine unless the machine was already near OOM state. Fixes: 583fa5e71cae ("cxl/mem: Add basic IOCTL interface") Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: https://lore.kernel.org/r/20210221035846.680145-1-ben.widawsky@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-02-22	Merge tag 'powerpc-5.12-1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - A large series adding wrappers for our interrupt handlers, so that irq/nmi/user tracking can be isolated in the wrappers rather than spread in each handler. - Conversion of the 32-bit syscall handling into C. - A series from Nick to streamline our TLB flushing when using the Radix MMU. - Switch to using queued spinlocks by default for 64-bit server CPUs. - A rework of our PCI probing so that it happens later in boot, when more generic infrastructure is available. - Two small fixes to allow 32-bit little-endian processes to run on 64-bit kernels. - Other smaller features, fixes & cleanups. Thanks to: Alexey Kardashevskiy, Ananth N Mavinakayanahalli, Aneesh Kumar K.V, Athira Rajeev, Bhaskar Chowdhury, Cédric Le Goater, Chengyang Fan, Christophe Leroy, Christopher M. Riedl, Fabiano Rosas, Florian Fainelli, Frederic Barrat, Ganesh Goudar, Hari Bathini, Jiapeng Chong, Joseph J Allen, Kajol Jain, Markus Elfring, Michal Suchanek, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Pingfan Liu, Po-Hsu Lin, Qian Cai, Ram Pai, Randy Dunlap, Sandipan Das, Stephen Rothwell, Tyrel Datwyler, Will Springer, Yury Norov, and Zheng Yongjun. * tag 'powerpc-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (188 commits) powerpc/perf: Adds support for programming of Thresholding in P10 powerpc/pci: Remove unimplemented prototypes powerpc/uaccess: Merge raw_copy_to_user_allowed() into raw_copy_to_user() powerpc/uaccess: Merge __put_user_size_allowed() into __put_user_size() powerpc/uaccess: get rid of small constant size cases in raw_copy_{to,from}_user() powerpc/64: Fix stack trace not displaying final frame powerpc/time: Remove get_tbl() powerpc/time: Avoid using get_tbl() spi: mpc52xx: Avoid using get_tbl() powerpc/syscall: Avoid storing 'current' in another pointer powerpc/32: Handle bookE debugging in C in syscall entry/exit powerpc/syscall: Do not check unsupported scv vector on PPC32 powerpc/32: Remove the counter in global_dbcr0 powerpc/32: Remove verification of MSR_PR on syscall in the ASM entry powerpc/syscall: implement system call entry/exit logic in C for PPC32 powerpc/32: Always save non volatile GPRs at syscall entry powerpc/syscall: Change condition to check MSR_RI powerpc/syscall: Save r3 in regs->orig_r3 powerpc/syscall: Use is_compat_task() powerpc/syscall: Make interrupt.c buildable on PPC32 ...
2021-02-22	Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm	Linus Torvalds
	Pull ARM updates from Russell King: - Generalise byte swapping assembly - Update debug addresses for STI - Validate start of physical memory with DTB - Do not clear SCTLR.nTLSMD in decompressor - amba/locomo/sa1111 devices remove method return type is void - address markers for KASAN in page table dump * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 9065/1: OABI compat: fix build when EPOLL is not enabled ARM: 9055/1: mailbox: arm_mhuv2: make remove callback return void amba: Make use of bus_type functions amba: Make the remove callback return void vfio: platform: simplify device removal amba: reorder functions amba: Fix resource leak for drivers without .remove ARM: 9054/1: arch/arm/mm/mmu.c: Remove duplicate header ARM: 9053/1: arm/mm/ptdump:Add address markers for KASAN regions ARM: 9051/1: vdso: remove unneded extra-y addition ARM: 9050/1: Kconfig: Select ARCH_HAVE_NMI_SAFE_CMPXCHG where possible ARM: 9049/1: locomo: make locomo bus's remove callback return void ARM: 9048/1: sa1111: make sa1111 bus's remove callback return void ARM: 9047/1: smp: remove unused variable ARM: 9046/1: decompressor: Do not clear SCTLR.nTLSMD for ARMv7+ cores ARM: 9045/1: uncompress: Validate start of physical memory against passed DTB ARM: 9042/1: debug: no uncompress debugging while semihosting ARM: 9041/1: sti LL_UART: add STiH418 SBC UART0 support ARM: 9040/1: use DEBUG_UART_PHYS and DEBUG_UART_VIRT for sti LL_UART ARM: 9039/1: assembler: generalize byte swapping macro into rev_l
2021-02-22	Merge tag 'timers-urgent-2021-02-22' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "A small set of clockevent fixes which fell through the cracks before the 5.11 release: - Ensure a clock is enabled on sh_cmt - Trivial compile fail and compile warning fixes" * tag 'timers-urgent-2021-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource/drivers/mxs_timer: Add missing semicolon when DEBUG is defined clocksource/drivers/sh_cmt: Make sure channel clock supply is enabled clocksource/drivers/ixp4xx: Select TIMER_OF when needed
2021-02-22	Merge tag 'trace-v5.12' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: - Update to the way irqs and preemption is tracked via the trace event PC field - Fix handling of unregistering event failing due to allocate memory. This is only triggered by failure injection, as it is pretty much guaranteed to have less than a page allocation succeed. - Do not show the useless "filter" or "enable" files for the "ftrace" trace system, as they have no effect on doing anything. - Add a warning if kprobes are registered more than once. - Synthetic events now have their fields parsed by semicolons. Old formats without semicolons will still work, but new features will require them. - New option to allow trace events to show %p without hashing in trace file. The trace file can only be read by root, and reading the raw event buffer did not have any pointers hashed, so this does not expose anything new. - New directory in tools called tools/tracing, where a new tool that reads sequential latency reports from the ftrace latency tracers. - Other minor fixes and cleanups. * tag 'trace-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits) kprobes: Fix to delay the kprobes jump optimization tracing/tools: Add the latency-collector to tools directory tracing: Make hash-ptr option default tracing: Add ptr-hash option to show the hashed pointer value tracing: Update the stage 3 of trace event macro comment tracing: Show real address for trace event arguments selftests/ftrace: Add '!event' synthetic event syntax check selftests/ftrace: Update synthetic event syntax errors tracing: Add a backward-compatibility check for synthetic event creation tracing: Update synth command errors tracing: Rework synthetic event command parsing tracing/dynevent: Delegate parsing to create function kprobes: Warn if the kprobe is reregistered ftrace: Remove unused ftrace_force_update() tracepoints: Code clean up tracepoints: Do not punish non static call users tracepoints: Remove unnecessary "data_args" macro parameter tracing: Do not create "enable" or "filter" files for ftrace event subsystem kernel: trace: preemptirq_delay_test: add cpu affinity tracepoint: Do not fail unregistering a probe due to memory failure ...
2021-02-22	docs: ABI: testing: ima_policy: Fixed missing bracket	Michael Weiß
	This fixes a minor typo. Fixes: 34433332841d ("docs: ABI: testing: make the files compatible with ReST output") Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de> Link: https://lore.kernel.org/r/20210215102031.10622-1-michael.weiss@aisec.fraunhofer.de Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-02-22	Merge tag 'perf-tools-for-v5.12-2020-02-19' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tool updates from Arnaldo Carvalho de Melo: "New features: - Support instruction latency in 'perf report', with both memory latency (weight) and instruction latency information, users can locate expensive load instructions and understand time spent in different stages. - Extend 'perf c2c' to display the number of loads which were blocked by data or address conflict. - Add 'perf stat' support for L2 topdown events in systems such as Intel's Sapphire rapids server. - Add support for PERF_SAMPLE_CODE_PAGE_SIZE in various tools, as a sort key, for instance: perf report --stdio --sort=comm,symbol,code_page_size - New 'perf daemon' command to run long running sessions while providing a way to control the enablement of events without restarting a traditional 'perf record' session. - Enable counting events for BPF programs in 'perf stat' just like for other targets (tid, cgroup, cpu, etc), e.g.: # perf stat -e ref-cycles,cycles -b 254 -I 1000 1.487903822 115,200 ref-cycles 1.487903822 86,012 cycles 2.489147029 80,560 ref-cycles 2.489147029 73,784 cycles ^C The example above counts 'cycles' and 'ref-cycles' of BPF program of id 254. It is similar to bpftool-prog-profile command, but more flexible. - Support the new layout for PERF_RECORD_MMAP2 to carry the DSO build-id using infrastructure generalised from the eBPF subsystem, removing the need for traversing the perf.data file to collect build-ids at the end of 'perf record' sessions and helping with long running sessions where binaries can get replaced in updates, leading to possible mis-resolution of symbols. - Support filtering by hex address in 'perf script'. - Support DSO filter in 'perf script', like in other perf tools. - Add namespaces support to 'perf inject' - Add support for SDT (Dtrace Style Markers) events on ARM64. perf record: - Fix handling of eventfd() when draining a buffer in 'perf record'. - Improvements to the generation of metadata events for pre-existing threads (mmaps, comm, etc), speeding up the work done at the start of system wide or per CPU 'perf record' sessions. Hardware tracing: - Initial support for tracing KVM with Intel PT. - Intel PT fixes for IPC - Support Intel PT PSB (synchronization packets) events. - Automatically group aux-output events to overcome --filter syntax. - Enable PERF_SAMPLE_DATA_SRC on ARMs SPE. - Update ARM's CoreSight hardware tracing OpenCSD library to v1.0.0. perf annotate TUI: - Fix handling of 'k' ("show line number") hotkey - Fix jump parsing for C++ code. perf probe: - Add protection to avoid endless loop. cgroups: - Avoid reading cgroup mountpoint multiple times, caching it. - Fix handling of cgroup v1/v2 in mixed hierarchy. Symbol resolving: - Add OCaml symbol demangling. - Further fixes for handling PE executables when using perf with Wine and .exe/.dll files. - Fix 'perf unwind' DSO handling. - Resolve symbols against debug file first, to deal with artifacts related to LTO. - Fix gap between kernel end and module start on powerpc. Reporting tools: - The DSO filter shouldn't show samples in unresolved maps. - Improve debuginfod support in various tools. build ids: - Fix 16-byte build ids in 'perf buildid-cache', add a 'perf test' entry for that case. perf test: - Support for PERF_SAMPLE_WEIGHT_STRUCT. - Add test case for PERF_SAMPLE_CODE_PAGE_SIZE. - Shell based tests for 'perf daemon's commands ('start', 'stop, 'reconfig', 'list', etc). - ARM cs-etm 'perf test' fixes. - Add parse-metric memory bandwidth testcase. Compiler related: - Fix 'perf probe' kretprobe issue caused by gcc 11 bug when used with -fpatchable-function-entry. - Fix ARM64 build with gcc 11's -Wformat-overflow. - Fix unaligned access in sample parsing test. - Fix printf conversion specifier for IP addresses on arm64, s390 and powerpc. Arch specific: - Support exposing Performance Monitor Counter SPRs as part of extended regs on powerpc. - Add JSON 'perf stat' metrics for ARM64's imx8mp, imx8mq and imx8mn DDR, fix imx8mm ones. - Fix common and uarch events for ARM64's A76 and Ampere eMag" * tag 'perf-tools-for-v5.12-2020-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (148 commits) perf buildid-cache: Don't skip 16-byte build-ids perf buildid-cache: Add test for 16-byte build-id perf symbol: Remove redundant libbfd checks perf test: Output the sub testing result in cs-etm perf test: Suppress logs in cs-etm testing perf tools: Fix arm64 build error with gcc-11 perf intel-pt: Add documentation for tracing virtual machines perf intel-pt: Split VM-Entry and VM-Exit branches perf intel-pt: Adjust sample flags for VM-Exit perf intel-pt: Allow for a guest kernel address filter perf intel-pt: Support decoding of guest kernel perf machine: Factor out machine__idle_thread() perf machine: Factor out machines__find_guest() perf intel-pt: Amend decoder to track the NR flag perf intel-pt: Retain the last PIP packet payload as is perf intel_pt: Add vmlaunch and vmresume as branches perf script: Add branch types for VM-Entry and VM-Exit perf auxtrace: Automatically group aux-output events perf test: Fix unaligned access in sample parsing test perf tools: Support arch specific PERF_SAMPLE_WEIGHT_STRUCT processing ...
2021-02-22	Fix unaesthetic indentation	Matthew Wilcox
	The current documentation build looks like this: $ make htmldocs SPHINX htmldocs --> file:///home/willy/kernel/linux-next/Documentation/output make[2]: Nothing to be done for 'html'. WARNING: The kernel documentation build process support for Sphinx v3.0 and above is brand new. Be prepared for possible issues in the generated output. $ That extra indentation before my next prompt isn't pretty. This patch fixes it, but I'm not a pythonista, and maybe there's a better way. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20210215161757.GD2858050@casper.infradead.org [jc: tweaked for the "better way"] Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-02-22	Merge tag 'nfsd-5.12-1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull more nfsd updates from Chuck Lever: "Here are a few additional NFSD commits for the merge window: Optimization: - Cork the socket while there are queued replies Fixes: - DRC shutdown ordering - svc_rdma_accept() lockdep splat" * tag 'nfsd-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Further clean up svc_tcp_sendmsg() SUNRPC: Remove redundant socket flags from svc_tcp_sendmsg() SUNRPC: Use TCP_CORK to optimise send performance on the server svcrdma: Hold private mutex while invoking rdma_accept() nfsd: register pernet ops last, unregister first
2021-02-22	Merge tag 'ceph-for-5.12-rc1' of git://github.com/ceph/ceph-client	Linus Torvalds
	Pull ceph updates from Ilya Dryomov: "With netfs helper library and fscache rework delayed, just a few cap handling improvements to avoid grabbing mmap_lock in some code paths and deal with capsnaps better and a mount option cleanup" * tag 'ceph-for-5.12-rc1' of git://github.com/ceph/ceph-client: ceph: defer flushing the capsnap if the Fb is used libceph: remove osdtimeout option entirely libceph: deprecate [no]cephx_require_signatures options ceph: allow queueing cap/snap handling after putting cap references ceph: clean up inode work queueing ceph: fix flush_snap logic after putting caps
2021-02-22	Merge tag 'fs_for_v5.12-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull isofs, udf, and quota updates from Jan Kara: "Several udf, isofs, and quota fixes" * tag 'fs_for_v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: parser: Fix kernel-doc markups udf: handle large user and group ID isofs: handle large user and group ID parser: add unsigned int parser udf: fix silent AED tagLocation corruption isofs: release buffer head before return quota: Fix memory leak when handling corrupted quota file
2021-02-22	Merge tag 'fsnotify_for_v5.12-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify update from Jan Kara: "Make inotify groups be charged against appropriate memcgs" * tag 'fsnotify_for_v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: inotify, memcg: account inotify instances to kmemcg
2021-02-22	scripts: kernel-doc: fix array element capture in pointer-to-func parsing	Aditya Srivastava
	Currently, kernel-doc causes an unexpected error when array element (i.e., "type (*foo[bar])(args)") is present as pointer parameter in pointer-to-function parsing. For e.g., running kernel-doc -none on kernel/gcov/gcc_4_7.c causes this error: "Use of uninitialized value $param in regexp compilation at ...", in combination with: "warning: Function parameter or member '' not described in 'gcov_info'" Here, the parameter parsing does not take into account the presence of array element (i.e. square brackets) in $param. Provide a simple fix by adding square brackets in the regex, responsible for capturing $param. A quick evaluation, by running 'kernel-doc -none' on entire kernel-tree, reveals that no additional warning or error has been added or removed by the fix. Suggested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Aditya Srivastava <yashsri421@gmail.com> Tested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Link: https://lore.kernel.org/r/20210217145625.14006-1-yashsri421@gmail.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-02-22	Merge tag 'lazytime_for_v5.12-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull lazytime updates from Jan Kara: "Cleanups of the lazytime handling in the writeback code making rules for calling ->dirty_inode() filesystem handlers saner" * tag 'lazytime_for_v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext4: simplify i_state checks in __ext4_update_other_inode_time() gfs2: don't worry about I_DIRTY_TIME in gfs2_fsync() fs: improve comments for writeback_single_inode() fs: drop redundant check from __writeback_single_inode() fs: clean up __mark_inode_dirty() a bit fs: pass only I_DIRTY_INODE flags to ->dirty_inode fs: don't call ->dirty_inode for lazytime timestamp updates fat: only specify I_DIRTY_TIME when needed in fat_update_time() fs: only specify I_DIRTY_TIME when needed in generic_update_time() fs: correctly document the inode dirty flags
2021-02-22	Merge tag 'exfat-for-5.12-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - improve file deletion performance with dirsync mount option - fix shift-out-of-bounds in exfat_fill_super() reported by syzkaller * tag 'exfat-for-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: improve performance of exfat_free_cluster when using dirsync mount option exfat: fix shift-out-of-bounds in exfat_fill_super()
2021-02-22	Merge tag 'zonefs-5.12-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs Pull zonefs updates from Damien Le Moal: "Two changes: - A fix that did not make it in time for 5.11, to correct the file size initialization of full sequential zone, from Shin'ichiro - Add file operation tracepoints to help with debugging, from Johannes" * tag 'zonefs-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: Fix file size of zones in full condition zonefs: add tracepoints for file operations
2021-02-22	Merge branch 'work.audit' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull RCU-safe common_lsm_audit() from Al Viro: "Make common_lsm_audit() non-blocking and usable from RCU pathwalk context. We don't really need to grab/drop dentry in there - rcu_read_lock() is enough. There's a couple of followups using that to simplify the logics in selinux, but those hadn't soaked in -next yet, so they'll have to go in next window" * 'work.audit' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: make dump_common_audit_data() safe to be called from RCU pathwalk new helper: d_find_alias_rcu()
2021-02-22	Merge branch 'work.d_name' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull d_name whack-a-mole from Al Viro: "A bunch of places that play with ->d_name in printks instead of using proper formats..." * 'work.d_name' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: orangefs_file_mmap(): use %pD cifs_debug: use %pd instead of messing with ->d_name erofs: use %pd instead of messing with ->d_name cramfs: use %pD instead of messing with file_dentry()->d_name
2021-02-22	Merge tag 'memblock-v5.12-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull memblock update from Mike Rapoport: "Remove return value of memblock_free_all() memblock_free_all() returns the total count of freed pages and its callers used this value to update totalram_pages. This update is now anyway a part of memblock_free_all() and its callers no longer check the return value, so make memblock_free_all() void" * tag 'memblock-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: mm: memblock: remove return value of memblock_free_all()
2021-02-22	doc: use KCFLAGS instead of EXTRA_CFLAGS to pass flags from command line	Masahiro Yamada
	You should use KCFLAGS to pass additional compiler flags from the command line. Using EXTRA_CFLAGS is wrong. EXTRA_CFLAGS is supposed to specify flags applied only to the current Makefile (and now deprecated in favor of ccflags-y). It is still used in arch/mips/kvm/Makefile (and possibly in external modules too). Passing EXTRA_CFLAGS from the command line overwrites it and breaks the build. I also fixed drivers/gpu/drm/tilcdc/Makefile because commit 816175dd1fd7 ("drivers/gpu/drm/tilcdc: Makefile, only -Werror when no -W* in EXTRA_CFLAGS") was based on the same misunderstanding. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Federico Vaga <federico.vaga@vaga.pv.it> Link: https://lore.kernel.org/r/20210221152524.197693-1-masahiroy@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-02-22	Documentation: proc.rst: add more about the 6 fields in loadavg	Randy Dunlap
	Address Jon's feedback on the previous patch by adding info about field separators in the /proc/loadavg file. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20210222034729.22350-1-rdunlap@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-02-22	nbd: handle device refs for DESTROY_ON_DISCONNECT properly	Josef Bacik
	There exists a race where we can be attempting to create a new nbd configuration while a previous configuration is going down, both configured with DESTROY_ON_DISCONNECT. Normally devices all have a reference of 1, as they won't be cleaned up until the module is torn down. However with DESTROY_ON_DISCONNECT we'll make sure that there is only 1 reference (generally) on the device for the config itself, and then once the config is dropped, the device is torn down. The race that exists looks like this TASK1 TASK2 nbd_genl_connect() idr_find() refcount_inc_not_zero(nbd) * count is 2 here ^^ nbd_config_put() nbd_put(nbd) (count is 1) setup new config check DESTROY_ON_DISCONNECT put_dev = true if (put_dev) nbd_put(nbd) * free'd here ^^ In nbd_genl_connect() we assume that the nbd ref count will be 2, however clearly that won't be true if the nbd device had been setup as DESTROY_ON_DISCONNECT with its prior configuration. Fix this by getting rid of the runtime flag to check if we need to mess with the nbd device refcount, and use the device NBD_DESTROY_ON_DISCONNECT flag to check if we need to adjust the ref counts. This was reported by syzkaller with the following kasan dump BUG: KASAN: use-after-free in instrument_atomic_read include/linux/instrumented.h:71 [inline] BUG: KASAN: use-after-free in atomic_read include/asm-generic/atomic-instrumented.h:27 [inline] BUG: KASAN: use-after-free in refcount_dec_not_one+0x71/0x1e0 lib/refcount.c:76 Read of size 4 at addr ffff888143bf71a0 by task systemd-udevd/8451 CPU: 0 PID: 8451 Comm: systemd-udevd Not tainted 5.11.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:230 __kasan_report mm/kasan/report.c:396 [inline] kasan_report.cold+0x79/0xd5 mm/kasan/report.c:413 check_memory_region_inline mm/kasan/generic.c:179 [inline] check_memory_region+0x13d/0x180 mm/kasan/generic.c:185 instrument_atomic_read include/linux/instrumented.h:71 [inline] atomic_read include/asm-generic/atomic-instrumented.h:27 [inline] refcount_dec_not_one+0x71/0x1e0 lib/refcount.c:76 refcount_dec_and_mutex_lock+0x19/0x140 lib/refcount.c:115 nbd_put drivers/block/nbd.c:248 [inline] nbd_release+0x116/0x190 drivers/block/nbd.c:1508 __blkdev_put+0x548/0x800 fs/block_dev.c:1579 blkdev_put+0x92/0x570 fs/block_dev.c:1632 blkdev_close+0x8c/0xb0 fs/block_dev.c:1640 __fput+0x283/0x920 fs/file_table.c:280 task_work_run+0xdd/0x190 kernel/task_work.c:140 tracehook_notify_resume include/linux/tracehook.h:189 [inline] exit_to_user_mode_loop kernel/entry/common.c:174 [inline] exit_to_user_mode_prepare+0x249/0x250 kernel/entry/common.c:201 __syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline] syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fc1e92b5270 Code: 73 01 c3 48 8b 0d 38 7d 20 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 59 c1 20 00 00 75 10 b8 03 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee fb ff ff 48 89 04 24 RSP: 002b:00007ffe8beb2d18 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 0000000000000007 RCX: 00007fc1e92b5270 RDX: 000000000aba9500 RSI: 0000000000000000 RDI: 0000000000000007 RBP: 00007fc1ea16f710 R08: 000000000000004a R09: 0000000000000008 R10: 0000562f8cb0b2a8 R11: 0000000000000246 R12: 0000000000000000 R13: 0000562f8cb0afd0 R14: 0000000000000003 R15: 000000000000000e Allocated by task 1: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38 kasan_set_track mm/kasan/common.c:46 [inline] set_alloc_info mm/kasan/common.c:401 [inline] ____kasan_kmalloc.constprop.0+0x82/0xa0 mm/kasan/common.c:429 kmalloc include/linux/slab.h:552 [inline] kzalloc include/linux/slab.h:682 [inline] nbd_dev_add+0x44/0x8e0 drivers/block/nbd.c:1673 nbd_init+0x250/0x271 drivers/block/nbd.c:2394 do_one_initcall+0x103/0x650 init/main.c:1223 do_initcall_level init/main.c:1296 [inline] do_initcalls init/main.c:1312 [inline] do_basic_setup init/main.c:1332 [inline] kernel_init_freeable+0x605/0x689 init/main.c:1533 kernel_init+0xd/0x1b8 init/main.c:1421 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296 Freed by task 8451: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38 kasan_set_track+0x1c/0x30 mm/kasan/common.c:46 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:356 ____kasan_slab_free+0xe1/0x110 mm/kasan/common.c:362 kasan_slab_free include/linux/kasan.h:192 [inline] slab_free_hook mm/slub.c:1547 [inline] slab_free_freelist_hook+0x5d/0x150 mm/slub.c:1580 slab_free mm/slub.c:3143 [inline] kfree+0xdb/0x3b0 mm/slub.c:4139 nbd_dev_remove drivers/block/nbd.c:243 [inline] nbd_put.part.0+0x180/0x1d0 drivers/block/nbd.c:251 nbd_put drivers/block/nbd.c:295 [inline] nbd_config_put+0x6dd/0x8c0 drivers/block/nbd.c:1242 nbd_release+0x103/0x190 drivers/block/nbd.c:1507 __blkdev_put+0x548/0x800 fs/block_dev.c:1579 blkdev_put+0x92/0x570 fs/block_dev.c:1632 blkdev_close+0x8c/0xb0 fs/block_dev.c:1640 __fput+0x283/0x920 fs/file_table.c:280 task_work_run+0xdd/0x190 kernel/task_work.c:140 tracehook_notify_resume include/linux/tracehook.h:189 [inline] exit_to_user_mode_loop kernel/entry/common.c:174 [inline] exit_to_user_mode_prepare+0x249/0x250 kernel/entry/common.c:201 __syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline] syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The buggy address belongs to the object at ffff888143bf7000 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 416 bytes inside of 1024-byte region [ffff888143bf7000, ffff888143bf7400) The buggy address belongs to the page: page:000000005238f4ce refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x143bf0 head:000000005238f4ce order:3 compound_mapcount:0 compound_pincount:0 flags: 0x57ff00000010200(slab\|head) raw: 057ff00000010200 ffffea00004b1400 0000000300000003 ffff888010c41140 raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888143bf7080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888143bf7100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff888143bf7180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888143bf7200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Reported-and-tested-by: syzbot+429d3f82d757c211bff3@syzkaller.appspotmail.com Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-22	gfs2: Per-revoke accounting in transactions	Andreas Gruenbacher
	In the log, revokes are stored as a revoke descriptor (struct gfs2_log_descriptor), followed by zero or more additional revoke blocks (struct gfs2_meta_header). On filesystems with a blocksize of 4k, the revoke descriptor contains up to 503 revokes, and the metadata blocks contain up to 509 revokes each. We've so far been reserving space for revokes in transactions in block granularity, so a lot more space than necessary was being allocated and then released again. This patch switches to assigning revokes to transactions individually instead. Initially, space for the revoke descriptor is reserved and handed out to transactions. When more revokes than that are reserved, additional revoke blocks are added. When the log is flushed, the space for the additional revoke blocks is released, but we keep the space for the revoke descriptor block allocated. Transactions may still reserve more revokes than they will actually need in the end, but now we won't overshoot the target as much, and by only returning the space for excess revokes at log flush time, we further reduce the amount of contention between processes. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-02-22	gfs2: Rework the log space allocation logic	Andreas Gruenbacher
	The current log space allocation logic is hard to understand or extend. The principle it that when the log is flushed, we may or may not have a transaction active that has space allocated in the log. To deal with that, we set aside a magical number of blocks to be used in case we don't have an active transaction. It isn't clear that the pool will always be big enough. In addition, we can't return unused log space at the end of a transaction, so the number of blocks allocated must exactly match the number of blocks used. Simplify this as follows: * When transactions are allocated or merged, always reserve enough blocks to flush the transaction (err on the safe side). * In gfs2_log_flush, return any allocated blocks that haven't been used. * Maintain a pool of spare blocks big enough to do one log flush, as before. * In gfs2_log_flush, when we have no active transaction, allocate a suitable number of blocks. For that, use the spare pool when called from logd, and leave the pool alone otherwise. This means that when the log is almost full, logd will still be able to do one more log flush, which will result in more log space becoming available. This will make the log space allocator code easier to work with in the future. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-02-22	gfs2: Minor calc_reserved cleanup	Andreas Gruenbacher
	No functional change. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-02-22	swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single	Christoph Hellwig
	swiotlb_tbl_map_single currently nevers sets a tlb_addr that is not aligned to the tlb bucket size. But we're going to add such a case soon, for which this adjustment would be bogus. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jianxiong Gao <jxgao@google.com> Tested-by: Jianxiong Gao <jxgao@google.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2021-02-22	swiotlb: refactor swiotlb_tbl_map_single	Christoph Hellwig
	Split out a bunch of a self-contained helpers to make the function easier to follow. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jianxiong Gao <jxgao@google.com> Tested-by: Jianxiong Gao <jxgao@google.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2021-02-22	kyber: introduce kyber_depth_updated()	Yang Yang
	Hang occurs when user changes the scheduler queue depth, by writing to the 'nr_requests' sysfs file of that device. The details of the environment that we found the problem are as follows: an eMMC block device total driver tags: 16 default queue_depth: 32 kqd->async_depth initialized in kyber_init_sched() with queue_depth=32 Then we change queue_depth to 256, by writing to the 'nr_requests' sysfs file. But kqd->async_depth don't be updated after queue_depth changes. Now the value of async depth is too small for queue_depth=256, this may cause hang. This patch introduces kyber_depth_updated(), so that kyber can update async depth when queue depth changes. Signed-off-by: Yang Yang <yang.yang@vivo.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-22	mailbox: arm_mhuv2: Skip calling kfree() with invalid pointer	Viresh Kumar
	It is possible that 'data' passed to kfree() is set to a error value instead of allocated space. Make sure it doesn't get called with invalid pointer. Fixes: 5a6338cce9f4 ("mailbox: arm_mhuv2: Add driver") Cc: v5.11 <stable@vger.kernel.org> # v5.11 Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2021-02-22	ice: update the number of available RSS queues	Henry Tieman
	It was possible to have Rx queues that were not available for use by RSS. This would happen when increasing the number of Rx queues while there was a user defined RSS LUT. Always update the number of available RSS queues when changing the number of Rx queues. Fixes: 87324e747fde ("ice: Implement ethtool ops for channels") Signed-off-by: Henry Tieman <henry.w.tieman@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-22	ice: Fix state bits on LLDP mode switch	Dave Ertman
	DCBX_CAP bits were not being adjusted when switching between SW and FW controlled LLDP. Adjust bits to correctly indicate which mode the LLDP logic is in. Fixes: b94b013eb626 ("ice: Implement DCBNL support") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-22	ice: Account for port VLAN in VF max packet size calculation	Brett Creeley
	Currently if an AVF driver doesn't account for the possibility of a port VLAN when determining its max packet size then packets at MTU will be dropped. It is not the VF driver's responsibility to account for a port VLAN so fix this. To fix this, do the following: 1. Add a function that determines the max packet size a VF is allowed by using the port's max packet size and whether the VF is in a port VLAN. If a port VLAN is configured then a VF's max packet size will always be the port's max packet size minus VLAN_HLEN. Otherwise it will be the port's max packet size. 2. Use this function to verify the max packet size from the VF. 3. If there is a port VLAN configured then add 4 bytes (VLAN_HLEN) to the VF's max packet size configuration. Also, the VIRTCHNL_OP_GET_VF_RESOURCES message provides the capability to communicate a VF's max packet size. Use the new function for this purpose. Fixes: 1071a8358a28 ("ice: Implement virtchnl commands for AVF support") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-22	ice: Set trusted VF as default VSI when setting allmulti on	Brett Creeley
	Currently the PF will only set a trusted VF as the default VSI when it requests FLAG_VF_UNICAST_PROMISC over VIRTCHNL. However, when FLAG_VF_MULTICAST_PROMISC is set it's expected that the trusted VF will see multicast packets that don't have a matching destination MAC in the devices internal switch. Fix this by setting the trusted VF as the default VSI if either FLAG_VF_UNICAST_PROMISC or FLAG_VF_MULTICAST_PROMISC is set. Fixes: 01b5e89aab49 ("ice: Add VF promiscuous support") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-22	Merge tag 'kgdb-5.12-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux Pull kgdb updates from Daniel Thompson: "Another fairly small set of changes of changes this cycle. The most significant functional change is a fix to better manage the flags when allocating memory. Additionally there is the removal of some unused code (which is slightly more dramatic than it sounds given it means there are now no tasklets in kgdb) together with a tidy up of the debug prints and some spelling corrections for the documentation" * tag 'kgdb-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux: kgdb: Remove kgdb_schedule_breakpoint() kdb: Make memory allocations more robust kdb: kdb_support: Fix debugging information problem kernel: debug: fix typo issue kgdb: rectify kernel-doc for kgdb_unregister_io_module()
2021-02-22	Merge tag 'livepatching-for-5.12' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching Pull livepatching updates from Petr Mladek: - Practical information how to implement reliable stacktraces needed by the livepatching consistency model by Mark Rutland and Mark Brown. - Automatically generated documentation contents by Mark Brown. * tag 'livepatching-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching: Documentation: livepatch: document reliable stacktrace Documentation: livepatch: Convert to automatically generated contents
2021-02-22	Merge tag 'printk-for-5.12' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - New "no_hash_pointers" kernel parameter causes that %p shows raw pointer values instead of hashed ones. It is intended only for debugging purposes. Misuse is prevented by a fat warning message that is inspired by trace_printk(). - Prevent a possible deadlock when flushing printk_safe buffers during panic(). - Fix performance regression caused by the lockless printk ringbuffer. It was visible with huge log buffer and long messages. - Documentation fix-up. * tag 'printk-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: lib/vsprintf: no_hash_pointers prints all addresses as unhashed kselftest: add support for skipped tests lib: use KSTM_MODULE_GLOBALS macro in kselftest drivers printk: avoid prb_first_valid_seq() where possible printk: fix deadlock when kernel panic printk: rectify kernel-doc for prb_rec_init_wr()
2021-02-22	Merge tag 'linux-kselftest-kunit-5.12-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull KUnit updates from Shuah Khan - support for filtering test suites using glob from Daniel Latypov. "kunit_filter.glob" command line option is passed to the UML kernel, which currently only supports filtering by suite name. This support allows running different subsets of tests, e.g. $ ./tools/testing/kunit/kunit.py build $ ./tools/testing/kunit/kunit.py exec 'list' $ ./tools/testing/kunit/kunit.py exec 'kunit' - several fixes and cleanups also from Daniel Latypov. * tag 'linux-kselftest-kunit-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: tool: fix unintentional statefulness in run_kernel() kunit: tool: add support for filtering suites by glob kunit: add kunit.filter_glob cmdline option to filter suites kunit: don't show `1 == 1` in failed assertion messages kunit: make kunit_tool accept optional path to .kunitconfig fragment Documentation: kunit: add tips.rst for small examples KUnit: Docs: make start.rst example Kconfig follow style.rst kunit: tool: simplify kconfig is_subset_of() logic minor: kunit: tool: fix unit test so it can run from non-root dir kunit: tool: use `with open()` in unit test kunit: tool: stop using bare asserts in unit test kunit: tool: fix unit test cleanup handling
2021-02-22	Merge tag 'linux-kselftest-next-5.12-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest updates from Shuah Khan: - dmabuf-heaps test fixes and cleanups from John Stultz - seccomp test fix to accept any valid fd in user_notification_addfd - Minor fixes to breakpoints and vDSO tests - Minor code cleanups to ipc and x86 tests * tag 'linux-kselftest-next-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/seccomp: Accept any valid fd in user_notification_addfd selftests/timens: add futex binary to .gitignore selftests: breakpoints: Use correct error messages in breakpoint_test_arm64.c selftests/vDSO: fix ABI selftest on riscv selftests/x86/ldt_gdt: remove unneeded semicolon selftests/ipc: remove unneeded semicolon kselftests: dmabuf-heaps: Add extra checking that allocated buffers are zeroed kselftests: dmabuf-heaps: Cleanup test output kselftests: dmabuf-heaps: Softly fail if don't find a vgem device kselftests: dmabuf-heaps: Add clearer checks on DMABUF_BEGIN/END_SYNC kselftests: dmabuf-heaps: Fix Makefile's inclusion of the kernel's usr/include dir
2021-02-22	Merge tag 'docs-5.12' of git://git.lwn.net/linux	Linus Torvalds
	Pull documentation updates from Jonathan Corbet: "It has been a relatively quiet cycle in docsland. - As promised, the minimum Sphinx version to build the docs is now 1.7, and we have dropped support for Python 2 entirely. That allowed the removal of a bunch of compatibility code. - A set of treewide warning fixups from Mauro that I applied after it became clear nobody else was going to deal with them. - The automarkup mechanism can now create cross-references from relative paths to RST files. - More translations, typo fixes, and warning fixes" * tag 'docs-5.12' of git://git.lwn.net/linux: (75 commits) docs: kernel-hacking: be more civil docs: Remove the Microsoft rhetoric Documentation/admin-guide: kernel-parameters: Update nohlt section doc/admin-guide: fix spelling mistake: "perfomance" -> "performance" docs: Document cross-referencing using relative path docs: Enable usage of relative paths to docs on automarkup docs: thermal: fix spelling mistakes Documentation: admin-guide: Update kvm/xen config option docs: Make syscalls' helpers naming consistent coding-style.rst: Avoid comma statements Documentation: /proc/loadavg: add 3 more field descriptions Documentation/submitting-patches: Add blurb about backtraces in commit messages Docs: drop Python 2 support Move our minimum Sphinx version to 1.7 Documentation: input: define ABS_PRESSURE/ABS_MT_PRESSURE resolution as grams scripts/kernel-doc: add internal hyperlink to DOC: sections Update Documentation/admin-guide/sysctl/fs.rst docs: Update DTB format references docs: zh_CN: add iio index.rst translation docs/zh_CN: add iio ep93xx_adc.rst translation ...
2021-02-22	objtool: Fix stack-swizzle for FRAME_POINTER=y	Peter Zijlstra
	When objtool encounters the stack-swizzle: mov %rsp, (%[tos]) mov %[tos], %rsp ... pop %rsp Inside a FRAME_POINTER=y build, things go a little screwy because clearly we're not adjusting the cfa->base. This then results in the pop %rsp not being detected as a restore of cfa->base so it will turn into a regular POP and offset the stack, resulting in: kernel/softirq.o: warning: objtool: do_softirq()+0xdb: return with modified stack frame Therefore, have "mov %[tos], %rsp" act like a PUSH (it sorta is anyway) to balance the things out. We're not too concerned with the actual stack_size for frame-pointer builds, since we don't generate ORC data for them anyway. Fixes: aafeb14e9da2 ("objtool: Support stack-swizzle") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lkml.kernel.org/r/YC6UC+rc9KKmQrkd@hirez.programming.kicks-ass.net
2021-02-22	Merge tag 'for-5.12/block-ipi-2021-02-21' of git://git.kernel.dk/linux-block	Linus Torvalds
	Pull block IPI updates from Jens Axboe: "Avoid IRQ locking for the block IPI handling (Sebastian Andrzej Siewior)" * tag 'for-5.12/block-ipi-2021-02-21' of git://git.kernel.dk/linux-block: blk-mq: Use llist_head for blk_cpu_done blk-mq: Always complete remote completions requests in softirq smp: Process pending softirqs in flush_smp_call_function_from_idle()
2021-02-22	Merge tag 'iommu-updates-v5.12' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: - ARM SMMU and Mediatek updates from Will Deacon: - Support for MT8192 IOMMU from Mediatek - Arm v7s io-pgtable extensions for MT8192 - Removal of TLBI_ON_MAP quirk - New Qualcomm compatible strings - Allow SVA without hardware broadcast TLB maintenance on SMMUv3 - Virtualization Host Extension support for SMMUv3 (SVA) - Allow SMMUv3 PMU perf driver to be built independently from IOMMU - Some tidy-up in IOVA and core code - Conversion of the AMD IOMMU code to use the generic IO-page-table framework - Intel VT-d updates from Lu Baolu: - Audit capability consistency among different IOMMUs - Add SATC reporting structure support - Add iotlb_sync_map callback support - SDHI support for Renesas IOMMU driver - Misc cleanups and other small improvments * tag 'iommu-updates-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (94 commits) iommu/amd: Fix performance counter initialization MAINTAINERS: repair file pattern in MEDIATEK IOMMU DRIVER iommu/mediatek: Fix error code in probe() iommu/mediatek: Fix unsigned domid comparison with less than zero iommu/vt-d: Parse SATC reporting structure iommu/vt-d: Add new enum value and structure for SATC iommu/vt-d: Add iotlb_sync_map callback iommu/vt-d: Move capability check code to cap_audit files iommu/vt-d: Audit IOMMU Capabilities and add helper functions iommu/vt-d: Fix 'physical' typos iommu: Properly pass gfp_t in _iommu_map() to avoid atomic sleeping iommu/vt-d: Fix compile error [-Werror=implicit-function-declaration] driver/perf: Remove ARM_SMMU_V3_PMU dependency on ARM_SMMU_V3 MAINTAINERS: Add entry for MediaTek IOMMU iommu/mediatek: Add mt8192 support iommu/mediatek: Remove unnecessary check in attach_device iommu/mediatek: Support master use iova over 32bit iommu/mediatek: Add iova reserved function iommu/mediatek: Support for multi domains iommu/mediatek: Add get_domain_id from dev->dma_range_map ...
2021-02-22	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma	Linus Torvalds
	Pull rdma updates from Jason Gunthorpe: "This is quite a small cycle, if not for Lee's 70 patches cleaning the kdocs it would be well below typical for patch count. Most of the interesting work here was in the HNS and rxe drivers which got fairly major internal changes. Summary: - Driver updates and bug fixes: siw, hns, bnxt_re, mlx5, efa - Significant rework in rxe to get it ready to have XRC support added - Several rts bug fixes - Big series to get to 'make W=1' cleanness, primarily updating kdocs - Support for creating a RDMA MR from a DMABUF fd to allow PCI peer to peer transfers to GPU VRAM - Device disassociation now works properly with umad - Work to support more than 255 ports on a RDMA device - Further support for the new HNS HIP09 hardware - Coding style cleanups: comma to semicolon, unneded semicolon/blank lines, remove 'h' printk format, don't check for NULL before kfree, use true/false for bool" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (205 commits) RDMA/rtrs-srv: Do not pass a valid pointer to PTR_ERR() RDMA/srp: Fix support for unpopulated and unbalanced NUMA nodes RDMA/mlx5: Fail QP creation if the device can not support the CQE TS RDMA/mlx5: Allow CQ creation without attached EQs RDMA/rtrs-srv-sysfs: fix missing put_device RDMA/rtrs-srv: fix memory leak by missing kobject free RDMA/rtrs: Only allow addition of path to an already established session RDMA/rtrs-srv: Fix stack-out-of-bounds RDMA/rxe: Remove unused pkt->offset RDMA/ucma: Fix use-after-free bug in ucma_create_uevent RDMA/core: Fix kernel doc warnings for ib_port_immutable_read() RDMA/qedr: Use true and false for bool variable RDMA/hns: Adjust definition of FRMR fields RDMA/hns: Refactor process of posting CMDQ RDMA/hns: Adjust fields and variables about CMDQ tail/head RDMA/hns: Remove redundant operations on CMDQ RDMA/hns: Fixes missing error code of CMDQ RDMA/hns: Remove unused member and variable of CMDQ RDMA/ipoib: Remove racy Subnet Manager sendonly join checks RDMA/mlx5: Support 400Gbps IB rate in mlx5 driver ...
2021-02-22	Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi	Linus Torvalds
	Pull SCSI updates from James Bottomley: "This series consists of the usual driver updates (ufs, ibmvfc, qla2xxx, hisi_sas, pm80xx) plus the removal of the gdth driver (which is bound to cause conflicts with a trivial change somewhere). The only big major rework of note is the one from Hannes trying to clean up our result handling code in the drivers to make it consistent" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (194 commits) scsi: MAINTAINERS: Adjust to reflect gdth scsi driver removal scsi: ufs: Give clk scaling min gear a value scsi: lpfc: Fix 'physical' typos scsi: megaraid_mbox: Fix spelling of 'allocated' scsi: qla2xxx: Simplify the calculation of variables scsi: message: fusion: Fix 'physical' typos scsi: target: core: Change ASCQ for residual write scsi: target: core: Signal WRITE residuals scsi: target: core: Set residuals for 4Kn devices scsi: hisi_sas: Add trace FIFO debugfs support scsi: hisi_sas: Flush workqueue in hisi_sas_v3_remove() scsi: hisi_sas: Enable debugfs support by default scsi: hisi_sas: Don't check .nr_hw_queues in hisi_sas_task_prep() scsi: hisi_sas: Remove deferred probe check in hisi_sas_v2_probe() scsi: lpfc: Add auto select on IRQ_POLL scsi: ncr53c8xx: Fix typos scsi: lpfc: Fix ancient double free scsi: qla2xxx: Fix some memory corruption scsi: qla2xxx: Remove redundant NULL check scsi: megaraid: Fix ifnullfree.cocci warnings ...
2021-02-22	Merge tag 'for-5.12/dm-changes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Fix DM integrity's HMAC support to provide enhanced security of internal_hash and journal_mac capabilities. - Various DM writecache fixes to address performance, fix table output to match what was provided at table creation, fix writing beyond end of device when shrinking underlying data device, and a couple other small cleanups. - Add DM crypt support for using trusted keys. - Fix deadlock when swapping to DM crypt device by throttling number of in-flight REQ_SWAP bios. Implemented in DM core so that other bio-based targets can opt-in by setting ti->limit_swap_bios. - Fix various inverted logic bugs in the .iterate_devices callout functions that are used to assess if specific feature or capability is supported across all devices being combined/stacked by DM. - Fix DM era target bugs that exposed users to lost writes or memory leaks. - Add DM core support for passing through inline crypto support of underlying devices. Includes block/keyslot-manager changes that enable extending this support to DM. - Various small fixes and cleanups (spelling fixes, front padding calculation cleanup, cleanup conditional zoned support in targets, etc). * tag 'for-5.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (31 commits) dm: fix deadlock when swapping to encrypted device dm: simplify target code conditional on CONFIG_BLK_DEV_ZONED dm: set DM_TARGET_PASSES_CRYPTO feature for some targets dm: support key eviction from keyslot managers of underlying devices dm: add support for passing through inline crypto support block/keyslot-manager: Introduce functions for device mapper support block/keyslot-manager: Introduce passthrough keyslot manager dm era: only resize metadata in preresume dm era: Use correct value size in equality function of writeset tree dm era: Fix bitset memory leaks dm era: Verify the data block size hasn't changed dm era: Reinitialize bitset cache before digesting a new writeset dm era: Update in-core bitset after committing the metadata dm era: Recover committed writeset after crash dm writecache: use bdev_nr_sectors() instead of open-coded equivalent dm writecache: fix writing beyond end of underlying device when shrinking dm table: remove needless request_queue NULL pointer checks dm table: fix zoned iterate_devices based device capability checks dm table: fix DAX iterate_devices based device capability checks dm table: fix iterate_devices based device capability checks ...
2021-02-22	ice: report correct max number of TCs	Dave Ertman
	In the driver currently, we are reporting max number of TCs to the DCBNL callback as a kernel define set to 8. This is preventing userspace applications performing DCBx to correctly down map the TCs from requested to actual values. Report the actual max TC value to userspace from the capability struct. Fixes: b94b013eb626 ("ice: Implement DCBNL support") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-22	KVM: x86/mmu: Consider the hva in mmu_notifier retry	David Stevens
	Track the range being invalidated by mmu_notifier and skip page fault retries if the fault address is not affected by the in-progress invalidation. Handle concurrent invalidations by finding the minimal range which includes all ranges being invalidated. Although the combined range may include unrelated addresses and cannot be shrunk as individual invalidation operations complete, it is unlikely the marginal gains of proper range tracking are worth the additional complexity. The primary benefit of this change is the reduction in the likelihood of extreme latency when handing a page fault due to another thread having been preempted while modifying host virtual addresses. Signed-off-by: David Stevens <stevensd@chromium.org> Message-Id: <20210222024522.1751719-3-stevensd@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-22	KVM: x86/mmu: Skip mmu_notifier check when handling MMIO page fault	Sean Christopherson
	Don't retry a page fault due to an mmu_notifier invalidation when handling a page fault for a GPA that did not resolve to a memslot, i.e. an MMIO page fault. Invalidations from the mmu_notifier signal a change in a host virtual address (HVA) mapping; without a memslot, there is no HVA and thus no possibility that the invalidation is relevant to the page fault being handled. Note, the MMIO vs. memslot generation checks handle the case where a pending memslot will create a memslot overlapping the faulting GPA. The mmu_notifier checks are orthogonal to memslot updates. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210222024522.1751719-2-stevensd@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-22	KVM: Documentation: rectify rst markup in KVM_GET_SUPPORTED_HV_CPUID	Lukas Bulwahn
	Commit c21d54f0307f ("KVM: x86: hyper-v: allow KVM_GET_SUPPORTED_HV_CPUID as a system ioctl") added an enumeration in the KVM_GET_SUPPORTED_HV_CPUID documentation improperly for rst, and caused new warnings in make htmldocs: Documentation/virt/kvm/api.rst:4536: WARNING: Unexpected indentation. Documentation/virt/kvm/api.rst:4538: WARNING: Block quote ends without a blank line; unexpected unindent. Fix that issue and another historic rst markup issue from the initial rst conversion in the KVM_GET_SUPPORTED_HV_CPUID documentation. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Message-Id: <20210104095938.24838-1-lukas.bulwahn@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-22	KVM: nSVM: prepare guest save area while is_guest_mode is true	Paolo Bonzini
	Right now, enter_svm_guest_mode is calling nested_prepare_vmcb_save and nested_prepare_vmcb_control. This results in is_guest_mode being false until the end of nested_prepare_vmcb_control. This is a problem because nested_prepare_vmcb_save can in turn cause changes to the intercepts and these have to be applied to the "host VMCB" (stored in svm->nested.hsave) and then merged with the VMCB12 intercepts into svm->vmcb. In particular, without this change we forget to set the CR0 read and CR0 write intercepts when running a real mode L2 guest with NPT disabled. The guest is therefore able to see the CR0.PG bit that KVM sets to enable "paged real mode". This patch fixes the svm.flat mode_switch test case with npt=0. There are no other problematic calls in nested_prepare_vmcb_save. Moving is_guest_mode to the end is done since commit 06fc7772690d ("KVM: SVM: Activate nested state only when guest state is complete", 2010-04-25). However, back then KVM didn't grab a different VMCB when updating the intercepts, it had already copied/merged L1's stuff to L0's VMCB, and then updated L0's VMCB regardless of is_nested(). Later recalc_intercepts was introduced in commit 384c63684397 ("KVM: SVM: Add function to recalculate intercept masks", 2011-01-12). This introduced the bug, because recalc_intercepts now throws away the intercept manipulations that svm_set_cr0 had done in the meanwhile to svm->vmcb. [1] https://lore.kernel.org/kvm/1266493115-28386-1-git-send-email-joerg.roedel@amd.com/ Reviewed-by: Sean Christopherson <seanjc@google.com> Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>