summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-03-11docs: serial: fix a reference file name in driver.rstWan Jiabing
Fix the following 'make refcheckdocs' warning: Warning: Documentation/driver-api/serial/driver.rst references a file that doesn't exist: Documentation/driver-api/serial/tty.rst Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Reviewed-by: Jiri Slaby <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20220304100315.6732-1-wanjiabing@vivo.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-03-11docs: UML: Mention telnetd for port channelVincent Whitchurch
It is not obvious from the documentation that using the "port" channel for the console requires telnetd to be installed (see port_connection() in arch/um/drivers/port_user.c). Mention this, and the fact that UML will not boot until a client connects. Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Link: https://lore.kernel.org/r/20220310124230.3069354-1-vincent.whitchurch@axis.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-03-11ARM: Spectre-BHB: provide empty stub for non-configRandy Dunlap
When CONFIG_GENERIC_CPU_VULNERABILITIES is not set, references to spectre_v2_update_state() cause a build error, so provide an empty stub for that function when the Kconfig option is not set. Fixes this build error: arm-linux-gnueabi-ld: arch/arm/mm/proc-v7-bugs.o: in function `cpu_v7_bugs_init': proc-v7-bugs.c:(.text+0x52): undefined reference to `spectre_v2_update_state' arm-linux-gnueabi-ld: proc-v7-bugs.c:(.text+0x82): undefined reference to `spectre_v2_update_state' Fixes: b9baf5c8c5c3 ("ARM: Spectre-BHB workaround") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: patches@armlinux.org.uk Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-11docs/zh_CN: add damon reclaim translationYanteng Si
Translate .../admin-guide/mm/damon/reclaim.rst into Chinese. Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Reviewed-by: Alex Shi <alexs@kernel.org> Link: https://lore.kernel.org/r/0a436fa78814bb0a7b9c2f3049e544b1e1802560.1646899089.git.siyanteng@loongson.cn Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-03-11docs/zh_CN: add damon usage translationYanteng Si
Translate .../admin-guide/mm/damon/usage.rst into Chinese. Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Reviewed-by: Alex Shi <alexs@kernel.org> Link: https://lore.kernel.org/r/431f1c2a158c61a6556f58048cb54961ab7a8790.1646899089.git.siyanteng@loongson.cn Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-03-11docs/zh_CN: add admin-guide damon start translationYanteng Si
Translate Documentation/admin-guide/mm/damon/start.rst into Chinese. Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Reviewed-by: Alex Shi <alexs@kernel.org> Link: https://lore.kernel.org/r/e6e328be018cbf5f9105adfdad56c951acbb8c8f.1646899089.git.siyanteng@loongson.cn Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-03-11docs/zh_CN: add admin-guide damon index translationYanteng Si
Translate .../admin-guide/mm/damon/index.rst into Chinese. Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Reviewed-by: Alex Shi <alexs@kernel.org> Link: https://lore.kernel.org/r/0251f09dc926972068329b87b0563dd432849497.1646899089.git.siyanteng@loongson.cn Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-03-11docs/zh_CN: Refactoring the admin-guide directory indexYanteng Si
The Todolist in the html document looks a mess, now give it a nice looking format. Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Reviewed-by: Alex Shi <alexs@kernel.org> Link: https://lore.kernel.org/r/d410408ec13d6e9cff97da50a13d793a428e05cf.1646899089.git.siyanteng@loongson.cn Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-03-11zh_CN: Add translation for admin-guide/mm/index.rstxu xin
Translate Documentation/admin-guide/mm/index.rst into Chinese. Update Documentation/admin-guide/index.rst. Reviewed-by: Yang Yang <yang.yang29@zte.com.cn> Reviewed-by: Alex Shi <alexs@kernel.org> Reviewed-by: Yanteng Si <siyanteng@loongson.cn> Signed-off-by: xu xin <xu.xin16@zte.com.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Link: https://lore.kernel.org/r/2d695dac05efc012b99fbc7525be65a421c7de03.1646899056.git.siyanteng@loongson.cn Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-03-11zh_CN: Add translations for admin-guide/mm/ksm.rstxu xin
Translate Documentation/admin-guide/mm/ksm.rst into Chinese. Reviewed-by: Yang Yang <yang.yang29@zte.com.cn> Reviewed-by: Alex Shi <alexs@kernel.org> Reviewed-by: Yanteng Si <siyanteng@loongson.cn> Signed-off-by: xu xin <xu.xin16@zte.com.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Link: https://lore.kernel.org/r/f987a3a2cbffaad64f6e2377a5e393d9afbb099c.1646899056.git.siyanteng@loongson.cn Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-03-11Add Chinese translation for vm/ksm.rstxu xin
Translate Documentation/vm/ksm.rst into Chinese. Update Documentation/translations/zh_CN/vm/index.rst. Reviewed-by: Yang Yang <yang.yang29@zte.com.cn> Reviewed-by: Alex Shi <alexs@kernel.org> Reviewed-by: Yanteng Si <siyanteng@loongson.cn> Signed-off-by: xu xin <xu.xin16@zte.com.cn> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Link: https://lore.kernel.org/r/ceb82d6458cd79bc3b7060199db0c3518adc3b8b.1646899056.git.siyanteng@loongson.cn Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-03-11Merge tag 'riscv-for-linus-5.17-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - prevent users from enabling the alternatives framework (and thus errata handling) on XIP kernels, where runtime code patching does not function correctly. - properly detect offset overflow for AUIPC-based relocations in modules. This may manifest as modules calling arbitrary invalid addresses, depending on the address allocated when a module is loaded. * tag 'riscv-for-linus-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Fix auipc+jalr relocation range checks riscv: alternative only works on !XIP_KERNEL
2022-03-11Merge tag 'powerpc-5.17-6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fix from Michael Ellerman: "Fix STACKTRACE=n build, in particular for skiroot_defconfig" * tag 'powerpc-5.17-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc: Fix STACKTRACE=n build
2022-03-11ARM: fix Thumb2 regression with Spectre BHBRussell King (Oracle)
When building for Thumb2, the vectors make use of a local label. Sadly, the Spectre BHB code also uses a local label with the same number which results in the Thumb2 reference pointing at the wrong place. Fix this by changing the number used for the Spectre BHB local label. Fixes: b9baf5c8c5c3 ("ARM: Spectre-BHB workaround") Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-11Merge tag 'mmc-v5.17-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "MMC core: - Restore (mostly) the busy polling for MMC_SEND_OP_COND MMC host: - meson-gx: Fix DMA usage of meson_mmc_post_req()" * tag 'mmc-v5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: core: Restore (almost) the busy polling for MMC_SEND_OP_COND mmc: meson: Fix usage of meson_mmc_post_req()
2022-03-11Merge branch irq/qcom-mpm into irq/irqchip-nextMarc Zyngier
* irq/qcom-mpm: : . : Add support for Qualcomm's MPM wakeup controller, courtesy : of Shawn Guo. : . irqchip: Add Qualcomm MPM controller driver dt-bindings: interrupt-controller: Add Qualcomm MPM support Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-03-11irqchip: Add Qualcomm MPM controller driverShawn Guo
Qualcomm SoCs based on the RPM architecture have a MSM Power Manager (MPM) in always-on domain. In addition to managing resources during sleep, the hardware also has an interrupt controller that monitors the interrupts when the system is asleep, wakes up the APSS when one of these interrupts occur and replays it to GIC after it becomes operational. It adds an irqchip driver for this interrupt controller, and here are some notes about it. - For given SoC, a fixed number of MPM pins are supported, e.g. 96 pins on QCM2290. Each of these MPM pins can be either a MPM_GIC pin or a MPM_GPIO pin. The mapping between MPM_GIC pin and GIC interrupt is defined by SoC, as well as the mapping between MPM_GPIO pin and GPIO number. The former mapping is retrieved from device tree, while the latter is defined in TLMM pinctrl driver. - The power domain (PD) .power_off hook is used to notify RPM that APSS is about to power collapse. This requires MPM PD be the parent PD of CPU cluster. - When SoC gets awake from sleep mode, the driver will receive an interrupt from RPM, so that it can replay interrupt for particular polarity. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220308080534.3384532-3-shawn.guo@linaro.org
2022-03-11dt-bindings: interrupt-controller: Add Qualcomm MPM supportShawn Guo
It adds DT binding support for Qualcomm MPM interrupt controller. Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220308080534.3384532-2-shawn.guo@linaro.org
2022-03-11parisc: Increase parisc_cache_flush_threshold settingJohn David Anglin
In testing the "Fix non-access data TLB cache flush faults" change, I noticed a significant improvement in glibc build and check times. This led me to investigate the parisc_cache_flush_threshold setting. It determines when we switch from line flushing to whole cache flushing. It turned out that the parisc_cache_flush_threshold setting on mako and mako2 machines (PA8800 and PA8900 processors) was way too small. Adjusting this setting provided almost a factor two improvement in the glibc build and check time. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
2022-03-11parisc/unaligned: Enhance user-space visible outputHelge Deller
Userspace is up to now limited to 32-bit, so it's sufficient to print only 32-bit values when showing pointer addresses. Signed-off-by: Helge Deller <deller@gmx.de>
2022-03-11parisc/unaligned: Rewrite 32-bit inline assembly of emulate_sth()Helge Deller
Convert to use real temp variables instead of clobbering processor registers. Signed-off-by: Helge Deller <deller@gmx.de>
2022-03-11parisc/unaligned: Rewrite 32-bit inline assembly of emulate_ldd()Helge Deller
Convert to use real temp variables instead of clobbering processor registers. Signed-off-by: Helge Deller <deller@gmx.de>
2022-03-11parisc/unaligned: Rewrite inline assembly of emulate_ldw()Helge Deller
Convert to use real temp variables instead of clobbering processor registers. Signed-off-by: Helge Deller <deller@gmx.de>
2022-03-11parisc/unaligned: Rewrite inline assembly of emulate_ldh()Helge Deller
Convert to use real temp variables instead of clobbering processor registers. Signed-off-by: Helge Deller <deller@gmx.de>
2022-03-11parisc/unaligned: Use EFAULT fixup handler in unaligned handlersHelge Deller
Convert the inline assembly code to use the automatic EFAULT exception handler. With that the fixup code can be dropped. The other change is to allow double-word only when a 64-bit kernel is used instead of depending on CONFIG_PA20. Signed-off-by: Helge Deller <deller@gmx.de>
2022-03-11parisc: Reduce code size by optimizing get_current() function callsHelge Deller
The get_current() code uses the mfctl() macro to get the pointer to the current task struct from %cr30. The problem with the mfctl() macro is, that it is marked volatile which is basically correct, because mfctl() is used to get e.g. the current internal timer or interrupt flags as well. But specifically the task struct pointer (%cr30) doesn't change over time when the kernel executes code for a task. So, by dropping the volatile when retrieving %cr30 the compiler is now able to get this value only once and optimize the generated code a lot. A bloat-o-meter comparism shows that this patch saves ~5kB kernel code on a 32-bit kernel and ~6kB kernel code on a 64-bit kernel. Signed-off-by: Helge Deller <deller@gmx.de>
2022-03-11parisc: Use constants to encode the space registers like SR_KERNELHelge Deller
Use the provided space register constants instead of hardcoded values. Signed-off-by: Helge Deller <deller@gmx.de>
2022-03-11parisc: Use SR_USER and SR_KERNEL in get_user() and put_user()Helge Deller
Instead of hardcoding the space registers as strings, use the SR_USER and SR_KERNEL constants to form the space register in the access functions. Signed-off-by: Helge Deller <deller@gmx.de>
2022-03-11parisc: Add defines for various space registerHelge Deller
Provide defines for space registers (SR_KERNEL, SR_USER, ...) which should be used instead of hardcoding the values. Signed-off-by: Helge Deller <deller@gmx.de>
2022-03-11parisc: Always use the self-extracting kernel featureHelge Deller
This patch drops the CONFIG_PARISC_SELF_EXTRACT option. The palo boot loader is able to decompress a kernel which was compressed with gzip. That possibility was useful when the Linux kernel self-extracting feature wasn't implemented yet. Beside the fact that the self-extracting feature offers much better compression rates, we do support self-extracting kernels already since kernel v4.14, so now it's really time to get rid of that old option and always use the self-extractor. Signed-off-by: Helge Deller <deller@gmx.de>
2022-03-11video/fbdev/stifb: Implement the stifb_fillrect() functionHelge Deller
The stifb driver (for Artist/HCRX graphics on PA-RISC) was missing the fillrect function. Tested on a 715/64 PA-RISC machine and in qemu. Signed-off-by: Helge Deller <deller@gmx.de>
2022-03-11parisc: Add vDSO supportHelge Deller
Add minimal vDSO support, which provides the signal trampoline helpers, but none of the userspace syscall helpers like time wrappers. The big benefit of this vDSO implementation is, that we now don't need an executeable stack any longer. PA-RISC is one of the last architectures where an executeable stack was needed in oder to implement the signal trampolines by putting assembly instructions on the stack which then gets executed. Instead the kernel will provide the relevant code in the vDSO page and only put the pointers to the signal information on the stack. By dropping the need for executable stacks we avoid running into issues with applications which want non executable stacks for security reasons. Additionally, alternative stacks on memory areas without exec permissions are supported too. This code is based on an initial implementation by Randolph Chung from 2006: https://lore.kernel.org/linux-parisc/4544A34A.6080700@tausq.org/ I did the porting and lifted the code to current code base. Dave fixed the unwind code so that gdb and glibc are able to backtrace through the code. An additional patch to gdb will be pushed upstream by Dave. Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Dave Anglin <dave.anglin@bell.net> Cc: Randolph Chung <randolph@tausq.org> Signed-off-by: Helge Deller <deller@gmx.de>
2022-03-11parisc: Simplify fast path for non-access data TLB faultsJohn David Anglin
With the latest cache fix for non-access faults and the support for non-access faults (code 17) in handle_interruption, we can remove the fast path emulation for fdc, fic, pdc, lpa, probe and probei instructions. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
2022-03-11parisc: Fix handling off probe non-access faultsJohn David Anglin
Currently, the parisc kernel does not fully support non-access TLB fault handling for probe instructions. In the fast path, we set the target register to zero if it is not a shadowed register. The slow path is not implemented, so we call do_page_fault. The architecture indicates that non-access faults should not cause a page fault from disk. This change adds to code to provide non-access fault support for probe instructions. It also modifies the handling of faults on userspace so that if the address lies in a valid VMA and the access type matches that for the VMA, the probe target register is set to one. Otherwise, the target register is set to zero. This was done to make probe instructions more useful for userspace. Probe instructions are not very useful if they set the target register to zero whenever a page is not present in memory. Nominally, the purpose of the probe instruction is determine whether read or write access to a given address is allowed. This fixes a problem in function pointer comparison noticed in the glibc testsuite (stdio-common/tst-vfprintf-user-type). The same problem is likely in glibc (_dl_lookup_address). V2 adds flush and lpa instruction support to handle_nadtlb_fault. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
2022-03-11parisc: Fix non-access data TLB cache flush faultsJohn David Anglin
When a page is not present, we get non-access data TLB faults from the fdc and fic instructions in flush_user_dcache_range_asm and flush_user_icache_range_asm. When these occur, the cache line is not invalidated and potentially we get memory corruption. The problem was hidden by the nullification of the flush instructions. These faults also affect performance. With pa8800/pa8900 processors, there will be 32 faults per 4 KB page since the cache line is 128 bytes. There will be more faults with earlier processors. The problem is fixed by using flush_cache_pages(). It does the flush using a tmp alias mapping. The flush_cache_pages() call in flush_cache_range() flushed too large a range. V2: Remove unnecessary preempt_disable() and preempt_enable() calls. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
2022-03-11x86/sgx: Free backing memory after faulting the enclave pageJarkko Sakkinen
There is a limited amount of SGX memory (EPC) on each system. When that memory is used up, SGX has its own swapping mechanism which is similar in concept but totally separate from the core mm/* code. Instead of swapping to disk, SGX swaps from EPC to normal RAM. That normal RAM comes from a shared memory pseudo-file and can itself be swapped by the core mm code. There is a hierarchy like this: EPC <-> shmem <-> disk After data is swapped back in from shmem to EPC, the shmem backing storage needs to be freed. Currently, the backing shmem is not freed. This effectively wastes the shmem while the enclave is running. The memory is recovered when the enclave is destroyed and the backing storage freed. Sort this out by freeing memory with shmem_truncate_range(), as soon as a page is faulted back to the EPC. In addition, free the memory for PCMD pages as soon as all PCMD's in a page have been marked as unused by zeroing its contents. Cc: stable@vger.kernel.org Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer") Reported-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20220303223859.273187-1-jarkko@kernel.org
2022-03-11Merge branch 'davidh' (fixes from David Howells)Linus Torvalds
Merge misc fixes from David Howells: "A set of patches for watch_queue filter issues noted by Jann. I've added in a cleanup patch from Christophe Jaillet to convert to using formal bitmap specifiers for the note allocation bitmap. Also two filesystem fixes (afs and cachefiles)" * emailed patches from David Howells <dhowells@redhat.com>: cachefiles: Fix volume coherency attribute afs: Fix potential thrashing in afs writeback watch_queue: Make comment about setting ->defunct more accurate watch_queue: Fix lack of barrier/sync/lock between post and read watch_queue: Free the alloc bitmap when the watch_queue is torn down watch_queue: Fix the alloc bitmap size to reflect notes allocated watch_queue: Use the bitmap API when applicable watch_queue: Fix to always request a pow-of-2 pipe ring size watch_queue: Fix to release page in ->release() watch_queue, pipe: Free watchqueue state after clearing pipe ring watch_queue: Fix filter limit check
2022-03-11cachefiles: Fix volume coherency attributeDavid Howells
A network filesystem may set coherency data on a volume cookie, and if given, cachefiles will store this in an xattr on the directory in the cache corresponding to the volume. The function that sets the xattr just stores the contents of the volume coherency buffer directly into the xattr, with nothing added; the checking function, on the other hand, has a cut'n'paste error whereby it tries to interpret the xattr contents as would be the xattr on an ordinary file (using the cachefiles_xattr struct). This results in a failure to match the coherency data because the buffer ends up being shifted by 18 bytes. Fix this by defining a structure specifically for the volume xattr and making both the setting and checking functions use it. Since the volume coherency doesn't work if used, take the opportunity to insert a reserved field for future use, set it to 0 and check that it is 0. Log mismatch through the appropriate tracepoint. Note that this only affects cifs; 9p, afs, ceph and nfs don't use the volume coherency data at the moment. Fixes: 32e150037dce ("fscache, cachefiles: Store the volume coherency data") Reported-by: Rohith Surabattula <rohiths.msft@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Steve French <smfrench@gmail.com> cc: linux-cifs@vger.kernel.org cc: linux-cachefs@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-11afs: Fix potential thrashing in afs writebackDavid Howells
In afs_writepages_region(), if the dirty page we find is undergoing writeback or write to cache, but the sync_mode is WB_SYNC_NONE, we go round the loop trying the same page again and again with no pausing or waiting unless and until another thread manages to clear the writeback and fscache flags. Fix this with three measures: (1) Advance start to after the page we found. (2) Break out of the loop and return if rescheduling is requested. (3) Arbitrarily give up after a maximum of 5 skips. Fixes: 31143d5d515e ("AFS: implement basic file write support") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Marc Dionne <marc.dionne@auristor.com> Acked-by: Marc Dionne <marc.dionne@auristor.com> Link: https://lore.kernel.org/r/164692725757.2097000.2060513769492301854.stgit@warthog.procyon.org.uk/ # v1 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-11x86/traps: Mark do_int3() NOKPROBE_SYMBOLLi Huafei
Since kprobe_int3_handler() is called in do_int3(), probing do_int3() can cause a breakpoint recursion and crash the kernel. Therefore, do_int3() should be marked as NOKPROBE_SYMBOL. Fixes: 21e28290b317 ("x86/traps: Split int3 handler up") Signed-off-by: Li Huafei <lihuafei1@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220310120915.63349-1-lihuafei1@huawei.com
2022-03-11watch_queue: Make comment about setting ->defunct more accurateDavid Howells
watch_queue_clear() has a comment stating that setting ->defunct to true preventing new additions as well as preventing notifications. Whilst the latter is true, the first bit is superfluous since at the time this function is called, the pipe cannot be accessed to add new event sources. Remove the "new additions" bit from the comment. Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-11watch_queue: Fix lack of barrier/sync/lock between post and readDavid Howells
There's nothing to synchronise post_one_notification() versus pipe_read(). Whilst posting is done under pipe->rd_wait.lock, the reader only takes pipe->mutex which cannot bar notification posting as that may need to be made from contexts that cannot sleep. Fix this by setting pipe->head with a barrier in post_one_notification() and reading pipe->head with a barrier in pipe_read(). If that's not sufficient, the rd_wait.lock will need to be taken, possibly in a ->confirm() op so that it only applies to notifications. The lock would, however, have to be dropped before copy_page_to_iter() is invoked. Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-11watch_queue: Free the alloc bitmap when the watch_queue is torn downDavid Howells
Free the watch_queue note allocation bitmap when the watch_queue is destroyed. Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-11watch_queue: Fix the alloc bitmap size to reflect notes allocatedDavid Howells
Currently, watch_queue_set_size() sets the number of notes available in wqueue->nr_notes according to the number of notes allocated, but sets the size of the bitmap to the unrounded number of notes originally asked for. Fix this by setting the bitmap size to the number of notes we're actually going to make available (ie. the number allocated). Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-11watch_queue: Use the bitmap API when applicableChristophe JAILLET
Use bitmap_alloc() to simplify code, improve the semantic and reduce some open-coded arithmetic in allocator arguments. Also change a memset(0xff) into an equivalent bitmap_fill() to keep consistency. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-11watch_queue: Fix to always request a pow-of-2 pipe ring sizeDavid Howells
The pipe ring size must always be a power of 2 as the head and tail pointers are masked off by AND'ing with the size of the ring - 1. watch_queue_set_size(), however, lets you specify any number of notes between 1 and 511. This number is passed through to pipe_resize_ring() without checking/forcing its alignment. Fix this by rounding the number of slots required up to the nearest power of two. The request is meant to guarantee that at least that many notifications can be generated before the queue is full, so rounding down isn't an option, but, alternatively, it may be better to give an error if we aren't allowed to allocate that much ring space. Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-11watch_queue: Fix to release page in ->release()David Howells
When a pipe ring descriptor points to a notification message, the refcount on the backing page is incremented by the generic get function, but the release function, which marks the bitmap, doesn't drop the page ref. Fix this by calling generic_pipe_buf_release() at the end of watch_queue_pipe_buf_release(). Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-11watch_queue, pipe: Free watchqueue state after clearing pipe ringDavid Howells
In free_pipe_info(), free the watchqueue state after clearing the pipe ring as each pipe ring descriptor has a release function, and in the case of a notification message, this is watch_queue_pipe_buf_release() which tries to mark the allocation bitmap that was previously released. Fix this by moving the put of the pipe's ref on the watch queue to after the ring has been cleared. We still need to call watch_queue_clear() before doing that to make sure that the pipe is disconnected from any notification sources first. Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-11watch_queue: Fix filter limit checkDavid Howells
In watch_queue_set_filter(), there are a couple of places where we check that the filter type value does not exceed what the type_filter bitmap can hold. One place calculates the number of bits by: if (tf[i].type >= sizeof(wfilter->type_filter) * 8) which is fine, but the second does: if (tf[i].type >= sizeof(wfilter->type_filter) * BITS_PER_LONG) which is not. This can lead to a couple of out-of-bounds writes due to a too-large type: (1) __set_bit() on wfilter->type_filter (2) Writing more elements in wfilter->filters[] than we allocated. Fix this by just using the proper WATCH_TYPE__NR instead, which is the number of types we actually know about. The bug may cause an oops looking something like: BUG: KASAN: slab-out-of-bounds in watch_queue_set_filter+0x659/0x740 Write of size 4 at addr ffff88800d2c66bc by task watch_queue_oob/611 ... Call Trace: <TASK> dump_stack_lvl+0x45/0x59 print_address_description.constprop.0+0x1f/0x150 ... kasan_report.cold+0x7f/0x11b ... watch_queue_set_filter+0x659/0x740 ... __x64_sys_ioctl+0x127/0x190 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae Allocated by task 611: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x81/0xa0 watch_queue_set_filter+0x23a/0x740 __x64_sys_ioctl+0x127/0x190 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae The buggy address belongs to the object at ffff88800d2c66a0 which belongs to the cache kmalloc-32 of size 32 The buggy address is located 28 bytes inside of 32-byte region [ffff88800d2c66a0, ffff88800d2c66c0) Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-11block: flush plug based on hardware and software queue orderJens Axboe
We used to sort the plug list if we had multiple queues before dispatching requests to the IO scheduler. This usually isn't needed, but for certain workloads that interleave requests to disks, it's a less efficient to process the plug list one-by-one if everything is interleaved. Don't sort the list, but skip through it and flush out entries that have the same target at the same time. Fixes: df87eb0fce8f ("block: get rid of plug list sorting") Reported-and-tested-by: Song Liu <song@kernel.org> Reviewed-by: Song Liu <songliubraving@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>