summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-12-12mips, kexec: fix the incorrect ifdeffery and dependency of CONFIG_KEXECBaoquan He
The select of KEXEC for CRASH_DUMP in kernel/Kconfig.kexec will be dropped, then compiling errors will be triggered if below config items are set: === CONFIG_CRASH_CORE=y CONFIG_KEXEC_CORE=y CONFIG_CRASH_DUMP=y === -------------------------------------------------------------------- mipsel-linux-ld: kernel/kexec_core.o: in function `kimage_free': kernel/kexec_core.c:(.text+0x2200): undefined reference to `machine_kexec_cleanup' mipsel-linux-ld: kernel/kexec_core.o: in function `__crash_kexec': kernel/kexec_core.c:(.text+0x2480): undefined reference to `machine_crash_shutdown' mipsel-linux-ld: kernel/kexec_core.c:(.text+0x2488): undefined reference to `machine_kexec' mipsel-linux-ld: kernel/kexec_core.o: in function `kernel_kexec': kernel/kexec_core.c:(.text+0x29b8): undefined reference to `machine_shutdown' mipsel-linux-ld: kernel/kexec_core.c:(.text+0x29c0): undefined reference to `machine_kexec' -------------------------------------------------------------------- Here, change the dependency of building kexec_core related object files, and the ifdeffery in mips from CONFIG_KEXEC to CONFIG_KEXEC_CORE. Link: https://lkml.kernel.org/r/20231208073036.7884-4-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202311302042.sn8cDPIX-lkp@intel.com/ Cc: Eric DeVolder <eric_devolder@yahoo.com> Cc: Ignat Korchagin <ignat@cloudflare.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-12m68k, kexec: fix the incorrect ifdeffery and build dependency of CONFIG_KEXECBaoquan He
The select of KEXEC for CRASH_DUMP in kernel/Kconfig.kexec will be dropped, then compiling errors will be triggered if below config items are set: === CONFIG_CRASH_CORE=y CONFIG_KEXEC_CORE=y CONFIG_CRASH_DUMP=y === Here, change the dependency of buinding machine_kexec.o relocate_kernel.o and the ifdeffery in asm/kexe.h to CONFIG_KEXEC_CORE. Link: https://lkml.kernel.org/r/20231208073036.7884-3-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Eric DeVolder <eric_devolder@yahoo.com> Cc: Ignat Korchagin <ignat@cloudflare.com> Cc: kernel test robot <lkp@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-12loongarch, kexec: change dependency of object filesBaoquan He
Patch series "kexec: fix the incorrect ifdeffery and dependency of CONFIG_KEXEC". The select of KEXEC for CRASH_DUMP in kernel/Kconfig.kexec will be dropped, then compiling errors will be triggered if below config items are set: === CONFIG_CRASH_CORE=y CONFIG_KEXEC_CORE=y CONFIG_CRASH_DUMP=y === E.g on mips, below link error are seen: -------------------------------------------------------------------- mipsel-linux-ld: kernel/kexec_core.o: in function `kimage_free': kernel/kexec_core.c:(.text+0x2200): undefined reference to `machine_kexec_cleanup' mipsel-linux-ld: kernel/kexec_core.o: in function `__crash_kexec': kernel/kexec_core.c:(.text+0x2480): undefined reference to `machine_crash_shutdown' mipsel-linux-ld: kernel/kexec_core.c:(.text+0x2488): undefined reference to `machine_kexec' mipsel-linux-ld: kernel/kexec_core.o: in function `kernel_kexec': kernel/kexec_core.c:(.text+0x29b8): undefined reference to `machine_shutdown' mipsel-linux-ld: kernel/kexec_core.c:(.text+0x29c0): undefined reference to `machine_kexec' -------------------------------------------------------------------- Here, change the incorrect dependency of building kexec_core related object files, and the ifdeffery on architectures from CONFIG_KEXEC to CONFIG_KEXEC_CORE. Testing: ======== Passed on mips and loognarch with the LKP reproducer. This patch (of 5): Currently, in arch/loongarch/kernel/Makefile, building machine_kexec.o relocate_kernel.o depends on CONFIG_KEXEC. Whereas, since we will drop the select of KEXEC for CRASH_DUMP in kernel/Kconfig.kexec, compiling error will be triggered if below config items are set: === CONFIG_CRASH_CORE=y CONFIG_KEXEC_CORE=y CONFIG_CRASH_DUMP=y === --------------------------------------------------------------- loongarch64-linux-ld: kernel/kexec_core.o: in function `.L209': >> kexec_core.c:(.text+0x1660): undefined reference to `machine_kexec_cleanup' loongarch64-linux-ld: kernel/kexec_core.o: in function `.L287': >> kexec_core.c:(.text+0x1c5c): undefined reference to `machine_crash_shutdown' >> loongarch64-linux-ld: kexec_core.c:(.text+0x1c64): undefined reference to `machine_kexec' loongarch64-linux-ld: kernel/kexec_core.o: in function `.L2^B5': >> kexec_core.c:(.text+0x2090): undefined reference to `machine_shutdown' loongarch64-linux-ld: kexec_core.c:(.text+0x20a0): undefined reference to `machine_kexec' --------------------------------------------------------------- Here, change the dependency of machine_kexec.o relocate_kernel.o to CONFIG_KEXEC_CORE can fix above building error. Link: https://lkml.kernel.org/r/20231208073036.7884-1-bhe@redhat.com Link: https://lkml.kernel.org/r/20231208073036.7884-2-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202311300946.kHE9Iu71-lkp@intel.com/ Cc: Eric DeVolder <eric_devolder@yahoo.com> Cc: Ignat Korchagin <ignat@cloudflare.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-12mm/damon/core: make damon_start() waits until kdamond_fn() startsSeongJae Park
The cleanup tasks of kdamond threads including reset of corresponding DAMON context's ->kdamond field and decrease of global nr_running_ctxs counter is supposed to be executed by kdamond_fn(). However, commit 0f91d13366a4 ("mm/damon: simplify stop mechanism") made neither damon_start() nor damon_stop() ensure the corresponding kdamond has started the execution of kdamond_fn(). As a result, the cleanup can be skipped if damon_stop() is called fast enough after the previous damon_start(). Especially the skipped reset of ->kdamond could cause a use-after-free. Fix it by waiting for start of kdamond_fn() execution from damon_start(). Link: https://lkml.kernel.org/r/20231208175018.63880-1-sj@kernel.org Fixes: 0f91d13366a4 ("mm/damon: simplify stop mechanism") Signed-off-by: SeongJae Park <sj@kernel.org> Reported-by: Jakub Acs <acsjakub@amazon.de> Cc: Changbin Du <changbin.du@intel.com> Cc: Jakub Acs <acsjakub@amazon.de> Cc: <stable@vger.kernel.org> # 5.15.x Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-12selftests/mm: cow: print ksft header before printing anything elseDavid Hildenbrand
Doing a ksft_print_msg() before the ksft_print_header() seems to confuse the ksft framework in a strange way: running the test on the cmdline results in the expected output. But piping the output somewhere else, results in some odd output, whereby we repeatedly get the same info printed: # [INFO] detected THP size: 2048 KiB # [INFO] detected hugetlb page size: 2048 KiB # [INFO] detected hugetlb page size: 1048576 KiB # [INFO] huge zeropage is enabled TAP version 13 1..190 # [INFO] Anonymous memory tests in private mappings # [RUN] Basic COW after fork() ... with base page # [INFO] detected THP size: 2048 KiB # [INFO] detected hugetlb page size: 2048 KiB # [INFO] detected hugetlb page size: 1048576 KiB # [INFO] huge zeropage is enabled TAP version 13 1..190 # [INFO] Anonymous memory tests in private mappings # [RUN] Basic COW after fork() ... with base page ok 1 No leak from parent into child # [RUN] Basic COW after fork() ... with swapped out base page # [INFO] detected THP size: 2048 KiB # [INFO] detected hugetlb page size: 2048 KiB # [INFO] detected hugetlb page size: 1048576 KiB # [INFO] huge zeropage is enabled Doing the ksft_print_header() first seems to resolve that and gives us the output we expect: TAP version 13 # [INFO] detected THP size: 2048 KiB # [INFO] detected hugetlb page size: 2048 KiB # [INFO] detected hugetlb page size: 1048576 KiB # [INFO] huge zeropage is enabled 1..190 # [INFO] Anonymous memory tests in private mappings # [RUN] Basic COW after fork() ... with base page ok 1 No leak from parent into child # [RUN] Basic COW after fork() ... with swapped out base page ok 2 No leak from parent into child # [RUN] Basic COW after fork() ... with THP ok 3 No leak from parent into child # [RUN] Basic COW after fork() ... with swapped-out THP ok 4 No leak from parent into child # [RUN] Basic COW after fork() ... with PTE-mapped THP ok 5 No leak from parent into child Link: https://lkml.kernel.org/r/20231206103558.38040-1-david@redhat.com Fixes: f4b5fd6946e2 ("selftests/vm: anon_cow: THP tests") Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: Nico Pache <npache@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-12mm: fix VMA heap bounds checkingKefeng Wang
After converting selinux to VMA heap check helper, the gcl triggers an execheap SELinux denial, which is caused by a changed logic check. Previously selinux only checked that the VMA range was within the VMA heap range, and the implementation checks the intersection between the two ranges, but the corner case (vm_end=start_brk, brk=vm_start) isn't handled correctly. Since commit 11250fd12eb8 ("mm: factor out VMA stack and heap checks") was only a function extraction, it seems that the issue was introduced by commit 0db0c01b53a1 ("procfs: fix /proc/<pid>/maps heap check"). Let's fix above corner cases, meanwhile, correct the wrong indentation of the stack and heap check helpers. Fixes: 11250fd12eb8 ("mm: factor out VMA stack and heap checks") Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reported-by: Ondrej Mosnacek <omosnace@redhat.com> Closes: https://lore.kernel.org/selinux/CAFqZXNv0SVT0fkOK6neP9AXbj3nxJ61JAY4+zJzvxqJaeuhbFw@mail.gmail.com/ Tested-by: Ondrej Mosnacek <omosnace@redhat.com> Link: https://lkml.kernel.org/r/20231207152525.2607420-1-wangkefeng.wang@huawei.com Cc: David Hildenbrand <david@redhat.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-12riscv: fix VMALLOC_START definitionBaoquan He
When below config items are set, compiler complained: -------------------- CONFIG_CRASH_CORE=y CONFIG_KEXEC_CORE=y CONFIG_CRASH_DUMP=y ...... ----------------------- ------------------------------------------------------------------- arch/riscv/kernel/crash_core.c: In function 'arch_crash_save_vmcoreinfo': arch/riscv/kernel/crash_core.c:11:58: warning: format '%lx' expects argument of type 'long unsigned int', but argument 2 has type 'int' [-Wformat=] 11 | vmcoreinfo_append_str("NUMBER(VMALLOC_START)=0x%lx\n", VMALLOC_START); | ~~^ | | | long unsigned int | %x ---------------------------------------------------------------------- This is because on riscv macro VMALLOC_START has different type when CONFIG_MMU is set or unset. arch/riscv/include/asm/pgtable.h: -------------------------------------------------- Changing it to _AC(0, UL) in case CONFIG_MMU=n can fix the warning. Link: https://lkml.kernel.org/r/ZW7OsX4zQRA3mO4+@MiWiFi-R3L-srv Signed-off-by: Baoquan He <bhe@redhat.com> Reported-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Cc: Eric DeVolder <eric_devolder@yahoo.com> Cc: Ignat Korchagin <ignat@cloudflare.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-12kexec: drop dependency on ARCH_SUPPORTS_KEXEC from CRASH_DUMPIgnat Korchagin
In commit f8ff23429c62 ("kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP") we tried to fix a config regression, where CONFIG_CRASH_DUMP required CONFIG_KEXEC. However, it was not enough at least for arm64 platforms. While further testing the patch with our arm64 config I noticed that CONFIG_CRASH_DUMP is unavailable in menuconfig. This is because CONFIG_CRASH_DUMP still depends on the new CONFIG_ARCH_SUPPORTS_KEXEC introduced in commit 91506f7e5d21 ("arm64/kexec: refactor for kernel/Kconfig.kexec") and on arm64 CONFIG_ARCH_SUPPORTS_KEXEC requires CONFIG_PM_SLEEP_SMP=y, which in turn requires either CONFIG_SUSPEND=y or CONFIG_HIBERNATION=y neither of which are set in our config. Given that we already established that CONFIG_KEXEC (which is a switch for kexec system call itself) is not required for CONFIG_CRASH_DUMP drop CONFIG_ARCH_SUPPORTS_KEXEC dependency as well. The arm64 kernel builds just fine with CONFIG_CRASH_DUMP=y and with both CONFIG_KEXEC=n and CONFIG_KEXEC_FILE=n after f8ff23429c62 ("kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP") and this patch are applied given that the necessary shared bits are included via CONFIG_KEXEC_CORE dependency. [bhe@redhat.com: don't export some symbols when CONFIG_MMU=n] Link: https://lkml.kernel.org/r/ZW03ODUKGGhP1ZGU@MiWiFi-R3L-srv [bhe@redhat.com: riscv, kexec: fix dependency of two items] Link: https://lkml.kernel.org/r/ZW04G/SKnhbE5mnX@MiWiFi-R3L-srv Link: https://lkml.kernel.org/r/20231129220409.55006-1-ignat@cloudflare.com Fixes: 91506f7e5d21 ("arm64/kexec: refactor for kernel/Kconfig.kexec") Signed-off-by: Ignat Korchagin <ignat@cloudflare.com> Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: <stable@vger.kernel.org> # 6.6+: f8ff234: kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP Cc: <stable@vger.kernel.org> # 6.6+ Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-12Merge tag 'hid-for-linus-2023121201' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID fixes from Jiri Kosina: - Lenovo ThinkPad TrackPoint Keyboard II firmware-specific regression fix (Mikhail Khvainitski) - device-specific fixes (various authors) * tag 'hid-for-linus-2023121201' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: apple: Add "hfd.cn" and "WKB603" to the list of non-apple keyboards HID: lenovo: Restrict detection of patched firmware only to USB cptkbd HID: Add quirk for Labtec/ODDOR/aikeec handbrake HID: i2c-hid: Add IDEA5002 to i2c_hid_acpi_blacklist[] mailmap: add address mapping for Jiri Kosina
2023-12-12dpll: sanitize possible null pointer dereference in dpll_pin_parent_pin_set()Jiri Pirko
User may not pass DPLL_A_PIN_STATE attribute in the pin set operation message. Sanitize that by checking if the attr pointer is not null and process the passed state attribute value only in that case. Reported-by: Xingyuan Mo <hdthky0@gmail.com> Fixes: 9d71b54b65b1 ("dpll: netlink: Add DPLL framework base functions") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://lore.kernel.org/r/20231211083758.1082853-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12Merge branch 'ena-driver-xdp-bug-fixes'Jakub Kicinski
David Arinzon says: ==================== ENA driver XDP bug fixes This patchset contains multiple XDP-related bug fixes in the ENA driver. ==================== Link: https://lore.kernel.org/r/20231211062801.27891-1-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12net: ena: Fix XDP redirection errorDavid Arinzon
When sending TX packets, the meta descriptor can be all zeroes as no meta information is required (as in XDP). This patch removes the validity check, as when `disable_meta_caching` is enabled, such TX packets will be dropped otherwise. Fixes: 0e3a3f6dacf0 ("net: ena: support new LLQ acceleration mode") Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://lore.kernel.org/r/20231211062801.27891-5-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12net: ena: Fix DMA syncing in XDP path when SWIOTLB is onDavid Arinzon
This patch fixes two issues: Issue 1 ------- Description ``````````` Current code does not call dma_sync_single_for_cpu() to sync data from the device side memory to the CPU side memory before the XDP code path uses the CPU side data. This causes the XDP code path to read the unset garbage data in the CPU side memory, resulting in incorrect handling of the packet by XDP. Solution ```````` 1. Add a call to dma_sync_single_for_cpu() before the XDP code starts to use the data in the CPU side memory. 2. The XDP code verdict can be XDP_PASS, in which case there is a fallback to the non-XDP code, which also calls dma_sync_single_for_cpu(). To avoid calling dma_sync_single_for_cpu() twice: 2.1. Put the dma_sync_single_for_cpu() in the code in such a place where it happens before XDP and non-XDP code. 2.2. Remove the calls to dma_sync_single_for_cpu() in the non-XDP code for the first buffer only (rx_copybreak and non-rx_copybreak cases), since the new call that was added covers these cases. The call to dma_sync_single_for_cpu() for the second buffer and on stays because only the first buffer is handled by the newly added dma_sync_single_for_cpu(). And there is no need for special handling of the second buffer and on for the XDP path since currently the driver supports only single buffer packets. Issue 2 ------- Description ``````````` In case the XDP code forwarded the packet (ENA_XDP_FORWARDED), ena_unmap_rx_buff_attrs() is called with attrs set to 0. This means that before unmapping the buffer, the internal function dma_unmap_page_attrs() will also call dma_sync_single_for_cpu() on the whole buffer (not only on the data part of it). This sync is both wasteful (since a sync was already explicitly called before) and also causes a bug, which will be explained using the below diagram. The following diagram shows the flow of events causing the bug. The order of events is (1)-(4) as shown in the diagram. CPU side memory area (3)convert_to_xdp_frame() initializes the headroom with xdpf metadata || \/ ___________________________________ | | 0 | V 4K --------------------------------------------------------------------- | xdpf->data | other xdpf | < data > | tailroom ||...| | | fields | | GARBAGE || | --------------------------------------------------------------------- /\ /\ || || (4)ena_unmap_rx_buff_attrs() calls (2)dma_sync_single_for_cpu() dma_sync_single_for_cpu() on the copies data from device whole buffer page, overwriting side to CPU side memory the xdpf->data with GARBAGE. || 0 4K --------------------------------------------------------------------- | headroom | < data > | tailroom ||...| | GARBAGE | | GARBAGE || | --------------------------------------------------------------------- Device side memory area /\ || (1) device writes RX packet data After the call to ena_unmap_rx_buff_attrs() in (4), the xdpf->data becomes corrupted, and so when it is later accessed in ena_clean_xdp_irq()->xdp_return_frame(), it causes a page fault, crashing the kernel. Solution ```````` Explicitly tell ena_unmap_rx_buff_attrs() not to call dma_sync_single_for_cpu() by passing it the ENA_DMA_ATTR_SKIP_CPU_SYNC flag. Fixes: f7d625adeb7b ("net: ena: Add dynamic recycling mechanism for rx buffers") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://lore.kernel.org/r/20231211062801.27891-4-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12net: ena: Fix xdp drops handling due to multibuf packetsDavid Arinzon
Current xdp code drops packets larger than ENA_XDP_MAX_MTU. This is an incorrect condition since the problem is not the size of the packet, rather the number of buffers it contains. This commit: 1. Identifies and drops XDP multi-buffer packets at the beginning of the function. 2. Increases the xdp drop statistic when this drop occurs. 3. Adds a one-time print that such drops are happening to give better indication to the user. Fixes: 838c93dc5449 ("net: ena: implement XDP drop support") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://lore.kernel.org/r/20231211062801.27891-3-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12net: ena: Destroy correct number of xdp queues upon failureDavid Arinzon
The ena_setup_and_create_all_xdp_queues() function freed all the resources upon failure, after creating only xdp_num_queues queues, instead of freeing just the created ones. In this patch, the only resources that are freed, are the ones allocated right before the failure occurs. Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action") Signed-off-by: Shahar Itzko <itzko@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://lore.kernel.org/r/20231211062801.27891-2-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12tracing: Update snapshot buffer on resize if it is allocatedSteven Rostedt (Google)
The snapshot buffer is to mimic the main buffer so that when a snapshot is needed, the snapshot and main buffer are swapped. When the snapshot buffer is allocated, it is set to the minimal size that the ring buffer may be at and still functional. When it is allocated it becomes the same size as the main ring buffer, and when the main ring buffer changes in size, it should do. Currently, the resize only updates the snapshot buffer if it's used by the current tracer (ie. the preemptirqsoff tracer). But it needs to be updated anytime it is allocated. When changing the size of the main buffer, instead of looking to see if the current tracer is utilizing the snapshot buffer, just check if it is allocated to know if it should be updated or not. Also fix typo in comment just above the code change. Link: https://lore.kernel.org/linux-trace-kernel/20231210225447.48476a6a@rorschach.local.home Cc: stable@vger.kernel.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Fixes: ad909e21bbe69 ("tracing: Add internal tracing_snapshot() functions") Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-12-12ring-buffer: Fix memory leak of free pageSteven Rostedt (Google)
Reading the ring buffer does a swap of a sub-buffer within the ring buffer with a empty sub-buffer. This allows the reader to have full access to the content of the sub-buffer that was swapped out without having to worry about contention with the writer. The readers call ring_buffer_alloc_read_page() to allocate a page that will be used to swap with the ring buffer. When the code is finished with the reader page, it calls ring_buffer_free_read_page(). Instead of freeing the page, it stores it as a spare. Then next call to ring_buffer_alloc_read_page() will return this spare instead of calling into the memory management system to allocate a new page. Unfortunately, on freeing of the ring buffer, this spare page is not freed, and causes a memory leak. Link: https://lore.kernel.org/linux-trace-kernel/20231210221250.7b9cc83c@rorschach.local.home Cc: stable@vger.kernel.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Fixes: 73a757e63114d ("ring-buffer: Return reader page back into existing ring buffer") Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-12-12eventfs: Fix events beyond NAME_MAX blocking tasksBeau Belgrave
Eventfs uses simple_lookup(), however, it will fail if the name of the entry is beyond NAME_MAX length. When this error is encountered, eventfs still tries to create dentries instead of skipping the dentry creation. When the dentry is attempted to be created in this state d_wait_lookup() will loop forever, waiting for the lookup to be removed. Fix eventfs to return the error in simple_lookup() back to the caller instead of continuing to try to create the dentry. Link: https://lore.kernel.org/linux-trace-kernel/20231210213534.497-1-beaub@linux.microsoft.com Fixes: 63940449555e ("eventfs: Implement eventfs lookup, read, open functions") Link: https://lore.kernel.org/linux-trace-kernel/20231208183601.GA46-beaub@linux.microsoft.com/ Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-12-12tracing: Have large events show up as '[LINE TOO BIG]' instead of nothingSteven Rostedt (Google)
If a large event was added to the ring buffer that is larger than what the trace_seq can handle, it just drops the output: ~# cat /sys/kernel/tracing/trace # tracer: nop # # entries-in-buffer/entries-written: 2/2 #P:8 # # _-----=> irqs-off/BH-disabled # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / _-=> migrate-disable # |||| / delay # TASK-PID CPU# ||||| TIMESTAMP FUNCTION # | | | ||||| | | <...>-859 [001] ..... 141.118951: tracing_mark_write <...>-859 [001] ..... 141.148201: tracing_mark_write: 78901234 Instead, catch this case and add some context: ~# cat /sys/kernel/tracing/trace # tracer: nop # # entries-in-buffer/entries-written: 2/2 #P:8 # # _-----=> irqs-off/BH-disabled # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / _-=> migrate-disable # |||| / delay # TASK-PID CPU# ||||| TIMESTAMP FUNCTION # | | | ||||| | | <...>-852 [001] ..... 121.550551: tracing_mark_write[LINE TOO BIG] <...>-852 [001] ..... 121.550581: tracing_mark_write: 78901234 This now emulates the same output as trace_pipe. Link: https://lore.kernel.org/linux-trace-kernel/20231209171058.78c1a026@gandalf.local.home Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-12-12ring-buffer: Fix writing to the buffer with max_data_sizeSteven Rostedt (Google)
The maximum ring buffer data size is the maximum size of data that can be recorded on the ring buffer. Events must be smaller than the sub buffer data size minus any meta data. This size is checked before trying to allocate from the ring buffer because the allocation assumes that the size will fit on the sub buffer. The maximum size was calculated as the size of a sub buffer page (which is currently PAGE_SIZE minus the sub buffer header) minus the size of the meta data of an individual event. But it missed the possible adding of a time stamp for events that are added long enough apart that the event meta data can't hold the time delta. When an event is added that is greater than the current BUF_MAX_DATA_SIZE minus the size of a time stamp, but still less than or equal to BUF_MAX_DATA_SIZE, the ring buffer would go into an infinite loop, looking for a page that can hold the event. Luckily, there's a check for this loop and after 1000 iterations and a warning is emitted and the ring buffer is disabled. But this should never happen. This can happen when a large event is added first, or after a long period where an absolute timestamp is prefixed to the event, increasing its size by 8 bytes. This passes the check and then goes into the algorithm that causes the infinite loop. For events that are the first event on the sub-buffer, it does not need to add a timestamp, because the sub-buffer itself contains an absolute timestamp, and adding one is redundant. The fix is to check if the event is to be the first event on the sub-buffer, and if it is, then do not add a timestamp. This also fixes 32 bit adding a timestamp when a read of before_stamp or write_stamp is interrupted. There's still no need to add that timestamp if the event is going to be the first event on the sub buffer. Also, if the buffer has "time_stamp_abs" set, then also check if the length plus the timestamp is greater than the BUF_MAX_DATA_SIZE. Link: https://lore.kernel.org/all/20231212104549.58863438@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20231212071837.5fdd6c13@gandalf.local.home Link: https://lore.kernel.org/linux-trace-kernel/20231212111617.39e02849@gandalf.local.home Cc: stable@vger.kernel.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Fixes: a4543a2fa9ef3 ("ring-buffer: Get timestamp after event is allocated") Fixes: 58fbc3c63275c ("ring-buffer: Consolidate add_timestamp to remove some branches") Reported-by: Kent Overstreet <kent.overstreet@linux.dev> # (on IRC) Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-12-12net: Remove acked SYN flag from packet in the transmit queue correctlyDong Chenchen
syzkaller report: kernel BUG at net/core/skbuff.c:3452! invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.7.0-rc4-00009-gbee0e7762ad2-dirty #135 RIP: 0010:skb_copy_and_csum_bits (net/core/skbuff.c:3452) Call Trace: icmp_glue_bits (net/ipv4/icmp.c:357) __ip_append_data.isra.0 (net/ipv4/ip_output.c:1165) ip_append_data (net/ipv4/ip_output.c:1362 net/ipv4/ip_output.c:1341) icmp_push_reply (net/ipv4/icmp.c:370) __icmp_send (./include/net/route.h:252 net/ipv4/icmp.c:772) ip_fragment.constprop.0 (./include/linux/skbuff.h:1234 net/ipv4/ip_output.c:592 net/ipv4/ip_output.c:577) __ip_finish_output (net/ipv4/ip_output.c:311 net/ipv4/ip_output.c:295) ip_output (net/ipv4/ip_output.c:427) __ip_queue_xmit (net/ipv4/ip_output.c:535) __tcp_transmit_skb (net/ipv4/tcp_output.c:1462) __tcp_retransmit_skb (net/ipv4/tcp_output.c:3387) tcp_retransmit_skb (net/ipv4/tcp_output.c:3404) tcp_retransmit_timer (net/ipv4/tcp_timer.c:604) tcp_write_timer (./include/linux/spinlock.h:391 net/ipv4/tcp_timer.c:716) The panic issue was trigered by tcp simultaneous initiation. The initiation process is as follows: TCP A TCP B 1. CLOSED CLOSED 2. SYN-SENT --> <SEQ=100><CTL=SYN> ... 3. SYN-RECEIVED <-- <SEQ=300><CTL=SYN> <-- SYN-SENT 4. ... <SEQ=100><CTL=SYN> --> SYN-RECEIVED 5. SYN-RECEIVED --> <SEQ=100><ACK=301><CTL=SYN,ACK> ... // TCP B: not send challenge ack for ack limit or packet loss // TCP A: close tcp_close tcp_send_fin if (!tskb && tcp_under_memory_pressure(sk)) tskb = skb_rb_last(&sk->tcp_rtx_queue); //pick SYN_ACK packet TCP_SKB_CB(tskb)->tcp_flags |= TCPHDR_FIN; // set FIN flag 6. FIN_WAIT_1 --> <SEQ=100><ACK=301><END_SEQ=102><CTL=SYN,FIN,ACK> ... // TCP B: send challenge ack to SYN_FIN_ACK 7. ... <SEQ=301><ACK=101><CTL=ACK> <-- SYN-RECEIVED //challenge ack // TCP A: <SND.UNA=101> 8. FIN_WAIT_1 --> <SEQ=101><ACK=301><END_SEQ=102><CTL=SYN,FIN,ACK> ... // retransmit panic __tcp_retransmit_skb //skb->len=0 tcp_trim_head len = tp->snd_una - TCP_SKB_CB(skb)->seq // len=101-100 __pskb_trim_head skb->data_len -= len // skb->len=-1, wrap around ... ... ip_fragment icmp_glue_bits //BUG_ON If we use tcp_trim_head() to remove acked SYN from packet that contains data or other flags, skb->len will be incorrectly decremented. We can remove SYN flag that has been acked from rtx_queue earlier than tcp_trim_head(), which can fix the problem mentioned above. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Co-developed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Link: https://lore.kernel.org/r/20231210020200.1539875-1-dongchenchen2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-12qed: Fix a potential use-after-free in qed_cxt_tables_allocDinghao Liu
qed_ilt_shadow_alloc() will call qed_ilt_shadow_free() to free p_hwfn->p_cxt_mngr->ilt_shadow on error. However, qed_cxt_tables_alloc() accesses the freed pointer on failure of qed_ilt_shadow_alloc() through calling qed_cxt_mngr_free(), which may lead to use-after-free. Fix this issue by setting p_mngr->ilt_shadow to NULL in qed_ilt_shadow_free(). Fixes: fe56b9e6a8d9 ("qed: Add module with basic common support") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Link: https://lore.kernel.org/r/20231210045255.21383-1-dinghao.liu@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13net: 9p: avoid freeing uninit memory in p9pdu_vreadfFedor Pchelkin
If some of p9pdu_readf() calls inside case 'T' in p9pdu_vreadf() fails, the error path is not handled properly. *wnames or members of *wnames array may be left uninitialized and invalidly freed. Initialize *wnames to NULL in beginning of case 'T'. Initialize the first *wnames array element to NULL and nullify the failing *wnames element so that the error path freeing loop stops on the first NULL element and doesn't proceed further. Found by Linux Verification Center (linuxtesting.org). Fixes: ace51c4dd2f9 ("9p: add new protocol support code") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Message-ID: <20231206200913.16135-1-pchelkin@ispras.ru> Cc: stable@vger.kernel.org Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2023-12-12Merge tag 'ext4_for_linus-6.7-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Fix various bugs / regressions for ext4, including a soft lockup, a WARN_ON, and a BUG" * tag 'ext4_for_linus-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: jbd2: fix soft lockup in journal_finish_inode_data_buffers() ext4: fix warning in ext4_dio_write_end_io() jbd2: increase the journal IO's priority jbd2: correct the printing of write_flags in jbd2_write_superblock() ext4: prevent the normalized size from exceeding EXT_MAX_BLOCKS
2023-12-12iavf: Fix iavf_shutdown to call iavf_remove instead iavf_closeSlawomir Laba
Make the flow for pci shutdown be the same to the pci remove. iavf_shutdown was implementing an incomplete version of iavf_remove. It misses several calls to the kernel like iavf_free_misc_irq, iavf_reset_interrupt_capability, iounmap that might break the system on reboot or hibernation. Implement the call of iavf_remove directly in iavf_shutdown to close this gap. Fixes below error messages (dmesg) during shutdown stress tests - [685814.900917] ice 0000:88:00.0: MAC 02:d0:5f:82:43:5d does not exist for VF 0 [685814.900928] ice 0000:88:00.0: MAC 33:33:00:00:00:01 does not exist for VF 0 Reproduction: 1. Create one VF interface: echo 1 > /sys/class/net/<interface_name>/device/sriov_numvfs 2. Run live dmesg on the host: dmesg -wH 3. On SUT, script below steps into vf_namespace_assignment.sh <#!/bin/sh> // Remove <>. Git removes # line if=<VF name> (edit this per VF name) loop=0 while true; do echo test round $loop let loop++ ip netns add ns$loop ip link set dev $if up ip link set dev $if netns ns$loop ip netns exec ns$loop ip link set dev $if up ip netns exec ns$loop ip link set dev $if netns 1 ip netns delete ns$loop done 4. Run the script for at least 1000 iterations on SUT: ./vf_namespace_assignment.sh Expected result: No errors in dmesg. Fixes: 129cf89e5856 ("iavf: rename functions and structs to new name") Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Co-developed-by: Ranganatha Rao <ranganatha.rao@intel.com> Signed-off-by: Ranganatha Rao <ranganatha.rao@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-12iavf: Handle ntuple on/off based on new state machines for flow directorPiotr Gardocki
ntuple-filter feature on/off: Default is on. If turned off, the filters will be removed from both PF and iavf list. The removal is irrespective of current filter state. Steps to reproduce: ------------------- 1. Ensure ntuple is on. ethtool -K enp8s0 ntuple-filters on 2. Create a filter to receive the traffic into non-default rx-queue like 15 and ensure traffic is flowing into queue into 15. Now, turn off ntuple. Traffic should not flow to configured queue 15. It should flow to default RX queue. Fixes: 0dbfbabb840d ("iavf: Add framework to enable ethtool ntuple filters") Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Ranganatha Rao <ranganatha.rao@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-12iavf: Introduce new state machines for flow directorPiotr Gardocki
New states introduced: IAVF_FDIR_FLTR_DIS_REQUEST IAVF_FDIR_FLTR_DIS_PENDING IAVF_FDIR_FLTR_INACTIVE Current FDIR state machines (SM) are not adequate to handle a few scenarios in the link DOWN/UP event, reset event and ntuple-feature. For example, when VF link goes DOWN and comes back UP administratively, the expectation is that previously installed filters should also be restored. But with current SM, filters are not restored. So with new SM, during link DOWN filters are marked as INACTIVE in the iavf list but removed from PF. After link UP, SM will transition from INACTIVE to ADD_REQUEST to restore the filter. Similarly, with VF reset, filters will be removed from the PF, but marked as INACTIVE in the iavf list. Filters will be restored after reset completion. Steps to reproduce: ------------------- 1. Create a VF. Here VF is enp8s0. 2. Assign IP addresses to VF and link partner and ping continuously from remote. Here remote IP is 1.1.1.1. 3. Check default RX Queue of traffic. ethtool -S enp8s0 | grep -E "rx-[[:digit:]]+\.packets" 4. Add filter - change default RX Queue (to 15 here) ethtool -U ens8s0 flow-type ip4 src-ip 1.1.1.1 action 15 loc 5 5. Ensure filter gets added and traffic is received on RX queue 15 now. Link event testing: ------------------- 6. Bring VF link down and up. If traffic flows to configured queue 15, test is success, otherwise it is a failure. Reset event testing: -------------------- 7. Reset the VF. If traffic flows to configured queue 15, test is success, otherwise it is a failure. Fixes: 0dbfbabb840d ("iavf: Add framework to enable ethtool ntuple filters") Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Ranganatha Rao <ranganatha.rao@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-12Merge tag 'fuse-fixes-6.7-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: - Fix a couple of potential crashes, one introduced in 6.6 and one in 5.10 - Fix misbehavior of virtiofs submounts on memory pressure - Clarify naming in the uAPI for a recent feature * tag 'fuse-fixes-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: disable FOPEN_PARALLEL_DIRECT_WRITES with FUSE_DIRECT_IO_ALLOW_MMAP fuse: dax: set fc->dax to NULL in fuse_dax_conn_free() fuse: share lookup state between submount and its parent docs/fuse-io: Document the usage of DIRECT_IO_ALLOW_MMAP fuse: Rename DIRECT_IO_RELAX to DIRECT_IO_ALLOW_MMAP
2023-12-12Merge tag '6.7-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fixes from Steve French: - Memory leak fix (in lock error path) - Two fixes for create with allocation size - FIx for potential UAF in lease break error path - Five directory lease (caching) fixes found during additional recent testing * tag '6.7-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: fix wrong name of SMB2_CREATE_ALLOCATION_SIZE ksmbd: fix wrong allocation size update in smb2_open() ksmbd: avoid duplicate opinfo_put() call on error of smb21_lease_break_ack() ksmbd: lazy v2 lease break on smb2_write() ksmbd: send v2 lease break notification for directory ksmbd: downgrade RWH lease caching state to RH for directory ksmbd: set v2 lease capability ksmbd: set epoch in create context v2 lease ksmbd: fix memory leak in smb2_lock()
2023-12-12drm/amd/display: Disable PSR-SU on Parade 0803 TCON againMario Limonciello
When screen brightness is rapidly changed and PSR-SU is enabled the display hangs on panels with this TCON even on the latest DCN 3.1.4 microcode (0x8002a81 at this time). This was disabled previously as commit 072030b17830 ("drm/amd: Disable PSR-SU on Parade 0803 TCON") but reverted as commit 1e66a17ce546 ("Revert "drm/amd: Disable PSR-SU on Parade 0803 TCON"") in favor of testing for a new enough microcode (commit cd2e31a9ab93 ("drm/amd/display: Set minimum requirement for using PSR-SU on Phoenix")). As hangs are still happening specifically with this TCON, disable PSR-SU again for it until it can be root caused. Cc: stable@vger.kernel.org Cc: aaron.ma@canonical.com Cc: binli@gnome.org Cc: Marc Rossi <Marc.Rossi@amd.com> Cc: Hamza Mahfooz <Hamza.Mahfooz@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2046131 Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-12drm/amd/display: Populate dtbclk from bounding boxFangzhi Zuo
dtbclk is unavaliable from pmfw. Try to grab the value from bounding box Reviewed-by: Charlene Liu <charlene.liu@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-12drm/amd/display: Revert "Fix conversions between bytes and KB"Taimur Hassan
[Why & How] HostVMMinPageSize is expected to be in KB according to spec, the checks later down the line reflect this as well. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Taimur Hassan <syed.hassan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-12drm/amdgpu/jpeg: configure doorbell for each playbackSaleemkhan Jamadar
Doorbell is configured during start of each playback. v1 - add comment for the doorbell programming change Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-12KVM: selftests: Ensure sysreg-defs.h is generated at the expected pathOliver Upton
Building the KVM selftests from the main selftests Makefile (as opposed to the kvm subdirectory) doesn't work as OUTPUT is set, forcing the generated header to spill into the selftests directory. Additionally, relative paths do not work when building outside of the srctree, as the canonical selftests path is replaced with 'kselftest' in the output. Work around both of these issues by explicitly overriding OUTPUT on the submake cmdline. Move the whole fragment below the point lib.mk gets included such that $(abs_objdir) is available. Reviewed-by: Cornelia Huck <cohuck@redhat.com> Tested-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20231212070431.145544-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-12-12arm64: mm: Always make sw-dirty PTEs hw-dirty in pte_modifyJames Houghton
It is currently possible for a userspace application to enter an infinite page fault loop when using HugeTLB pages implemented with contiguous PTEs when HAFDBS is not available. This happens because: 1. The kernel may sometimes write PTEs that are sw-dirty but hw-clean (PTE_DIRTY | PTE_RDONLY | PTE_WRITE). 2. If, during a write, the CPU uses a sw-dirty, hw-clean PTE in handling the memory access on a system without HAFDBS, we will get a page fault. 3. HugeTLB will check if it needs to update the dirty bits on the PTE. For contiguous PTEs, it will check to see if the pgprot bits need updating. In this case, HugeTLB wants to write a sequence of sw-dirty, hw-dirty PTEs, but it finds that all the PTEs it is about to overwrite are all pte_dirty() (pte_sw_dirty() => pte_dirty()), so it thinks no update is necessary. We can get the kernel to write a sw-dirty, hw-clean PTE with the following steps (showing the relevant VMA flags and pgprot bits): i. Create a valid, writable contiguous PTE. VMA vmflags: VM_SHARED | VM_READ | VM_WRITE VMA pgprot bits: PTE_RDONLY | PTE_WRITE PTE pgprot bits: PTE_DIRTY | PTE_WRITE ii. mprotect the VMA to PROT_NONE. VMA vmflags: VM_SHARED VMA pgprot bits: PTE_RDONLY PTE pgprot bits: PTE_DIRTY | PTE_RDONLY iii. mprotect the VMA back to PROT_READ | PROT_WRITE. VMA vmflags: VM_SHARED | VM_READ | VM_WRITE VMA pgprot bits: PTE_RDONLY | PTE_WRITE PTE pgprot bits: PTE_DIRTY | PTE_WRITE | PTE_RDONLY Make it impossible to create a writeable sw-dirty, hw-clean PTE with pte_modify(). Such a PTE should be impossible to create, and there may be places that assume that pte_dirty() implies pte_hw_dirty(). Signed-off-by: James Houghton <jthoughton@google.com> Fixes: 031e6e6b4e12 ("arm64: hugetlb: Avoid unnecessary clearing in huge_ptep_set_access_flags") Cc: <stable@vger.kernel.org> Acked-by: Will Deacon <will@kernel.org> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/r/20231204172646.2541916-3-jthoughton@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-12-12jbd2: fix soft lockup in journal_finish_inode_data_buffers()Ye Bin
There's issue when do io test: WARN: soft lockup - CPU#45 stuck for 11s! [jbd2/dm-2-8:4170] CPU: 45 PID: 4170 Comm: jbd2/dm-2-8 Kdump: loaded Tainted: G OE Call trace: dump_backtrace+0x0/0x1a0 show_stack+0x24/0x30 dump_stack+0xb0/0x100 watchdog_timer_fn+0x254/0x3f8 __hrtimer_run_queues+0x11c/0x380 hrtimer_interrupt+0xfc/0x2f8 arch_timer_handler_phys+0x38/0x58 handle_percpu_devid_irq+0x90/0x248 generic_handle_irq+0x3c/0x58 __handle_domain_irq+0x68/0xc0 gic_handle_irq+0x90/0x320 el1_irq+0xcc/0x180 queued_spin_lock_slowpath+0x1d8/0x320 jbd2_journal_commit_transaction+0x10f4/0x1c78 [jbd2] kjournald2+0xec/0x2f0 [jbd2] kthread+0x134/0x138 ret_from_fork+0x10/0x18 Analyzed informations from vmcore as follows: (1) There are about 5k+ jbd2_inode in 'commit_transaction->t_inode_list'; (2) Now is processing the 855th jbd2_inode; (3) JBD2 task has TIF_NEED_RESCHED flag; (4) There's no pags in address_space around the 855th jbd2_inode; (5) There are some process is doing drop caches; (6) Mounted with 'nodioread_nolock' option; (7) 128 CPUs; According to informations from vmcore we know 'journal->j_list_lock' spin lock competition is fierce. So journal_finish_inode_data_buffers() maybe process slowly. Theoretically, there is scheduling point in the filemap_fdatawait_range_keep_errors(). However, if inode's address_space has no pages which taged with PAGECACHE_TAG_WRITEBACK, will not call cond_resched(). So may lead to soft lockup. journal_finish_inode_data_buffers filemap_fdatawait_range_keep_errors __filemap_fdatawait_range while (index <= end) nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end, PAGECACHE_TAG_WRITEBACK); if (!nr_pages) break; --> If 'nr_pages' is equal zero will break, then will not call cond_resched() for (i = 0; i < nr_pages; i++) wait_on_page_writeback(page); cond_resched(); To solve above issue, add scheduling point in the journal_finish_inode_data_buffers(); Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231211112544.3879780-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-12-12wifi: mt76: fix crash with WED rx support enabledFelix Fietkau
If WED rx is enabled, rx buffers are added to a buffer pool that can be filled from multiple page pools. Because buffers freed from rx poll are not guaranteed to belong to the processed queue's page pool, lockless caching must not be used in this case. Cc: stable@vger.kernel.org Fixes: 2f5c3c77fc9b ("wifi: mt76: switch to page_pool allocator") Signed-off-by: Felix Fietkau <nbd@nbd.name> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20231208075004.69843-1-nbd@nbd.name
2023-12-12HID: apple: Add "hfd.cn" and "WKB603" to the list of non-apple keyboardsYan Jun
JingZao(京造) WKB603 keyboard is a rebranded product of Jamesdonkey RS2 keyboard, identified as "hfd.cn WKB603" in wired mode, "WKB603" in bluetooth mode. Adding them to the list of non-apple keyboards fixes function key. Signed-off-by: Yan Jun <jerrysteve1101@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.com>
2023-12-12HID: lenovo: Restrict detection of patched firmware only to USB cptkbdMikhail Khvainitski
Commit 46a0a2c96f0f ("HID: lenovo: Detect quirk-free fw on cptkbd and stop applying workaround") introduced a regression for ThinkPad TrackPoint Keyboard II which has similar quirks to cptkbd (so it uses the same workarounds) but slightly different so that there are false-positives during detecting well-behaving firmware. This commit restricts detecting well-behaving firmware to the only model which known to have one and have stable enough quirks to not cause false-positives. Fixes: 46a0a2c96f0f ("HID: lenovo: Detect quirk-free fw on cptkbd and stop applying workaround") Link: https://lore.kernel.org/linux-input/ZXRiiPsBKNasioqH@jekhomev/ Link: https://bbs.archlinux.org/viewtopic.php?pid=2135468#p2135468 Signed-off-by: Mikhail Khvainitski <me@khvoinitsky.org> Tested-by: Yauhen Kharuzhy <jekhor@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.com>
2023-12-12net/rose: Fix Use-After-Free in rose_ioctlHyunwoo Kim
Because rose_ioctl() accesses sk->sk_receive_queue without holding a sk->sk_receive_queue.lock, it can cause a race with rose_accept(). A use-after-free for skb occurs with the following flow. ``` rose_ioctl() -> skb_peek() rose_accept() -> skb_dequeue() -> kfree_skb() ``` Add sk->sk_receive_queue.lock to rose_ioctl() to fix this issue. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Hyunwoo Kim <v4bel@theori.io> Link: https://lore.kernel.org/r/20231209100538.GA407321@v4bel-B760M-AORUS-ELITE-AX Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-12atm: Fix Use-After-Free in do_vcc_ioctlHyunwoo Kim
Because do_vcc_ioctl() accesses sk->sk_receive_queue without holding a sk->sk_receive_queue.lock, it can cause a race with vcc_recvmsg(). A use-after-free for skb occurs with the following flow. ``` do_vcc_ioctl() -> skb_peek() vcc_recvmsg() -> skb_recv_datagram() -> skb_free_datagram() ``` Add sk->sk_receive_queue.lock to do_vcc_ioctl() to fix this issue. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Hyunwoo Kim <v4bel@theori.io> Link: https://lore.kernel.org/r/20231209094210.GA403126@v4bel-B760M-AORUS-ELITE-AX Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-12perf/arm-cmn: Fail DTC counter allocation correctlyRobin Murphy
Calling arm_cmn_event_clear() before all DTC indices are allocated is wrong, and can lead to arm_cmn_event_add() erroneously clearing live counters from full DTCs where allocation fails. Since the DTC counters are only updated by arm_cmn_init_counter() after all DTC and DTM allocations succeed, nothing actually needs cleaning up in this case anyway, and it should just return directly as it did before. Fixes: 7633ec2c262f ("perf/arm-cmn: Rework DTC counters (again)") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/ed589c0d8e4130dc68b8ad1625226d28bdc185d4.1702322847.git.robin.murphy@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-12-12wifi: iwlwifi: pcie: avoid a NULL pointer dereferenceAvraham Stern
It possible that while the rx rb is being handled, the transport has been stopped and re-started. In this case the tx queue pointer is not yet initialized, which will lead to a NULL pointer dereference. Fix it. Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20231207044813.cd0898cafd89.I0b84daae753ba9612092bf383f5c6f761446e964@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-12-12wifi: mac80211: mesh_plink: fix matches_local logicJohannes Berg
During refactoring the "else" here got lost, add it back. Fixes: c99a89edb106 ("mac80211: factor out plink event gathering") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20231211085121.795480fa0e0b.I017d501196a5bbdcd9afd33338d342d6fe1edd79@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-12-12wifi: mac80211: mesh: check element parsing succeededJohannes Berg
ieee802_11_parse_elems() can return NULL, so we must check for the return value. Fixes: 5d24828d05f3 ("mac80211: always allocate struct ieee802_11_elems") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20231211085121.93dea364f3d3.Ie87781c6c48979fb25a744b90af4a33dc2d83a28@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-12-12wifi: mac80211: check defragmentation succeededJohannes Berg
We need to check that cfg80211_defragment_element() didn't return an error, since it can fail due to bad input, and we didn't catch that before. Fixes: 8eb8dd2ffbbb ("wifi: mac80211: Support link removal using Reconfiguration ML element") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20231211085121.8595a6b67fc0.I1225edd8f98355e007f96502e358e476c7971d8c@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-12-12wifi: mac80211: don't re-add debugfs during reconfigJohannes Berg
If we're doing reconfig, then we cannot add the debugfs files that are already there from before the reconfig. Skip that in drv_change_sta_links() during reconfig. Fixes: d2caad527c19 ("wifi: mac80211: add API to show the link STAs in debugfs") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Gregory Greenman <gregory.greenman@intel.com> Reviewed-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20231211085121.88a950f43e16.Id71181780994649219685887c0fcad33d387cc78@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-12-12net: rfkill: gpio: set GPIO directionRouven Czerwinski
Fix the undefined usage of the GPIO consumer API after retrieving the GPIO description with GPIO_ASIS. The API documentation mentions that GPIO_ASIS won't set a GPIO direction and requires the user to set a direction before using the GPIO. This can be confirmed on i.MX6 hardware, where rfkill-gpio is no longer able to enabled/disable a device, presumably because the GPIO controller was never configured for the output direction. Fixes: b2f750c3a80b ("net: rfkill: gpio: prevent value glitch during probe") Cc: stable@vger.kernel.org Signed-off-by: Rouven Czerwinski <r.czerwinski@pengutronix.de> Link: https://msgid.link/20231207075835.3091694-1-r.czerwinski@pengutronix.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-12-12wifi: mac80211: check if the existing link config remains unchangedEdward Adam Davis
[Syz report] WARNING: CPU: 1 PID: 5067 at net/mac80211/rate.c:48 rate_control_rate_init+0x540/0x690 net/mac80211/rate.c:48 Modules linked in: CPU: 1 PID: 5067 Comm: syz-executor413 Not tainted 6.7.0-rc3-syzkaller-00014-gdf60cee26a2e #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023 RIP: 0010:rate_control_rate_init+0x540/0x690 net/mac80211/rate.c:48 Code: 48 c7 c2 00 46 0c 8c be 08 03 00 00 48 c7 c7 c0 45 0c 8c c6 05 70 79 0b 05 01 e8 1b a0 6f f7 e9 e0 fd ff ff e8 61 b3 8f f7 90 <0f> 0b 90 e9 36 ff ff ff e8 53 b3 8f f7 e8 5e 0b 78 f7 31 ff 89 c3 RSP: 0018:ffffc90003c57248 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff888016bc4000 RCX: ffffffff89f7d519 RDX: ffff888076d43b80 RSI: ffffffff89f7d6df RDI: 0000000000000005 RBP: ffff88801daaae20 R08: 0000000000000005 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000002 R12: 0000000000000001 R13: 0000000000000000 R14: ffff888020030e20 R15: ffff888078f08000 FS: 0000555556b94380(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000005fdeb8 CR3: 0000000076d22000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> sta_apply_auth_flags.constprop.0+0x4b7/0x510 net/mac80211/cfg.c:1674 sta_apply_parameters+0xaf1/0x16c0 net/mac80211/cfg.c:2002 ieee80211_add_station+0x3fa/0x6c0 net/mac80211/cfg.c:2068 rdev_add_station net/wireless/rdev-ops.h:201 [inline] nl80211_new_station+0x13ba/0x1a70 net/wireless/nl80211.c:7603 genl_family_rcv_msg_doit+0x1fc/0x2e0 net/netlink/genetlink.c:972 genl_family_rcv_msg net/netlink/genetlink.c:1052 [inline] genl_rcv_msg+0x561/0x800 net/netlink/genetlink.c:1067 netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2545 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076 netlink_unicast_kernel net/netlink/af_netlink.c:1342 [inline] netlink_unicast+0x53b/0x810 net/netlink/af_netlink.c:1368 netlink_sendmsg+0x93c/0xe40 net/netlink/af_netlink.c:1910 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0xd5/0x180 net/socket.c:745 ____sys_sendmsg+0x6ac/0x940 net/socket.c:2584 ___sys_sendmsg+0x135/0x1d0 net/socket.c:2638 __sys_sendmsg+0x117/0x1e0 net/socket.c:2667 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x63/0x6b [Analysis] It is inappropriate to make a link configuration change judgment on an non-existent and non new link. [Fix] Quickly exit when there is a existent link and the link configuration has not changed. Fixes: b303835dabe0 ("wifi: mac80211: accept STA changes without link changes") Reported-and-tested-by: syzbot+62d7eef57b09bfebcd84@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Link: https://msgid.link/tencent_DE67FF86DB92ED465489A36ECD2EDDCC8C06@qq.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-12-12wifi: cfg80211: Add my certificateChen-Yu Tsai
As announced [1][2], I have taken over maintainership of the wireless-regdb project. Add my certificate so that newer releases are valid to the kernel. Seth's certificate should be kept around for awhile, at least until a few new releases by me happen. This should also be applied to stable trees so that stable kernels can utilize newly released database binaries. [1] https://lore.kernel.org/linux-wireless/CAGb2v657baNMPKU3QADijx7hZa=GUcSv2LEDdn6N=QQaFX8r-g@mail.gmail.com/ [2] https://lore.kernel.org/linux-wireless/ZWmRR5ul7EDfxCan@wens.tw/ Cc: stable@vger.kernel.org Signed-off-by: Chen-Yu Tsai <wens@kernel.org> Acked-by: Seth Forshee <sforshee@kernel.org> Link: https://msgid.link/ZXHGsqs34qZyzZng@wens.tw Signed-off-by: Johannes Berg <johannes.berg@intel.com>