summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-03-04mm, mmap: fix vma_merge() case 7 with vma_ops->closeVlastimil Babka
When debugging issues with a workload using SysV shmem, Michal Hocko has come up with a reproducer that shows how a series of mprotect() operations can result in an elevated shm_nattch and thus leak of the resource. The problem is caused by wrong assumptions in vma_merge() commit 714965ca8252 ("mm/mmap: start distinguishing if vma can be removed in mergeability test"). The shmem vmas have a vma_ops->close callback that decrements shm_nattch, and we remove the vma without calling it. vma_merge() has thus historically avoided merging vma's with vma_ops->close and commit 714965ca8252 was supposed to keep it that way. It relaxed the checks for vma_ops->close in can_vma_merge_after() assuming that it is never called on a vma that would be a candidate for removal. However, the vma_merge() code does also use the result of this check in the decision to remove a different vma in the merge case 7. A robust solution would be to refactor vma_merge() code in a way that the vma_ops->close check is only done for vma's that are actually going to be removed, and not as part of the preliminary checks. That would both solve the existing bug, and also allow additional merges that the checks currently prevent unnecessarily in some cases. However to fix the existing bug first with a minimized risk, and for easier stable backports, this patch only adds a vma_ops->close check to the buggy case 7 specifically. All other cases of vma removal are covered by the can_vma_merge_before() check that includes the test for vma_ops->close. The reproducer code, adapted from Michal Hocko's code: int main(int argc, char *argv[]) { int segment_id; size_t segment_size = 20 * PAGE_SIZE; char * sh_mem; struct shmid_ds shmid_ds; key_t key = 0x1234; segment_id = shmget(key, segment_size, IPC_CREAT | IPC_EXCL | S_IRUSR | S_IWUSR); sh_mem = (char *)shmat(segment_id, NULL, 0); mprotect(sh_mem + 2*PAGE_SIZE, PAGE_SIZE, PROT_NONE); mprotect(sh_mem + PAGE_SIZE, PAGE_SIZE, PROT_WRITE); mprotect(sh_mem + 2*PAGE_SIZE, PAGE_SIZE, PROT_WRITE); shmdt(sh_mem); shmctl(segment_id, IPC_STAT, &shmid_ds); printf("nattch after shmdt(): %lu (expected: 0)\n", shmid_ds.shm_nattch); if (shmctl(segment_id, IPC_RMID, 0)) printf("IPCRM failed %d\n", errno); return (shmid_ds.shm_nattch) ? 1 : 0; } Link: https://lkml.kernel.org/r/20240222215930.14637-2-vbabka@suse.cz Fixes: 714965ca8252 ("mm/mmap: start distinguishing if vma can be removed in mergeability test") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-04mm: userfaultfd: fix unexpected change to src_folio when UFFDIO_MOVE failsQi Zheng
After ptep_clear_flush(), if we find that src_folio is pinned we will fail UFFDIO_MOVE and put src_folio back to src_pte entry, but the change to src_folio->{mapping,index} is not restored in this process. This is not what we expected, so fix it. This can cause the rmap for that page to be invalid, possibly resulting in memory corruption. At least swapout+migration would no longer work, because we might fail to locate the mappings of that folio. Link: https://lkml.kernel.org/r/20240222080815.46291-1-zhengqi.arch@bytedance.com Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI") Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-04mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL ↵Vlastimil Babka
allocations Sven reports an infinite loop in __alloc_pages_slowpath() for costly order __GFP_RETRY_MAYFAIL allocations that are also GFP_NOIO. Such combination can happen in a suspend/resume context where a GFP_KERNEL allocation can have __GFP_IO masked out via gfp_allowed_mask. Quoting Sven: 1. try to do a "costly" allocation (order > PAGE_ALLOC_COSTLY_ORDER) with __GFP_RETRY_MAYFAIL set. 2. page alloc's __alloc_pages_slowpath tries to get a page from the freelist. This fails because there is nothing free of that costly order. 3. page alloc tries to reclaim by calling __alloc_pages_direct_reclaim, which bails out because a zone is ready to be compacted; it pretends to have made a single page of progress. 4. page alloc tries to compact, but this always bails out early because __GFP_IO is not set (it's not passed by the snd allocator, and even if it were, we are suspending so the __GFP_IO flag would be cleared anyway). 5. page alloc believes reclaim progress was made (because of the pretense in item 3) and so it checks whether it should retry compaction. The compaction retry logic thinks it should try again, because: a) reclaim is needed because of the early bail-out in item 4 b) a zonelist is suitable for compaction 6. goto 2. indefinite stall. (end quote) The immediate root cause is confusing the COMPACT_SKIPPED returned from __alloc_pages_direct_compact() (step 4) due to lack of __GFP_IO to be indicating a lack of order-0 pages, and in step 5 evaluating that in should_compact_retry() as a reason to retry, before incrementing and limiting the number of retries. There are however other places that wrongly assume that compaction can happen while we lack __GFP_IO. To fix this, introduce gfp_compaction_allowed() to abstract the __GFP_IO evaluation and switch the open-coded test in try_to_compact_pages() to use it. Also use the new helper in: - compaction_ready(), which will make reclaim not bail out in step 3, so there's at least one attempt to actually reclaim, even if chances are small for a costly order - in_reclaim_compaction() which will make should_continue_reclaim() return false and we don't over-reclaim unnecessarily - in __alloc_pages_slowpath() to set a local variable can_compact, which is then used to avoid retrying reclaim/compaction for costly allocations (step 5) if we can't compact and also to skip the early compaction attempt that we do in some cases Link: https://lkml.kernel.org/r/20240221114357.13655-2-vbabka@suse.cz Fixes: 3250845d0526 ("Revert "mm, oom: prevent premature OOM killer invocation for high order request"") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Sven van Ashbrook <svenva@chromium.org> Closes: https://lore.kernel.org/all/CAG-rBihs_xMKb3wrMO1%2B-%2Bp4fowP9oy1pa_OTkfxBzPUVOZF%2Bg@mail.gmail.com/ Tested-by: Karthikeyan Ramasubramanian <kramasub@chromium.org> Cc: Brian Geffon <bgeffon@google.com> Cc: Curtis Malainey <cujomalainey@chromium.org> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Takashi Iwai <tiwai@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-04io_uring/net: fix overflow check in io_recvmsg_mshot_prep()Dan Carpenter
The "controllen" variable is type size_t (unsigned long). Casting it to int could lead to an integer underflow. The check_add_overflow() function considers the type of the destination which is type int. If we add two positive values and the result cannot fit in an integer then that's counted as an overflow. However, if we cast "controllen" to an int and it turns negative, then negative values *can* fit into an int type so there is no overflow. Good: 100 + (unsigned long)-4 = 96 <-- overflow Bad: 100 + (int)-4 = 96 <-- no overflow I deleted the cast of the sizeof() as well. That's not a bug but the cast is unnecessary. Fixes: 9b0fc3c054ff ("io_uring: fix types in io_recvmsg_multishot_overflow") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/138bd2e2-ede8-4bcc-aa7b-f3d9de167a37@moroto.mountain Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-04io_uring/net: correct the type of variableMuhammad Usama Anjum
The namelen is of type int. It shouldn't be made size_t which is unsigned. The signed number is needed for error checking before use. Fixes: c55978024d12 ("io_uring/net: move receive multishot out of the generic msghdr path") Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Link: https://lore.kernel.org/r/20240301144349.2807544-1-usama.anjum@collabora.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-04arm64: defconfig: Enable support for cbmem entries in the coreboot tableNícolas F. R. A. Prado
Enable the cbmem driver and dependencies in order to support reading cbmem entries from the coreboot table, which are used to store logs from coreboot on arm64 Chromebooks, and provide useful information for debugging the boot process on those devices. Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Link: https://lore.kernel.org/r/20240304-coreboot-defconfig-v1-1-02dc1940408f@collabora.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-03-04kselftest: Add basic test for probing the rust sample modulesLaura Nao
Add new basic kselftest that checks if the available rust sample modules can be added and removed correctly. Signed-off-by: Laura Nao <laura.nao@collabora.com> Reviewed-by: Sergio Gonzalez Collado <sergio.collado@gmail.com> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-03-04firmware: microchip: Fix over-requested allocation sizeDawei Li
cocci warnings: (new ones prefixed by >>) >> drivers/firmware/microchip/mpfs-auto-update.c:387:72-78: ERROR: application of sizeof to pointer drivers/firmware/microchip/mpfs-auto-update.c:170:72-78: ERROR: application of sizeof to pointer response_msg is a pointer to u32, so the size of element it points to is supposed to be a multiple of sizeof(u32), rather than sizeof(u32 *). Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202403040516.CYxoWTXw-lkp@intel.com/ Signed-off-by: Dawei Li <dawei.li@shingroup.cn> Fixes: ec5b0f1193ad ("firmware: microchip: add PolarFire SoC Auto Update support") Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2024-03-04Merge tag 'vfs-6.9.rw_hint' of ↵Christian Brauner
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull write hint fix from Christian Brauner: UFS devices are widely used in mobile applications, e.g. in smartphones. UFS vendors need data lifetime information to achieve good performance. Providing data lifetime information to UFS devices can result in up to 40% lower write amplification. Hence this patch series that restores the bi_write_hint member in struct bio. After this patch series has been merged, patches that implement data lifetime support in the SCSI disk (sd) driver will be sent to the Linux kernel SCSI maintainer. The following changes are included in this patch series: - Improvements for the F_GET_RW_HINT and F_SET_RW_HINT fcntls. - Move enum rw_hint into a new header file. - Support F_SET_RW_HINT for block devices to make it easy to test data lifetime support. - Restore the bio.bi_write_hint member and restore support in the VFS layer and also in the block layer for data lifetime information. The shell script that has been used to test the patch series combined with the SCSI patches is available at the end of this cover letter. * tag 'vfs-6.9.rw_hint' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: block, fs: Restore the per-bio/request data lifetime fields fs: Propagate write hints to the struct block_device inode fs: Move enum rw_hint into a new header file fs: Split fcntl_rw_hint() fs: Verify write lifetime constants at compile time fs: Fix rw_hint validation Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-03-04ASoC: amd: yc: Add HP Pavilion Aero Laptop 13-be2xxx(8BD6) into DMI quirk tableAl Raj Hassain
The HP Pavilion Aero Laptop 13-be2xxx(8BD6) requires a quirk entry for its internal microphone to function. Signed-off-by: Al Raj Hassain <alrajhassain@gmail.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://msgid.link/r/20240304103924.13673-1-alrajhassain@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-03-04ASoC: rcar: adg: correct TIMSEL setting for SSI9Andreas Pape
Timing select registers for SRC and CMD are by default referring to the corresponding SSI word select. The calculation rule from HW spec skips SSI8, which has no clock connection. >From section 43.2.18 CMD Output Timing Select Register (CMDOUT_TIMSEL), of R-Car Series, 3rd Generation Hardware User’s Manual Rev.2.20: CMD0_OUT_DIVCLK_ Output Timing SEL [4:0] Signal Select B'0 0110: ssi_ws0 B'0 0111: ssi_ws1 B'0 1000: ssi_ws2 B'0 1001: ssi_ws3 B'0 1010: ssi_ws4 B'0 1011: ssi_ws5 B'0 1100: ssi_ws6 B'0 1101: ssi_ws7 <GAP> B'0 1110: ssi_ws9 B'0 1111: Setting prohibited Fix the erroneous prohibited setting of timsel value 1111 (0xf) for SSI9 by using timsel value 1110 (0xe) instead. This is possible because SSI8 is not connected as shown by <GAP> in the table above. [21.695055] rcar_sound ec500000.sound: b adg[0]-CMDOUT_TIMSEL (32):00000f00/00000f1f Correct the timsel assignment. Fixes: 629509c5bc478c ("ASoC: rsnd: add Gen2 SRC and DMAEngine support") Suggested-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: Andreas Pape <Andreas.Pape4@bosch.com> Signed-off-by: Yeswanth Rayapati <yeswanth.rayapati@in.bosch.com> Tested-by: Yeswanth Rayapati <yeswanth.rayapati@in.bosch.com> [erosca: massage commit description] Signed-off-by: Eugeniu Rosca <eugeniu.rosca@bosch.com> Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Link: https://msgid.link/r/20240301085003.3057-1-erosca@de.adit-jv.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-03-04x86/sev: Move early startup code into .head.text sectionArd Biesheuvel
In preparation for implementing rigorous build time checks to enforce that only code that can support it will be called from the early 1:1 mapping of memory, move SEV init code that is called in this manner to the .head.text section. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20240227151907.387873-19-ardb+git@google.com
2024-03-04x86/sme: Move early SME kernel encryption handling into .head.textArd Biesheuvel
The .head.text section is the initial primary entrypoint of the core kernel, and is entered with the CPU executing from a 1:1 mapping of memory. Such code must never access global variables using absolute references, as these are based on the kernel virtual mapping which is not active yet at this point. Given that the SME startup code is also called from this early execution context, move it into .head.text as well. This will allow more thorough build time checks in the future to ensure that early startup code only uses RIP-relative references to global variables. Also replace some occurrences of __pa_symbol() [which relies on the compiler generating an absolute reference, which is not guaranteed] and an open coded RIP-relative access with RIP_REL_REF(). Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20240227151907.387873-18-ardb+git@google.com
2024-03-04x86/boot: Move mem_encrypt= parsing to the decompressorArd Biesheuvel
The early SME/SEV code parses the command line very early, in order to decide whether or not memory encryption should be enabled, which needs to occur even before the initial page tables are created. This is problematic for a number of reasons: - this early code runs from the 1:1 mapping provided by the decompressor or firmware, which uses a different translation than the one assumed by the linker, and so the code needs to be built in a special way; - parsing external input while the entire kernel image is still mapped writable is a bad idea in general, and really does not belong in security minded code; - the current code ignores the built-in command line entirely (although this appears to be the case for the entire decompressor) Given that the decompressor/EFI stub is an intrinsic part of the x86 bootable kernel image, move the command line parsing there and out of the core kernel. This removes the need to build lib/cmdline.o in a special way, or to use RIP-relative LEA instructions in inline asm blocks. This involves a new xloadflag in the setup header to indicate that mem_encrypt=on appeared on the kernel command line. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20240227151907.387873-17-ardb+git@google.com
2024-03-04efi/libstub: Add generic support for parsing mem_encrypt=Ard Biesheuvel
Parse the mem_encrypt= command line parameter from the EFI stub if CONFIG_ARCH_HAS_MEM_ENCRYPT=y, so that it can be passed to the early boot code by the arch code in the stub. This avoids the need for the core kernel to do any string parsing very early in the boot. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20240227151907.387873-16-ardb+git@google.com
2024-03-04x86/startup_64: Simplify virtual switch on primary bootArd Biesheuvel
The secondary startup code is used on the primary boot path as well, but in this case, the initial part runs from a 1:1 mapping, until an explicit cross-jump is made to the kernel virtual mapping of the same code. On the secondary boot path, this jump is pointless as the code already executes from the mapping targeted by the jump. So combine this cross-jump with the jump from startup_64() into the common boot path. This simplifies the execution flow, and clearly separates code that runs from a 1:1 mapping from code that runs from the kernel virtual mapping. Note that this requires a page table switch, so hoist the CR3 assignment into startup_64() as well. And since absolute symbol references will no longer be permitted in .head.text once we enable the associated build time checks, a RIP-relative memory operand is used in the JMP instruction, referring to an absolute constant in the .init.rodata section. Given that the secondary startup code does not require a special placement inside the executable, move it to the .text section. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20240227151907.387873-15-ardb+git@google.com
2024-03-04x86/startup_64: Simplify calculation of initial page table addressArd Biesheuvel
Determining the address of the initial page table to program into CR3 involves: - taking the physical address - adding the SME encryption mask On the primary entry path, the code is mapped using a 1:1 virtual to physical translation, so the physical address can be taken directly using a RIP-relative LEA instruction. On the secondary entry path, the address can be obtained by taking the offset from the virtual kernel base (__START_kernel_map) and adding the physical kernel base. This is implemented in a slightly confusing way, so clean this up. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20240227151907.387873-14-ardb+git@google.com
2024-03-04x86/startup_64: Defer assignment of 5-level paging global variablesArd Biesheuvel
Assigning the 5-level paging related global variables from the earliest C code using explicit references that use the 1:1 translation of memory is unnecessary, as the startup code itself does not rely on them to create the initial page tables, and this is all it should be doing. So defer these assignments to the primary C entry code that executes via the ordinary kernel virtual mapping. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20240227151907.387873-13-ardb+git@google.com
2024-03-04x86/startup_64: Simplify CR4 handling in startup codeArd Biesheuvel
When paging is enabled, the CR4.PAE and CR4.LA57 control bits cannot be changed, and so they can simply be preserved rather than reason about whether or not they need to be set. CR4.MCE should be preserved unless the kernel was built without CONFIG_X86_MCE, in which case it must be cleared. CR4.PSE should be set explicitly, regardless of whether or not it was set before. CR4.PGE is set explicitly, and then cleared and set again after programming CR3 in order to flush TLB entries based on global translations. This makes the first assignment redundant, and can therefore be omitted. So clear PGE by omitting it from the preserve mask, and set it again explicitly after switching to the new page tables. [ bp: Document the exact operation of CR4.PGE ] Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20240227151907.387873-12-ardb+git@google.com
2024-03-04drm/panel: boe-tv101wum-nl6: Fine tune Himax83102-j02 panel HFP and HBP (again)Cong Yang
The current measured frame rate is 59.95Hz, which does not meet the requirements of touch-stylus and stylus cannot work normally. After adjustment, the actual measurement is 60.001Hz. Now this panel looks like it's only used by me on the MTK platform, so let's change this set of parameters. [ dianders: Added "(again") to subject and fixed the "Fixes" line ] Fixes: cea7008190ad ("drm/panel: boe-tv101wum-nl6: Fine tune Himax83102-j02 panel HFP and HBP") Signed-off-by: Cong Yang <yangcong5@huaqin.corp-partner.google.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240301061128.3145982-1-yangcong5@huaqin.corp-partner.google.com
2024-03-04x86/idle: Select idle routine only onceThomas Gleixner
The idle routine selection is done on every CPU bringup operation and has a guard in place which is effective after the first invocation, which is a pointless exercise. Invoke it once on the boot CPU and mark the related functions __init. The guard check has to stay as xen_set_default_idle() runs early. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/87edcu6vaq.ffs@tglx
2024-03-04x86/idle: Let prefer_mwait_c1_over_halt() return boolThomas Gleixner
The return value is truly boolean. Make it so. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240229142248.518723854@linutronix.de
2024-03-04x86/idle: Cleanup idle_setup()Thomas Gleixner
Updating the static call for x86_idle() from idle_setup() is counter-intuitive. Let select_idle_routine() handle it like the other idle choices, which allows to simplify the idle selection later on. While at it rewrite comments and return a proper error code and not -1. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240229142248.455616019@linutronix.de
2024-03-04x86/idle: Clean up idle selectionThomas Gleixner
Clean up the code to make it readable. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240229142248.392017685@linutronix.de
2024-03-04x86/idle: Sanitize X86_BUG_AMD_E400 handlingThomas Gleixner
amd_e400_idle(), the idle routine for AMD CPUs which are affected by erratum 400 violates the RCU constraints by invoking tick_broadcast_enter() and tick_broadcast_exit() after the core code has marked RCU non-idle. The functions can end up in lockdep or tracing, which rightfully triggers a RCU warning. The core code provides now a static branch conditional invocation of the broadcast functions. Remove amd_e400_idle(), enforce default_idle() and enable the static branch on affected CPUs to cure this. [ bp: Fold in a fix for a IS_ENABLED() check fail missing a "CONFIG_" prefix which tglx spotted. ] Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/877cim6sis.ffs@tglx
2024-03-04nvme-fabrics: typo in nvmf_parse_key()Hannes Reinecke
Of course we should use the key if there is no error ... Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-03-04tee: make tee_bus_type constRicardo B. Marliere
Since commit d492cc2573a0 ("driver core: device.h: make struct bus_type a const *"), the driver core can properly handle constant struct bus_type, move the tee_bus_type variable to be a constant structure as well, placing it into read-only memory which can not be modified at runtime. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Reviewed-by: Sumit Garg <sumit.garg@linaro.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-03-04nvme-multipath: use atomic queue limits API for stacking limitsChristoph Hellwig
Switch to the queue_limits_* helpers to stack the bdev limits, which also includes updating the readahead settings. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-03-04nvme-multipath: pass queue_limits to blk_alloc_diskChristoph Hellwig
The multipath disk starts out with the stacking default limits. The one interesting part here is that blk_set_stacking_limits sets the max_zone_append_sectorts to UINT_MAX, which fails the validation for non-zoned devices. With the old one call per limit scheme this was fine because no one verified this weird mismatch and it was fixed by blk_stack_limits a little later before I/O could be issued. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-03-04nvme: use the atomic queue limits update APIChristoph Hellwig
Changes the callchains that update queue_limits to build an on-stack queue_limits and update it atomically. Note that for now only the admin queue actually passes it to the queue allocation function. Doing the same for the gendisks used for the namespaces will require a little more work. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-03-04nvme: cleanup nvme_configure_metadataChristoph Hellwig
Fold nvme_init_ms into nvme_configure_metadata after splitting up a little helper to deal with the extended LBA formats. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-03-04nvme: don't query identify data in configure_metadataChristoph Hellwig
Move reading the Identify Namespace Data Structure, NVM Command Set out of configure_metadata into the caller. This allows doing the identify call outside the frozen I/O queues, and prepares for using data from the Identify data structure for other purposes. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-03-04nvme: split out a nvme_identify_ns_nvm helperChristoph Hellwig
Split the logic to query the Identify Namespace Data Structure, NVM Command Set into a separate helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-03-04nvme: move common logic into nvme_update_ns_infoChristoph Hellwig
nvme_update_ns_info_generic and nvme_update_ns_info_block share a fair amount of logic related to not fully supported namespace formats and updating the multipath information. Move this logic into the common caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-03-04nvme: move setting the write cache flags out of nvme_set_queue_limitsChristoph Hellwig
nvme_set_queue_limits is used on the admin queue and all gendisks including hidden ones that don't support block I/O. The write cache setting on the other hand only makes sense for block I/O. Move the blk_queue_write_cache call to nvme_update_ns_info_block instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-03-04nvme: move a few things out of nvme_update_disk_infoChristoph Hellwig
Move setting up the integrity profile and setting the disk capacity out of nvme_update_disk_info to get nvme_update_disk_info into a shape where it just sets queue_limits eventually. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-03-04nvme: don't use nvme_update_disk_info for the multipath diskChristoph Hellwig
Currently nvme_update_ns_info_block calls nvme_update_disk_info both for the namespace attached disk, and the multipath one (if it exists). This is very different from how other stacking drivers work, and leads to a lot of complexity. Switch to setting the disk capacity and initializing the integrity profile, and let blk_stack_limits which already is called just below deal with updating the other limits. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-03-04nvme: move blk_integrity_unregister into nvme_init_integrityChristoph Hellwig
Move uneregistering the existing integrity profile into the helper dealing with all the other integrity / metadata setup. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-03-04nvme: cleanup the nvme_init_integrity calling conventionsChristoph Hellwig
Handle the no metadata support case in nvme_init_integrity as well to simplify the calling convention and prepare for future changes in the area. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-03-04nvme: move max_integrity_segments handling out of nvme_init_integrityChristoph Hellwig
max_integrity_segments is just a hardware limit and doesn't need to be in nvme_init_integrity with the PI setup. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-03-04nvme: remove nvme_revalidate_zonesChristoph Hellwig
Handle setting the zone size / chunk_sectors and max_append_sectors limits together with the other ZNS limits, and just open code the call to blk_revalidate_zones in the current place. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-03-04nvme: move NVME_QUIRK_DEALLOCATE_ZEROES out of nvme_config_discardChristoph Hellwig
Move the handling of the NVME_QUIRK_DEALLOCATE_ZEROES quirk out of nvme_config_discard so that it is combined with the normal write_zeroes limit handling. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-03-04nvme: set max_hw_sectors unconditionallyChristoph Hellwig
All transports set a max_hw_sectors value in the nvme_ctrl, so make the code using it unconditional and clean it up using a little helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-03-04Merge tag 'vexpress-update-6.9' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into soc/dt Arm Vexpress update for v6.9 Just a single update to add stdout-path in the device tree chosen node so that the system console can be routed to this serial/uart port. * tag 'vexpress-update-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux: ARM: dts: vexpress: Set stdout-path to serial0 in the chosen node Link: https://lore.kernel.org/r/20240223033307.117906-1-sudeep.holla@arm.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-03-04ARM: dts: samsung: exynos4412: decrease memory to account for unusable regionArtur Weber
The last 4 MiB of RAM on those devices is likely used by trustzone firmware, and is unusable under Linux. Change the device tree memory node accordingly. The proprietary bootloader (S-BOOT) passes these memory ranges through ATAG_MEM; this change allows us to have the correct memory ranges without relying on ATAG_MEM. Tested-by: Henrik Grimler <henrik@grimler.se> # i9300, i9305 Signed-off-by: Artur Weber <aweber.kernel@gmail.com> Link: https://lore.kernel.org/r/20240217-exynos4-memsize-fix-v1-1-7858e9c5f844@gmail.com Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-03-04Merge tag 'ti-keystone-dt-for-v6.9' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/ti/linux into soc/dt Keystone2 device tree updates for v6.9 Cosmetic cleanups: * Replace http urls with https. * tag 'ti-keystone-dt-for-v6.9' of https://git.kernel.org/pub/scm/linux/kernel/git/ti/linux: ARM: dts: keystone: Replace http urls with https Link: https://lore.kernel.org/r/20240304133843.e6rm5va6w4oavgoy@posted Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-03-04Merge tag 'v6.9-rockchip-config64-1' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into soc/defconfig Enablement of the hdmi phy-driver used on rk3588. * tag 'v6.9-rockchip-config64-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: arm64: defconfig: Enable Rockchip HDMI/eDP Combo PHY Link: https://lore.kernel.org/r/14841602.uLZWGnKmhe@phil Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-03-04Merge tag 'ti-k3-config-for-v6.9' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/ti/linux into soc/defconfig TI K2/K3 defconfig updates for v6.9 - Enable Wave5 encoder/decoder driver present on multiple K3 SoCs - Enable K2 boards via multi_v7_defconfig to move away from keystone specific defconfig. * tag 'ti-k3-config-for-v6.9' of https://git.kernel.org/pub/scm/linux/kernel/git/ti/linux: arm64: defconfig: Enable Wave5 Video Encoder/Decoder ARM: multi_v7_defconfig: Add more TI Keystone support Link: https://lore.kernel.org/r/a555943a-45d0-4e8e-ad25-682638cfcff7@ti.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-03-04ARM: defconfig: enable STMicroelectronics accelerometer and gyro for ExynosMartin Jücker
Enable STMicroelectronics accelerometer and gyro drivers for the Samsung P4note device family in exynos and multi_v7 defconfigs. Signed-off-by: Martin Jücker <martin.juecker@gmail.com> Link: https://lore.kernel.org/r/20231221230258.56272-2-martin.juecker@gmail.com Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240227080755.34170-2-krzysztof.kozlowski@linaro.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-03-04Merge tag 'imx-defconfig-6.9' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into soc/defconfig i.MX defconfig for 6.9: - Enable VF610 GPIO driver for both arm64 and i.MX ARM defconfig. - Enable i.MX8MP LDB bridge and i.MX8QXP support drivers in arm64 defconfig. * tag 'imx-defconfig-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: arm64: defconfig: enable i.MX8MP ldb bridge arm64: defconfig: enable the vf610 gpio driver ARM: imx_v6_v7_defconfig: enable the vf610 gpio driver arm64: defconfig: Enable i.MX8QXP device drivers Link: https://lore.kernel.org/r/20240226034147.233993-5-shawnguo2@yeah.net Signed-off-by: Arnd Bergmann <arnd@arndb.de>