linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-02-26	mptcp: map v4 address to v6 when destroying subflow	Geliang Tang
	Address family of server side mismatches with that of client side, like in "userspace pm add & remove address" test: userspace_pm_add_addr $ns1 10.0.2.1 10 userspace_pm_rm_sf $ns1 "::ffff:10.0.2.1" $SUB_ESTABLISHED That's because on the server side, the family is set to AF_INET6 and the v4 address is mapped in a v6 one. This patch fixes this issue. In mptcp_pm_nl_subflow_destroy_doit(), before checking local address family with remote address family, map an IPv4 address to an IPv6 address if the pair is a v4-mapped address. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/387 Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment") Cc: stable@vger.kernel.org Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Mat Martineau <martineau@kernel.org> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-20240223-misc-fixes-v1-1-162e87e48497@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26	dpll: rely on rcu for netdev_dpll_pin()	Eric Dumazet
	This fixes a possible UAF in if_nlmsg_size(), which can run without RTNL. Add rcu protection to "struct dpll_pin" Move netdev_dpll_pin() from netdevice.h to dpll.h to decrease name pollution. Note: This looks possible to no longer acquire RTNL in netdev_dpll_pin_assign() later in net-next. v2: do not force rcu_read_lock() in rtnl_dpll_pin_size() (Jiri Pirko) Fixes: 5f1842692880 ("netdev: expose DPLL pin handle for netdevice") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Cc: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240223123208.3543319-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26	lan78xx: enable auto speed configuration for LAN7850 if no EEPROM is detected	Oleksij Rempel
	Same as LAN7800, LAN7850 can be used without EEPROM. If EEPROM is not present or not flashed, LAN7850 will fail to sync the speed detected by the PHY with the MAC. In case link speed is 100Mbit, it will accidentally work, otherwise no data can be transferred. Better way would be to implement link_up callback, or set auto speed configuration unconditionally. But this changes would be more intrusive. So, for now, set it only if no EEPROM is found. Fixes: e69647a19c87 ("lan78xx: Set ASD in MAC_CR when EEE is enabled.") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20240222123839.2816561-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26	scsi: mpt3sas: Prevent sending diag_reset when the controller is ready	Ranjan Kumar
	If the driver detects that the controller is not ready before sending the first IOC facts command, it will wait for a maximum of 10 seconds for it to become ready. However, even if the controller becomes ready within 10 seconds, the driver will still issue a diagnostic reset. Modify the driver to avoid sending a diag reset if the controller becomes ready within the 10-second wait time. Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://lore.kernel.org/r/20240221071724.14986-1-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-02-26	Merge branches 'rcu-doc.2024.02.14a', 'rcu-nocb.2024.02.14a', ↵	Boqun Feng
	'rcu-exp.2024.02.14a', 'rcu-tasks.2024.02.26a' and 'rcu-misc.2024.02.14a' into rcu.2024.02.26a
2024-02-26	scsi: mpi3mr: Reduce stack usage in mpi3mr_refresh_sas_ports()	Arnd Bergmann
	Doubling the number of PHYs also doubled the stack usage of this function, exceeding the 32-bit limit of 1024 bytes: drivers/scsi/mpi3mr/mpi3mr_transport.c: In function 'mpi3mr_refresh_sas_ports': drivers/scsi/mpi3mr/mpi3mr_transport.c:1818:1: error: the frame size of 1636 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] Since the sas_io_unit_pg0 structure is already allocated dynamically, use the same method here. The size of the allocation can be smaller based on the actual number of phys now, so use this as an upper bound. Fixes: cb5b60894602 ("scsi: mpi3mr: Increase maximum number of PHYs to 64 from 32") Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Sathya Prakash Veerichetty <sathya.prakash@broadcom.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20240123130754.2011469-1-arnd@kernel.org Tested-by: John Garry <john.g.garry@oracle.com> #build only Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-02-26	x86/bugs: Use fixed addressing for VERW operand	Pawan Gupta
	The macro used for MDS mitigation executes VERW with relative addressing for the operand. This was necessary in earlier versions of the series. Now it is unnecessary and creates a problem for backports on older kernels that don't support relocations in alternatives. Relocation support was added by commit 270a69c4485d ("x86/alternative: Support relocations in alternatives"). Also asm for fixed addressing is much cleaner than relative RIP addressing. Simplify the asm by using fixed addressing for VERW operand. [ dhansen: tweak changelog ] Closes: https://lore.kernel.org/lkml/20558f89-299b-472e-9a96-171403a83bd6@suse.com/ Fixes: baf8361e5455 ("x86/bugs: Add asm helpers for executing VERW") Reported-by: Nikolay Borisov <nik.borisov@suse.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20240226-verw-arg-fix-v1-1-7b37ee6fd57d%40linux.intel.com
2024-02-27	x86: Increase brk randomness entropy for 64-bit systems	Kees Cook
	In commit c1d171a00294 ("x86: randomize brk"), arch_randomize_brk() was defined to use a 32MB range (13 bits of entropy), but was never increased when moving to 64-bit. The default arch_randomize_brk() uses 32MB for 32-bit tasks, and 1GB (18 bits of entropy) for 64-bit tasks. Update x86_64 to match the entropy used by arm64 and other 64-bit architectures. Reported-by: y0un9n132@gmail.com Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Jiri Kosina <jkosina@suse.com> Closes: https://lore.kernel.org/linux-hardening/CA+2EKTVLvc8hDZc+2Yhwmus=dzOUG5E4gV7ayCbu0MPJTZzWkw@mail.gmail.com/ Link: https://lore.kernel.org/r/20240217062545.1631668-1-keescook@chromium.org
2024-02-27	x86/vdso: Move vDSO to mmap region	Daniel Micay
	The vDSO (and its initial randomization) was introduced in commit 2aae950b21e4 ("x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu"), but had very low entropy. The entropy was improved in commit 394f56fe4801 ("x86_64, vdso: Fix the vdso address randomization algorithm"), but there is still improvement to be made. In principle there should not be executable code at a low entropy offset from the stack, since the stack and executable code having separate randomization is part of what makes ASLR stronger. Remove the only executable code near the stack region and give the vDSO the same randomized base as other mmap mappings including the linker and other shared objects. This results in higher entropy being provided and there's little to no advantage in separating this from the existing executable code there. This is already how other architectures like arm64 handle the vDSO. As an side, while it's sensible for userspace to reserve the initial mmap base as a region for executable code with a random gap for other mmap allocations, along with providing randomization within that region, there isn't much the kernel can do to help due to how dynamic linkers load the shared objects. This was extracted from the PaX RANDMMAP feature. [kees: updated commit log with historical details and other tweaks] Signed-off-by: Daniel Micay <danielmicay@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Closes: https://github.com/KSPP/linux/issues/280 Link: https://lore.kernel.org/r/20240210091827.work.233-kees@kernel.org
2024-02-26	x86/nmi: Fix the inverse "in NMI handler" check	Breno Leitao
	Commit 344da544f177 ("x86/nmi: Print reasons why backtrace NMIs are ignored") creates a super nice framework to diagnose NMIs. Every time nmi_exc() is called, it increments a per_cpu counter (nsp->idt_nmi_seq). At its exit, it also increments the same counter. By reading this counter it can be seen how many times that function was called (dividing by 2), and, if the function is still being executed, by checking the idt_nmi_seq's least significant bit. On the check side (nmi_backtrace_stall_check()), that variable is queried to check if the NMI is still being executed, but, there is a mistake in the bitwise operation. That code wants to check if the least significant bit of the idt_nmi_seq is set or not, but does the opposite, and checks for all the other bits, which will always be true after the first exc_nmi() executed successfully. This appends the misleading string to the dump "(CPU currently in NMI handler function)" Fix it by checking the least significant bit, and if it is set, append the string. Fixes: 344da544f177 ("x86/nmi: Print reasons why backtrace NMIs are ignored") Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240207165237.1048837-1-leitao@debian.org
2024-02-26	vdso/datapage: Quick fix - use asm/page-def.h for ARM64	Anna-Maria Behnsen
	The vdso rework for the generic union vdso_data_store broke compat VDSO on arm64: In file included from arch/arm64/include/asm/lse.h:5, from arch/arm64/include/asm/cmpxchg.h:14, from arch/arm64/include/asm/atomic.h:16, from include/linux/atomic.h:7, from include/asm-generic/bitops/atomic.h:5, from arch/arm64/include/asm/bitops.h:25, from include/linux/bitops.h:68, from arch/arm64/include/asm/memory.h:209, from arch/arm64/include/asm/page.h:46, from include/vdso/datapage.h:22, from lib/vdso/gettimeofday.c:5, from <command-line>: arch/arm64/include/asm/atomic_ll_sc.h:298:9: error: unknown type name 'u128' 298 \| u128 full; \| ^~~~ arch/arm64/include/asm/atomic_ll_sc.h:305:24: error: unknown type name 'u128' 305 \| static __always_inline u128 \ \| The reason is the include of asm/page.h which in turn includes headers which are outside the scope of compat VDSO. The only reason for the asm/page.h include is the required definition of PAGE_SIZE. But as arm64 defines PAGE_SIZE in asm/page-def.h without extra header includes, this could be used instead. Caution: this is a quick fix only! The final fix is an upcoming cleanup of Arnd which consolidates PAGE_SIZE definition. After the cleanup, the include of asm/page.h to access PAGE_SIZE is no longer required. Fixes: a0d2fcd62ac2 ("vdso/ARM: Make union vdso_data_store available for all architectures") Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240226175023.56679-1-anna-maria@linutronix.de Link: https://lore.kernel.org/lkml/CA+G9fYtrXXm_KO9fNPz3XaRxHV7UD_yQp-TEuPQrNRHU+_0W_Q@mail.gmail.com/
2024-02-26	md/md-bitmap: fix incorrect usage for sb_index	Heming Zhao
	Commit d7038f951828 ("md-bitmap: don't use ->index for pages backing the bitmap file") removed page->index from bitmap code, but left wrong code logic for clustered-md. current code never set slot offset for cluster nodes, will sometimes cause crash in clustered env. Call trace (partly): md_bitmap_file_set_bit+0x110/0x1d8 [md_mod] md_bitmap_startwrite+0x13c/0x240 [md_mod] raid1_make_request+0x6b0/0x1c08 [raid1] md_handle_request+0x1dc/0x368 [md_mod] md_submit_bio+0x80/0xf8 [md_mod] __submit_bio+0x178/0x300 submit_bio_noacct_nocheck+0x11c/0x338 submit_bio_noacct+0x134/0x614 submit_bio+0x28/0xdc submit_bh_wbc+0x130/0x1cc submit_bh+0x1c/0x28 Fixes: d7038f951828 ("md-bitmap: don't use ->index for pages backing the bitmap file") Cc: stable@vger.kernel.org # v6.6+ Signed-off-by: Heming Zhao <heming.zhao@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240223121128.28985-1-heming.zhao@suse.com
2024-02-26	iommufd/selftest: Don't check map/unmap pairing with HUGE_PAGES	Jason Gunthorpe
	Since MOCK_HUGE_PAGE_SIZE was introduced it allows the core code to invoke mock with large page sizes. This confuses the validation logic that checks that map/unmap are paired. This is because the page size computed for map is based on the physical address and in many cases will always be the base page size, however the entire range generated by iommufd will be passed to map. Randomly iommufd can see small groups of physically contiguous pages, (say 8k unaligned and grouped together), but that group crosses a huge page boundary. The map side will observe this as a contiguous run and mark it accordingly, but there is a chance the unmap side will end up terminating interior huge pages in the middle of that group and trigger a validation failure. Meaning the validation only works if the core code passes the iova/length directly from iommufd to mock. syzkaller randomly hits this with failures like: WARNING: CPU: 0 PID: 11568 at drivers/iommu/iommufd/selftest.c:461 mock_domain_unmap_pages+0x1c0/0x250 Modules linked in: CPU: 0 PID: 11568 Comm: syz-executor.0 Not tainted 6.8.0-rc3+ #4 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:mock_domain_unmap_pages+0x1c0/0x250 Code: 2b e8 94 37 0f ff 48 d1 eb 31 ff 48 b8 00 00 00 00 00 00 20 00 48 21 c3 48 89 de e8 aa 32 0f ff 48 85 db 75 07 e8 70 37 0f ff <0f> 0b e8 69 37 0f ff 31 f6 31 ff e8 90 32 0f ff e8 5b 37 0f ff 4c RSP: 0018:ffff88800e707490 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff822dfae6 RDX: ffff88800cf86400 RSI: ffffffff822dfaf0 RDI: 0000000000000007 RBP: ffff88800e7074d8 R08: 0000000000000000 R09: ffffed1001167c90 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000001500000 R13: 0000000000083000 R14: 0000000000000001 R15: 0000000000000800 FS: 0000555556048480(0000) GS:ffff88806d400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2dc23000 CR3: 0000000008cbb000 CR4: 0000000000350eb0 Call Trace: <TASK> __iommu_unmap+0x281/0x520 iommu_unmap+0xc9/0x180 iopt_area_unmap_domain_range+0x1b1/0x290 iopt_area_unpin_domain+0x590/0x800 __iopt_area_unfill_domain+0x22e/0x650 iopt_area_unfill_domain+0x47/0x60 iopt_unfill_domain+0x187/0x590 iopt_table_remove_domain+0x267/0x2d0 iommufd_hwpt_paging_destroy+0x1f1/0x370 iommufd_object_remove+0x2a3/0x490 iommufd_device_detach+0x23a/0x2c0 iommufd_selftest_destroy+0x7a/0xf0 iommufd_fops_release+0x1d3/0x340 __fput+0x272/0xb50 __fput_sync+0x4b/0x60 __x64_sys_close+0x8b/0x110 do_syscall_64+0x71/0x140 entry_SYSCALL_64_after_hwframe+0x46/0x4e Do the simple thing and just disable the validation when the huge page tests are being run. Fixes: 7db521e23fe9 ("iommufd/selftest: Hugepage mock domain support") Link: https://lore.kernel.org/r/0-v1-1e17e60a5c8a+103fb-iommufd_mock_hugepg_jgg@nvidia.com Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-02-26	iommufd: Fix protection fault in iommufd_test_syz_conv_iova	Nicolin Chen
	Syzkaller reported the following bug: general protection fault, probably for non-canonical address 0xdffffc0000000038: 0000 [#1] SMP KASAN KASAN: null-ptr-deref in range [0x00000000000001c0-0x00000000000001c7] Call Trace: lock_acquire lock_acquire+0x1ce/0x4f0 down_read+0x93/0x4a0 iommufd_test_syz_conv_iova+0x56/0x1f0 iommufd_test_access_rw.isra.0+0x2ec/0x390 iommufd_test+0x1058/0x1e30 iommufd_fops_ioctl+0x381/0x510 vfs_ioctl __do_sys_ioctl __se_sys_ioctl __x64_sys_ioctl+0x170/0x1e0 do_syscall_x64 do_syscall_64+0x71/0x140 This is because the new iommufd_access_change_ioas() sets access->ioas to NULL during its process, so the lock might be gone in a concurrent racing context. Fix this by doing the same access->ioas sanity as iommufd_access_rw() and iommufd_access_pin_pages() functions do. Cc: stable@vger.kernel.org Fixes: 9227da7816dd ("iommufd: Add iommufd_access_change_ioas(_id) helpers") Link: https://lore.kernel.org/r/3f1932acaf1dd494d404c04364d73ce8f57f3e5e.1708636627.git.nicolinc@nvidia.com Reported-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-02-26	iommufd/selftest: Fix mock_dev_num bug	Nicolin Chen
	Syzkaller reported the following bug: sysfs: cannot create duplicate filename '/devices/iommufd_mock4' Call Trace: sysfs_warn_dup+0x71/0x90 sysfs_create_dir_ns+0x1ee/0x260 ? sysfs_create_mount_point+0x80/0x80 ? spin_bug+0x1d0/0x1d0 ? do_raw_spin_unlock+0x54/0x220 kobject_add_internal+0x221/0x970 kobject_add+0x11c/0x1e0 ? lockdep_hardirqs_on_prepare+0x273/0x3e0 ? kset_create_and_add+0x160/0x160 ? kobject_put+0x5d/0x390 ? bus_get_dev_root+0x4a/0x60 ? kobject_put+0x5d/0x390 device_add+0x1d5/0x1550 ? __fw_devlink_link_to_consumers.isra.0+0x1f0/0x1f0 ? __init_waitqueue_head+0xcb/0x150 iommufd_test+0x462/0x3b60 ? lock_release+0x1fe/0x640 ? __might_fault+0x117/0x170 ? reacquire_held_locks+0x4b0/0x4b0 ? iommufd_selftest_destroy+0xd0/0xd0 ? __might_fault+0xbe/0x170 iommufd_fops_ioctl+0x256/0x350 ? iommufd_option+0x180/0x180 ? __lock_acquire+0x1755/0x45f0 __x64_sys_ioctl+0xa13/0x1640 The bug is triggered when Syzkaller created multiple mock devices but didn't destroy them in the same sequence, messing up the mock_dev_num counter. Replace the atomic with an mock_dev_ida. Cc: stable@vger.kernel.org Fixes: 23a1b46f15d5 ("iommufd/selftest: Make the mock iommu driver into a real driver") Link: https://lore.kernel.org/r/5af41d5af6d5c013cc51de01427abb8141b3587e.1708636627.git.nicolinc@nvidia.com Reported-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-02-26	iommufd: Fix iopt_access_list_id overwrite bug	Nicolin Chen
	Syzkaller reported the following WARN_ON: WARNING: CPU: 1 PID: 4738 at drivers/iommu/iommufd/io_pagetable.c:1360 Call Trace: iommufd_access_change_ioas+0x2fe/0x4e0 iommufd_access_destroy_object+0x50/0xb0 iommufd_object_remove+0x2a3/0x490 iommufd_object_destroy_user iommufd_access_destroy+0x71/0xb0 iommufd_test_staccess_release+0x89/0xd0 __fput+0x272/0xb50 __fput_sync+0x4b/0x60 __do_sys_close __se_sys_close __x64_sys_close+0x8b/0x110 do_syscall_x64 The mismatch between the access pointer in the list and the passed-in pointer is resulting from an overwrite of access->iopt_access_list_id, in iopt_add_access(). Called from iommufd_access_change_ioas() when xa_alloc() succeeds but iopt_calculate_iova_alignment() fails. Add a new_id in iopt_add_access() and only update iopt_access_list_id when returning successfully. Cc: stable@vger.kernel.org Fixes: 9227da7816dd ("iommufd: Add iommufd_access_change_ioas(_id) helpers") Link: https://lore.kernel.org/r/2dda7acb25b8562ec5f1310de828ef5da9ef509c.1708636627.git.nicolinc@nvidia.com Reported-by: Jason Gunthorpe <jgg@nvidia.com> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-02-26	Merge tag 'mtd/fixes-for-6.8-rc7' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux Pull mtd fixes from Miquel Raynal: "Many NAND page layouts have been added to the Marvell NAND controller but could not be used in practice so they are being removed. Regarding the SPI-NAND area, Gigadevice chips were not using the right buffer for an ECC status check operation. Aside from these driver fixes, there is also a refcount fix in the MTD core nodes parsing logic" * tag 'mtd/fixes-for-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: mtd: rawnand: marvell: fix layouts mtd: Fix possible refcounting issue when going through partition nodes mtd: spinand: gigadevice: Fix the get ecc status issue
2024-02-26	Merge tag 'for-6.8-rc6-tag' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A more fixes for recently reported or discovered problems: - fix corner case of send that would generate potentially large stream of zeros if there's a hole at the end of the file - fix chunk validation in zoned mode on conventional zones, it was possible to create chunks that would not be allowed on sequential zones - fix validation of dev-replace ioctl filenames - fix KCSAN warnings about access to block reserve struct members" * tag 'for-6.8-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix data race at btrfs_use_block_rsv() when accessing block reserve btrfs: fix data races when accessing the reserved amount of block reserves btrfs: send: don't issue unnecessary zero writes for trailing hole btrfs: dev-replace: properly validate device names btrfs: zoned: don't skip block group profile checks on conventional zones
2024-02-26	md: check mddev->pers before calling md_set_readonly()	Li Nan
	If 'mddev->pers' is NULL, there is nothing to do in md_set_readonly(). Except for md_ioctl(), the other two callers of md_set_readonly() have already checked 'mddev->pers'. To simplify the code, move the check of 'mddev->pers' to the caller. Signed-off-by: Li Nan <linan122@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240226031444.3606764-10-linan666@huaweicloud.com
2024-02-26	md: clean up openers check in do_md_stop() and md_set_readonly()	Li Nan
	Before stopping or setting readonly, mddev_set_closing_and_sync_blockdev() is always called to check the openers. So no longer need to check it again in do_md_stop() and md_set_readonly(). Clean it up. Signed-off-by: Li Nan <linan122@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240226031444.3606764-9-linan666@huaweicloud.com
2024-02-26	md: sync blockdev before stopping raid or setting readonly	Li Nan
	Commit a05b7ea03d72 ("md: avoid crash when stopping md array races with closing other open fds.") added sync_block before stopping raid and setting readonly. Later in commit 260fa034ef7a ("md: avoid deadlock when dirty buffers during md_stop.") it is moved to ioctl. array_state_store() was ignored. Add sync blockdev to array_state_store() now. Signed-off-by: Li Nan <linan122@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240226031444.3606764-8-linan666@huaweicloud.com
2024-02-26	md: factor out a helper to sync mddev	Li Nan
	There are no functional changes, prepare to sync mddev in array_state_store(). Signed-off-by: Li Nan <linan122@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240226031444.3606764-7-linan666@huaweicloud.com
2024-02-26	md: Don't clear MD_CLOSING when the raid is about to stop	Li Nan
	The raid should not be opened anymore when it is about to be stopped. However, other processes can open it again if the flag MD_CLOSING is cleared before exiting. From now on, this flag will not be cleared when the raid will be stopped. Fixes: 065e519e71b2 ("md: MD_CLOSING needs to be cleared after called md_set_readonly or do_md_stop") Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240226031444.3606764-6-linan666@huaweicloud.com
2024-02-26	md: return directly before setting did_set_md_closing	Li Nan
	There is nothing to do at 'out' before setting 'did_set_md_closing' in md_ioctl(). Return directly, and it will help us to remove 'did_set_md_closing' later. Signed-off-by: Li Nan <linan122@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240226031444.3606764-5-linan666@huaweicloud.com
2024-02-26	md: clean up invalid BUG_ON in md_ioctl	Li Nan
	'disk->private_data' is set to mddev in md_alloc() and never set to NULL, and users need to open mddev before submitting ioctl. So mddev must not have been freed during ioctl, and there is no need to check mddev here. Clean up it. Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240226031444.3606764-4-linan666@huaweicloud.com
2024-02-26	md: changed the switch of RAID_VERSION to if	Li Nan
	There is only one case of this 'switch'. Change it to 'if'. Signed-off-by: Li Nan <linan122@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240226031444.3606764-3-linan666@huaweicloud.com
2024-02-26	md: merge the check of capabilities into md_ioctl_valid()	Li Nan
	There is no functional change. Just to make code cleaner. Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240226031444.3606764-2-linan666@huaweicloud.com
2024-02-26	ceph: switch to corrected encoding of max_xattr_size in mdsmap	Xiubo Li
	The addition of bal_rank_mask with encoding version 17 was merged into ceph.git in Oct 2022 and made it into v18.2.0 release normally. A few months later, the much delayed addition of max_xattr_size got merged, also with encoding version 17, placed before bal_rank_mask in the encoding -- but it didn't make v18.2.0 release. The way this ended up being resolved on the MDS side is that bal_rank_mask will continue to be encoded in version 17 while max_xattr_size is now encoded in version 18. This does mean that older kernels will misdecode version 17, but this is also true for v18.2.0 and v18.2.1 clients in userspace. The best we can do is backport this adjustment -- see ceph.git commit 78abfeaff27fee343fb664db633de5b221699a73 for details. [ idryomov: changelog ] Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/64440 Fixes: d93231a6bc8a ("ceph: prevent a client from exceeding the MDS maximum xattr size") Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Patrick Donnelly <pdonnell@ibm.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-02-26	fs/ntfs3: fix build without CONFIG_NTFS3_LZX_XPRESS	Mark O'Donovan
	When CONFIG_NTFS3_LZX_XPRESS is not set then we get the following build error: fs/ntfs3/frecord.c:2460:16: error: unused variable ‘i_size’ Signed-off-by: Mark O'Donovan <shiftee@posteo.net> Fixes: 4fd6c08a16d7 ("fs/ntfs3: Use i_size_read and i_size_write") Tested-by: Chris Clayton <chris2553@googlemail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-02-26	landlock: Fix asymmetric private inodes referring	Mickaël Salaün
	When linking or renaming a file, if only one of the source or destination directory is backed by an S_PRIVATE inode, then the related set of layer masks would be used as uninitialized by is_access_to_paths_allowed(). This would result to indeterministic access for one side instead of always being allowed. This bug could only be triggered with a mounted filesystem containing both S_PRIVATE and !S_PRIVATE inodes, which doesn't seem possible. The collect_domain_accesses() calls return early if is_nouser_or_private() returns false, which means that the directory's superblock has SB_NOUSER or its inode has S_PRIVATE. Because rename or link actions are only allowed on the same mounted filesystem, the superblock is always the same for both source and destination directories. However, it might be possible in theory to have an S_PRIVATE parent source inode with an !S_PRIVATE parent destination inode, or vice versa. To make sure this case is not an issue, explicitly initialized both set of layer masks to 0, which means to allow all actions on the related side. If at least on side has !S_PRIVATE, then collect_domain_accesses() and is_access_to_paths_allowed() check for the required access rights. Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Günther Noack <gnoack@google.com> Cc: Jann Horn <jannh@google.com> Cc: Shervin Oloumi <enlightened@chromium.org> Cc: stable@vger.kernel.org Fixes: b91c3e4ea756 ("landlock: Add support for file reparenting with LANDLOCK_ACCESS_FS_REFER") Link: https://lore.kernel.org/r/20240219190345.2928627-1-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
2024-02-26	x86/cpu/intel: Detect TME keyid bits before setting MTRR mask registers	Paolo Bonzini
	MKTME repurposes the high bit of physical address to key id for encryption key and, even though MAXPHYADDR in CPUID[0x80000008] remains the same, the valid bits in the MTRR mask register are based on the reduced number of physical address bits. detect_tme() in arch/x86/kernel/cpu/intel.c detects TME and subtracts it from the total usable physical bits, but it is called too late. Move the call to early_init_intel() so that it is called in setup_arch(), before MTRRs are setup. This fixes boot on TDX-enabled systems, which until now only worked with "disable_mtrr_cleanup". Without the patch, the values written to the MTRRs mask registers were 52-bit wide (e.g. 0x000fffff_80000800) and the writes failed; with the patch, the values are 46-bit wide, which matches the reduced MAXPHYADDR that is shown in /proc/cpuinfo. Reported-by: Zixi Chen <zixchen@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20240131230902.1867092-3-pbonzini%40redhat.com
2024-02-26	x86/cpu: Allow reducing x86_phys_bits during early_identify_cpu()	Paolo Bonzini
	In commit fbf6449f84bf ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach"), the initialization of c->x86_phys_bits was moved after this_cpu->c_early_init(c). This is incorrect because early_init_amd() expected to be able to reduce the value according to the contents of CPUID leaf 0x8000001f. Fortunately, the bug was negated by init_amd()'s call to early_init_amd(), which does reduce x86_phys_bits in the end. However, this is very late in the boot process and, most notably, the wrong value is used for x86_phys_bits when setting up MTRRs. To fix this, call get_cpu_address_sizes() as soon as X86_FEATURE_CPUID is set/cleared, and c->extended_cpuid_level is retrieved. Fixes: fbf6449f84bf ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc:stable@vger.kernel.org Link: https://lore.kernel.org/all/20240131230902.1867092-2-pbonzini%40redhat.com
2024-02-26	fbcon: always restore the old font data in fbcon_do_set_font()	Jiri Slaby (SUSE)
	Commit a5a923038d70 (fbdev: fbcon: Properly revert changes when vc_resize() failed) started restoring old font data upon failure (of vc_resize()). But it performs so only for user fonts. It means that the "system"/internal fonts are not restored at all. So in result, the very first call to fbcon_do_set_font() performs no restore at all upon failing vc_resize(). This can be reproduced by Syzkaller to crash the system on the next invocation of font_get(). It's rather hard to hit the allocation failure in vc_resize() on the first font_set(), but not impossible. Esp. if fault injection is used to aid the execution/failure. It was demonstrated by Sirius: BUG: unable to handle page fault for address: fffffffffffffff8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD cb7b067 P4D cb7b067 PUD cb7d067 PMD 0 Oops: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 8007 Comm: poc Not tainted 6.7.0-g9d1694dc91ce #20 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:fbcon_get_font+0x229/0x800 drivers/video/fbdev/core/fbcon.c:2286 Call Trace: <TASK> con_font_get drivers/tty/vt/vt.c:4558 [inline] con_font_op+0x1fc/0xf20 drivers/tty/vt/vt.c:4673 vt_k_ioctl drivers/tty/vt/vt_ioctl.c:474 [inline] vt_ioctl+0x632/0x2ec0 drivers/tty/vt/vt_ioctl.c:752 tty_ioctl+0x6f8/0x1570 drivers/tty/tty_io.c:2803 vfs_ioctl fs/ioctl.c:51 [inline] ... So restore the font data in any case, not only for user fonts. Note the later 'if' is now protected by 'old_userfont' and not 'old_data' as the latter is always set now. (And it is supposed to be non-NULL. Otherwise we would see the bug above again.) Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Fixes: a5a923038d70 ("fbdev: fbcon: Properly revert changes when vc_resize() failed") Reported-and-tested-by: Ubisectech Sirius <bugreport@ubisectech.com> Cc: Ubisectech Sirius <bugreport@ubisectech.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Helge Deller <deller@gmx.de> Cc: linux-fbdev@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20240208114411.14604-1-jirislaby@kernel.org
2024-02-26	ASoC: amd: yc: Add Lenovo ThinkBook 21J0 into DMI quirk table	Johnny Hsieh
	This patch adds Lenovo 21J0 (ThinkBook 16 G5+ ARP) to the DMI quirks table to enable internal microphone array. Cc: linux-sound@vger.kernel.org Signed-off-by: Johnny Hsieh <mnixry@outlook.com> Link: https://msgid.link/r/TYSPR04MB8429D62DFDB6727866ECF1DEC55A2@TYSPR04MB8429.apcprd04.prod.outlook.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-02-26	drm/ttm/tests: depend on UML \|\| COMPILE_TEST	Christian König
	At least the device test requires that no other driver using TTM is loaded. So make those unit tests depend on UML \|\| COMPILE_TEST to prevent people from trying them on bare metal. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/all/20240219230116.77b8ad68@yea/
2024-02-26	Merge drm/drm-fixes into drm-misc-fixes	Maxime Ripard
	Sima needs a more recent release to apply a patch. Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-02-26	irqchip: Add StarFive external interrupt controller	Changhuang Liang
	Add StarFive external interrupt controller for JH8100 SoC. Signed-off-by: Changhuang Liang <changhuang.liang@starfivetech.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Link: https://lore.kernel.org/r/20240226055025.1669223-3-changhuang.liang@starfivetech.com
2024-02-26	dt-bindings: interrupt-controller: Add starfive,jh8100-intc	Changhuang Liang
	StarFive SoCs like the JH8100 use a interrupt controller. Add a binding for it. Signed-off-by: Changhuang Liang <changhuang.liang@starfivetech.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240226055025.1669223-2-changhuang.liang@starfivetech.com
2024-02-26	drm/tegra: Remove existing framebuffer only if we support display	Thierry Reding
	Tegra DRM doesn't support display on Tegra234 and later, so make sure not to remove any existing framebuffers in that case. v2: - add comments explaining how this situation can come about - clear DRIVER_MODESET and DRIVER_ATOMIC feature bits Fixes: 6848c291a54f ("drm/aperture: Convert drivers to aperture interfaces") Signed-off-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Signed-off-by: Robert Foss <rfoss@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240223150333.1401582-1-thierry.reding@gmail.com
2024-02-26	RAS/AMD/ATL: Fix bit overflow in denorm_addr_df4_np2()	Yazen Ghannam
	The hash_pa8 and hashed_bit values in denorm_addr_df4_np2() are currently defined as u8 types. These variables represent single bits. 'hash_pa8' is set based on logical AND operations using masks with more than 8 bits. So the calculated value will not fit in this variable. It will always be '0'. The 'hash_pa8' check later in the function will fail which produces incorrect results for some cases. Change these variables to bool type. This clarifies that they are single bit values. Also, this allows the compiler to ensure they hold the proper results. Remove an unnecessary shift operation. [ bp: Remove the unnecessary brackets in the else-branch of the hash_pa8 assignment. ] Fixes: 3f3174996be6 ("RAS: Introduce AMD Address Translation Library") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240222165449.23582-1-yazen.ghannam@amd.com
2024-02-26	ARM: dts: omap4-panda-common: Enable powering off the device	Andreas Kemnade
	As the TWL6030 chip is the main power controller here, declare it as system-power-controller Signed-off-by: Andreas Kemnade <andreas@kemnade.info> Message-ID: <20240217082007.3238948-5-andreas@kemnade.info> Signed-off-by: Tony Lindgren <tony@atomide.com>
2024-02-26	ARM: dts: omap-embt2ws: system-power-controller for bt200	Andreas Kemnade
	Configure the TWL6032 as system power controller to let the device power off. Signed-off-by: Andreas Kemnade <andreas@kemnade.info> Message-ID: <20240217082007.3238948-4-andreas@kemnade.info> Signed-off-by: Tony Lindgren <tony@atomide.com>
2024-02-26	x86/boot/64: Use RIP_REL_REF() to access early_top_pgt[]	Ard Biesheuvel
	early_top_pgt[] is assigned from code that executes from a 1:1 mapping so it cannot use a plain access from C. Replace the use of fixup_pointer() with RIP_REL_REF(), which is better and simpler. For legibility and to align with the code that populates the lower page table levels, statically initialize the root level page table with an entry pointing to level3_kernel_pgt[], and overwrite it when needed to enable 5-level paging. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240221113506.2565718-24-ardb+git@google.com
2024-02-26	x86/boot/64: Use RIP_REL_REF() to access early page tables	Ard Biesheuvel
	The early statically allocated page tables are populated from code that executes from a 1:1 mapping so it cannot use plain accesses from C. Replace the use of fixup_pointer() with RIP_REL_REF(), which is better and simpler. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240221113506.2565718-23-ardb+git@google.com
2024-02-26	x86/boot/64: Use RIP_REL_REF() to access '__supported_pte_mask'	Ard Biesheuvel
	'__supported_pte_mask' is accessed from code that executes from a 1:1 mapping so it cannot use a plain access from C. Replace the use of fixup_pointer() with RIP_REL_REF(), which is better and simpler. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240221113506.2565718-22-ardb+git@google.com
2024-02-26	x86/boot/64: Use RIP_REL_REF() to access early_dynamic_pgts[]	Ard Biesheuvel
	early_dynamic_pgts[] and next_early_pgt are accessed from code that executes from a 1:1 mapping so it cannot use a plain access from C. Replace the use of fixup_pointer() with RIP_REL_REF(), which is better and simpler. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240221113506.2565718-21-ardb+git@google.com
2024-02-26	x86/boot/64: Use RIP_REL_REF() to assign 'phys_base'	Ard Biesheuvel
	'phys_base' is assigned from code that executes from a 1:1 mapping so it cannot use a plain access from C. Replace the use of fixup_pointer() with RIP_REL_REF(), which is better and simpler. While at it, move the assignment to before the addition of the SME mask so there is no need to subtract it again, and drop the unnecessary addition ('phys_base' is statically initialized to 0x0) Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240221113506.2565718-20-ardb+git@google.com
2024-02-26	x86/boot/64: Simplify global variable accesses in GDT/IDT programming	Ard Biesheuvel
	There are two code paths in the startup code to program an IDT: one that runs from the 1:1 mapping and one that runs from the virtual kernel mapping. Currently, these are strictly separate because fixup_pointer() is used on the 1:1 path, which will produce the wrong value when used while executing from the virtual kernel mapping. Switch to RIP_REL_REF() so that the two code paths can be merged. Also, move the GDT and IDT descriptors to the stack so that they can be referenced directly, rather than via RIP_REL_REF(). Rename startup_64_setup_env() to startup_64_setup_gdt_idt() while at it, to make the call from assembler self-documenting. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240221113506.2565718-19-ardb+git@google.com
2024-02-26	ARM: dts: omap: Switch over to https:// url	Nishanth Menon
	Move the pending urls back to https:// and mark the ones that are no longer accessible (http or https) as defunct. Signed-off-by: Nishanth Menon <nm@ti.com> Message-ID: <20240109195500.3833121-1-nm@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
2024-02-26	ARM: dts: ti: omap: add missing abb_{mpu,ivahd,dspeve,gpu} unit addresses ↵	Romain Naour
	for dra7 SoC abb_{mpu,ivahd,dspeve,gpu} have 'reg' so they must have unit address to fix dtc W=1 warnings: Warning (unit_address_vs_reg): /ocp/regulator-abb-mpu: node has a reg or ranges property, but no unit name Warning (unit_address_vs_reg): /ocp/regulator-abb-ivahd: node has a reg or ranges property, but no unit name Warning (unit_address_vs_reg): /ocp/regulator-abb-dspeve: node has a reg or ranges property, but no unit name Warning (unit_address_vs_reg): /ocp/regulator-abb-gpu: node has a reg or ranges property, but no unit name Signed-off-by: Romain Naour <romain.naour@skf.com> Message-ID: <20240123085551.733155-3-romain.naour@smile.fr> Signed-off-by: Tony Lindgren <tony@atomide.com>