linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-02-25	Merge tag 'erofs-for-6.8-rc6-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fix from Gao Xiang: - Fix page refcount leak when looking up specific inodes introduced by metabuf reworking * tag 'erofs-for-6.8-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: fix refcount on the metabuf used for inode lookup
2024-02-25	Merge tag 'pull-fixes.pathwalk-rcu-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull RCU pathwalk fixes from Al Viro: "We still have some races in filesystem methods when exposed to RCU pathwalk. This series is a result of code audit (the second round of it) and it should deal with most of that stuff. Still pending: ntfs3 ->d_hash()/->d_compare() and ceph_d_revalidate(). Up to maintainers (a note for NTFS folks - when documentation says that a method may not block, it does imply that blocking allocations are to be avoided. Really)" [ More explanations for people who aren't familiar with the vagaries of RCU path walking: most of it is hidden from filesystems, but if a filesystem actively participates in the low-level path walking it needs to make sure the fields involved in that walk are RCU-safe. That "actively participate in low-level path walking" includes things like having its own ->d_hash()/->d_compare() routines, or by having its own directory permission function that doesn't just use the common helpers. Having a ->d_revalidate() function will also have this issue. Note that instead of making everything RCU safe you can also choose to abort the RCU pathwalk if your operation cannot be done safely under RCU, but that obviously comes with a performance penalty. One common pattern is to allow the simple cases under RCU, and abort only if you need to do something more complicated. So not everything needs to be RCU-safe, and things like the inode etc that the VFS itself maintains obviously already are. But these fixes tend to be about properly RCU-delaying things like ->s_fs_info that are maintained by the filesystem and that got potentially released too early. - Linus ] * tag 'pull-fixes.pathwalk-rcu-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ext4_get_link(): fix breakage in RCU mode cifs_get_link(): bail out in unsafe case fuse: fix UAF in rcu pathwalks procfs: make freeing proc_fs_info rcu-delayed procfs: move dropping pde and pid from ->evict_inode() to ->free_inode() nfs: fix UAF on pathwalk running into umount nfs: make nfs_set_verifier() safe for use in RCU pathwalk afs: fix __afs_break_callback() / afs_drop_open_mmap() race hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper affs: free affs_sb_info with kfree_rcu() rcu pathwalk: prevent bogus hard errors from may_lookup() fs/super.c: don't drop ->s_user_ns until we free struct super_block itself
2024-02-25	Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	Linus Torvalds
	Pull vfs fixes from Al Viro: "A couple of fixes - revert of regression from this cycle and a fix for erofs failure exit breakage (had been there since way back)" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: erofs: fix handling kern_mount() failure Revert "get rid of DCACHE_GENOCIDE"
2024-02-25	efivarfs: Drop 'duplicates' bool parameter on efivar_init()	Ard Biesheuvel
	The 'duplicates' bool argument is always true when efivar_init() is called from its only caller so let's just drop it instead. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-02-25	efivarfs: Drop redundant cleanup on fill_super() failure	Ard Biesheuvel
	Al points out that kill_sb() will be called if efivarfs_fill_super() fails and so there is no point in cleaning up the efivar entry list. Reported-by: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-02-25	efivarfs: Request at most 512 bytes for variable names	Tim Schumacher
	Work around a quirk in a few old (2011-ish) UEFI implementations, where a call to `GetNextVariableName` with a buffer size larger than 512 bytes will always return EFI_INVALID_PARAMETER. There is some lore around EFI variable names being up to 1024 bytes in size, but this has no basis in the UEFI specification, and the upper bounds are typically platform specific, and apply to the entire variable (name plus payload). Given that Linux does not permit creating files with names longer than NAME_MAX (255) bytes, 512 bytes (== 256 UTF-16 characters) is a reasonable limit. Cc: <stable@vger.kernel.org> # 6.1+ Signed-off-by: Tim Schumacher <timschumi@gmx.de> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-02-25	ALSA: hda/realtek: Add special fixup for Lenovo 14IRP8	Willian Wang
	Lenovo Slim/Yoga Pro 9 14IRP8 requires a special fixup because there is a collision of its PCI SSID (17aa:3802) with Lenovo Yoga DuetITL 2021 codec SSID. Fixes: 3babae915f4c ("ALSA: hda/tas2781: Add tas2781 HDA driver") Link: https://bugzilla.kernel.org/show_bug.cgi?id=208555 Link: https://lore.kernel.org/all/d5b42e483566a3815d229270abd668131a0d9f3a.camel@irl.hu Cc: stable@vger.kernel.org Signed-off-by: Willian Wang <git@willian.wang> Reviewed-by: Gergo Koteles <soyer@irl.hu> Link: https://lore.kernel.org/r/170879111795.8.6687687359006700715.273812184@willian.wang Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-02-25	ext4_get_link(): fix breakage in RCU mode	Al Viro
	1) errors from ext4_getblk() should not be propagated to caller unless we are really sure that we would've gotten the same error in non-RCU pathwalk. 2) we leak buffer_heads if ext4_getblk() is successful, but bh is not uptodate. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-25	cifs_get_link(): bail out in unsafe case	Al Viro
	->d_revalidate() bails out there, anyway. It's not enough to prevent getting into ->get_link() in RCU mode, but that could happen only in a very contrieved setup. Not worth trying to do anything fancy here unless ->d_revalidate() stops kicking out of RCU mode at least in some cases. Reviewed-by: Christian Brauner <brauner@kernel.org> Acked-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-25	fuse: fix UAF in rcu pathwalks	Al Viro
	->permission(), ->get_link() and ->inode_get_acl() might dereference ->s_fs_info (and, in case of ->permission(), ->s_fs_info->fc->user_ns as well) when called from rcu pathwalk. Freeing ->s_fs_info->fc is rcu-delayed; we need to make freeing ->s_fs_info and dropping ->user_ns rcu-delayed too. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-25	procfs: make freeing proc_fs_info rcu-delayed	Al Viro
	makes proc_pid_ns() safe from rcu pathwalk (put_pid_ns() is still synchronous, but that's not a problem - it does rcu-delay everything that needs to be) Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-25	procfs: move dropping pde and pid from ->evict_inode() to ->free_inode()	Al Viro
	that keeps both around until struct inode is freed, making access to them safe from rcu-pathwalk Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-25	nfs: fix UAF on pathwalk running into umount	Al Viro
	NFS ->d_revalidate(), ->permission() and ->get_link() need to access some parts of nfs_server when called in RCU mode: server->flags server->caps *(server->io_stats) and, worst of all, call server->nfs_client->rpc_ops->have_delegation (the last one - as NFS_PROTO(inode)->have_delegation()). We really don't want to RCU-delay the entire nfs_free_server() (it would have to be done with schedule_work() from RCU callback, since it can't be made to run from interrupt context), but actual freeing of nfs_server and ->io_stats can be done via call_rcu() just fine. nfs_client part is handled simply by making nfs_free_client() use kfree_rcu(). Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-25	nfs: make nfs_set_verifier() safe for use in RCU pathwalk	Al Viro
	nfs_set_verifier() relies upon dentry being pinned; if that's the case, grabbing ->d_lock stabilizes ->d_parent and guarantees that ->d_parent points to a positive dentry. For something we'd run into in RCU mode that is not true - dentry might've been through dentry_kill() just as we grabbed ->d_lock, with its parent going through the same just as we get to into nfs_set_verifier_locked(). It might get to detaching inode (and zeroing ->d_inode) before nfs_set_verifier_locked() gets to fetching that; we get an oops as the result. That can happen in nfs{,4} ->d_revalidate(); the call chain in question is nfs_set_verifier_locked() <- nfs_set_verifier() <- nfs_lookup_revalidate_delegated() <- nfs{,4}_do_lookup_revalidate(). We have checked that the parent had been positive, but that's done before we get to nfs_set_verifier() and it's possible for memory pressure to pick our dentry as eviction candidate by that time. If that happens, back-to-back attempts to kill dentry and its parent are quite normal. Sure, in case of eviction we'll fail the ->d_seq check in the caller, but we need to survive until we return there... Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-25	afs: fix __afs_break_callback() / afs_drop_open_mmap() race	Al Viro
	In __afs_break_callback() we might check ->cb_nr_mmap and if it's non-zero do queue_work(&vnode->cb_work). In afs_drop_open_mmap() we decrement ->cb_nr_mmap and do flush_work(&vnode->cb_work) if it reaches zero. The trouble is, there's nothing to prevent __afs_break_callback() from seeing ->cb_nr_mmap before the decrement and do queue_work() after both the decrement and flush_work(). If that happens, we might be in trouble - vnode might get freed before the queued work runs. __afs_break_callback() is always done under ->cb_lock, so let's make sure that ->cb_nr_mmap can change from non-zero to zero while holding ->cb_lock (the spinlock component of it - it's a seqlock and we don't need to mess with the counter). Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-25	hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info	Al Viro
	->d_hash() and ->d_compare() use those, so we need to delay freeing them. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-25	exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper	Al Viro
	That stuff can be accessed by ->d_hash()/->d_compare(); as it is, we have a hard-to-hit UAF if rcu pathwalk manages to get into ->d_hash() on a filesystem that is in process of getting shut down. Besides, having nls and upcase table cleanup moved from ->put_super() towards the place where sbi is freed makes for simpler failure exits. Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-25	affs: free affs_sb_info with kfree_rcu()	Al Viro
	one of the flags in it is used by ->d_hash()/->d_compare() Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-25	rcu pathwalk: prevent bogus hard errors from may_lookup()	Al Viro
	If lazy call of ->permission() returns a hard error, check that try_to_unlazy() succeeds before returning it. That both makes life easier for ->permission() instances and closes the race in ENOTDIR handling - it is possible that positive d_can_lookup() seen in link_path_walk() applies to the state after unlink() + mkdir(), while nd->inode matches the state prior to that. Normally seeing e.g. EACCES from permission check in rcu pathwalk means that with some timings non-rcu pathwalk would've run into the same; however, running into a non-executable regular file in the middle of a pathname would not get to permission check - it would fail with ENOTDIR instead. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-25	fs/super.c: don't drop ->s_user_ns until we free struct super_block itself	Al Viro
	Avoids fun races in RCU pathwalk... Same goes for freeing LSM shite hanging off super_block's arse. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-24	bcachefs: Fix check_snapshot() memcpy	Kent Overstreet
	check_snapshot() copies the bch_snapshot to a temporary to easily handle older versions that don't have all the fields of the current version, but it lacked a min() to correctly handle keys newer and larger than the current version. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-24	bcachefs: Fix bch2_journal_flush_device_pins()	Kent Overstreet
	If a journal write errored, the list of devices it was written to could be empty - we're not supposed to mark an empty replicas list. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-24	bcachefs: fix iov_iter count underflow on sub-block dio read	Brian Foster
	bch2_direct_IO_read() checks the request offset and size for sector alignment and then falls through to a couple calculations to shrink the size of the request based on the inode size. The problem is that these checks round up to the fs block size, which runs the risk of underflowing iter->count if the block size happens to be large enough. This is triggered by fstest generic/361 with a 4k block size, which subsequently leads to a crash. To avoid this crash, check that the shorten length doesn't exceed the overall length of the iter. Fixes: Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Su Yue <glass.su@suse.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-24	bcachefs: Fix BTREE_ITER_FILTER_SNAPSHOTS on inodes btree	Kent Overstreet
	If we're in FILTER_SNAPSHOTS mode and we start scanning a range of the keyspace where no keys are visible in the current snapshot, we have a problem - we'll scan for a very long time before scanning terminates. Awhile back, this was fixed for most cases with peek_upto() (and assertions that enforce that it's being used). But the fix missed the fact that the inodes btree is different - every key offset is in a different snapshot tree, not just the inode field. Fixes: Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-24	bcachefs: Kill __GFP_NOFAIL in buffered read path	Kent Overstreet
	Recently, we fixed our __GFP_NOFAIL usage in the readahead path, but the easy one in read_single_folio() (where wa can return an error) was missed - oops. Fixes: Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-24	bcachefs: fix backpointer_to_text() when dev does not exist	Kent Overstreet
	Fixes: Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-24	Merge tag 'powerpc-6.8-4' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix a crash when hot adding a PCI device to an LPAR since recent changes - Fix nested KVM level-2 guest reboot failure due to empty 'arch_compat' Thanks to Amit Machhiwal, Aneesh Kumar K.V (IBM), Brian King, Gaurav Batra, and Vaibhav Jain. * tag 'powerpc-6.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: KVM: PPC: Book3S HV: Fix L2 guest reboot failure due to empty 'arch_compat' powerpc/pseries/iommu: DLPAR add doesn't completely initialize pci_controller
2024-02-24	Merge tag 'iommu-fixes-v6.8-rc5' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu fixes from Joerg Roedel: - Intel VT-d fixes for nested domain handling: - Cache invalidation for changes in a parent domain - Dirty tracking setting for parent and nested domains - Fix a constant-out-of-range warning - ARM SMMU fixes: - Fix CD allocation from atomic context when using SVA with SMMUv3 - Revert the conversion of SMMUv2 to domain_alloc_paging(), as it breaks the boot for Qualcomm MSM8996 devices - Restore SVA handle sharing in core code as it turned out there are still drivers relying on it * tag 'iommu-fixes-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/sva: Restore SVA handle sharing iommu/arm-smmu-v3: Do not use GFP_KERNEL under as spinlock iommu/vt-d: Fix constant-out-of-range warning iommu/vt-d: Set SSADE when attaching to a parent with dirty tracking iommu/vt-d: Add missing dirty tracking set for parent domain iommu/vt-d: Wrap the dirty tracking loop to be a helper iommu/vt-d: Remove domain parameter for intel_pasid_setup_dirty_tracking() iommu/vt-d: Add missing device iotlb flush for parent domain iommu/vt-d: Update iotlb in nested domain attach iommu/vt-d: Add missing iotlb flush for parent domain iommu/vt-d: Add __iommu_flush_iotlb_psi() iommu/vt-d: Track nested domains in parent Revert "iommu/arm-smmu: Convert to domain_alloc_paging()"
2024-02-24	Merge tag 'cxl-fixes-6.8-rc6' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl fixes from Dan Williams: "A collection of significant fixes for the CXL subsystem. The largest change in this set, that bordered on "new development", is the fix for the fact that the location of the new qos_class attribute did not match the Documentation. The fix ends up deleting more code than it added, and it has a new unit test to backstop basic errors in this interface going forward. So the "red-diff" and unit test saved the "rip it out and try again" response. In contrast, the new notification path for firmware reported CXL errors (CXL CPER notifications) has a locking context bug that can not be fixed with a red-diff. Given where the release cycle stands, it is not comfortable to squeeze in that fix in these waning days. So, that receives the "back it out and try again later" treatment. There is a regression fix in the code that establishes memory NUMA nodes for platform CXL regions. That has an ack from x86 folks. There are a couple more fixups for Linux to understand (reassemble) CXL regions instantiated by platform firmware. The policy around platforms that do not match host-physical-address with system-physical-address (i.e. systems that have an address translation mechanism between the address range reported in the ACPI CEDT.CFMWS and endpoint decoders) has been softened to abort driver load rather than teardown the memory range (can cause system hangs). Lastly, there is a robustness / regression fix for cases where the driver would previously continue in the face of error, and a fixup for PCI error notification handling. Summary: - Fix NUMA initialization from ACPI CEDT.CFMWS - Fix region assembly failures due to async init order - Fix / simplify export of qos_class information - Fix cxl_acpi initialization vs single-window-init failures - Fix handling of repeated 'pci_channel_io_frozen' notifications - Workaround platforms that violate host-physical-address == system-physical address assumptions - Defer CXL CPER notification handling to v6.9" * tag 'cxl-fixes-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/acpi: Fix load failures due to single window creation failure acpi/ghes: Remove CXL CPER notifications cxl/pci: Fix disabling memory if DVSEC CXL Range does not match a CFMWS window cxl/test: Add support for qos_class checking cxl: Fix sysfs export of qos_class for memdev cxl: Remove unnecessary type cast in cxl_qos_class_verify() cxl: Change 'struct cxl_memdev_state' *_perf_list to single 'struct cxl_dpa_perf' cxl/region: Allow out of order assembly of autodiscovered regions cxl/region: Handle endpoint decoders in cxl_region_find_decoder() x86/numa: Fix the sort compare func used in numa_fill_memblks() x86/numa: Fix the address overlap check in numa_fill_memblks() cxl/pci: Skip to handle RAS errors if CXL.mem device is detached
2024-02-24	Merge tag 'for-6.8/dm-fix-3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fix from Mike Snitzer: - Fix DM integrity and verity targets to not use excessive stack when they recheck in the error path. * tag 'for-6.8/dm-fix-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm-integrity, dm-verity: reduce stack usage for recheck
2024-02-24	Merge tag 'scsi-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Six fixes: the four driver ones are pretty trivial. The larger two core changes are to try to fix various USB attached devices which have somewhat eccentric ways of handling the VPD and other mode pages which necessitate multiple revalidates (that were removed in the interests of efficiency) and updating the heuristic for supported VPD pages" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: jazz_esp: Only build if SCSI core is builtin scsi: smartpqi: Fix disable_managed_interrupts scsi: ufs: Uninitialized variable in ufshcd_devfreq_target() scsi: target: pscsi: Fix bio_put() for error case scsi: core: Consult supported VPD page list prior to fetching page scsi: sd: usb_storage: uas: Access media prior to querying device properties
2024-02-24	Merge tag 'i2c-for-6.8-rc6' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fix from Wolfram Sang: "A bugfix for host drivers" * tag 'i2c-for-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: imx: when being a target, mark the last read as processed
2024-02-24	Merge tag 'loongarch-fixes-6.8-3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch fixes from Huacai Chen: "Fix two cpu-hotplug issues, fix the init sequence about FDT system, fix the coding style of dts, and fix the wrong CPUCFG ID handling of KVM" * tag 'loongarch-fixes-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: LoongArch: KVM: Streamline kvm_check_cpucfg() and improve comments LoongArch: KVM: Rename _kvm_get_cpucfg() to _kvm_get_cpucfg_mask() LoongArch: KVM: Fix input validation of _kvm_get_cpucfg() & kvm_check_cpucfg() LoongArch: dts: Minor whitespace cleanup LoongArch: Call early_init_fdt_scan_reserved_mem() earlier LoongArch: Update cpu_sibling_map when disabling nonboot CPUs LoongArch: Disable IRQ before init_fn() for nonboot CPUs
2024-02-24	dm-integrity, dm-verity: reduce stack usage for recheck	Arnd Bergmann
	The newly added integrity_recheck() function has another larger stack allocation, just like its caller integrity_metadata(). When it gets inlined, the combination of the two exceeds the warning limit for 32-bit architectures and possibly risks an overflow when this is called from a deep call chain through a file system: drivers/md/dm-integrity.c:1767:13: error: stack frame size (1048) exceeds limit (1024) in 'integrity_metadata' [-Werror,-Wframe-larger-than] 1767 \| static void integrity_metadata(struct work_struct *w) Since the caller at this point is done using its checksum buffer, just reuse the same buffer in the new function to avoid the double allocation. [Mikulas: add "noinline" to integrity_recheck and verity_recheck. These functions are only called on error, so they shouldn't bloat the stack frame or code size of the caller.] Fixes: c88f5e553fe3 ("dm-integrity: recheck the integrity tag after a failure") Fixes: 9177f3c0dea6 ("dm-verity: recheck the hash after a failure") Cc: stable@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2024-02-24	cpufreq: intel_pstate: fix pstate limits enforcement for adjust_perf call back	Doug Smythies
	There is a loophole in pstate limit clamping for the intel_cpufreq CPU frequency scaling driver (intel_pstate in passive mode), schedutil CPU frequency scaling governor, HWP (HardWare Pstate) control enabled, when the adjust_perf call back path is used. Fix it. Fixes: a365ab6b9dfb cpufreq: intel_pstate: Implement the ->adjust_perf() callback Signed-off-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-23	mm/debug_vm_pgtable: fix BUG_ON with pud advanced test	Aneesh Kumar K.V (IBM)
	Architectures like powerpc add debug checks to ensure we find only devmap PUD pte entries. These debug checks are only done with CONFIG_DEBUG_VM. This patch marks the ptes used for PUD advanced test devmap pte entries so that we don't hit on debug checks on architecture like ppc64 as below. WARNING: CPU: 2 PID: 1 at arch/powerpc/mm/book3s64/radix_pgtable.c:1382 radix__pud_hugepage_update+0x38/0x138 .... NIP [c0000000000a7004] radix__pud_hugepage_update+0x38/0x138 LR [c0000000000a77a8] radix__pudp_huge_get_and_clear+0x28/0x60 Call Trace: [c000000004a2f950] [c000000004a2f9a0] 0xc000000004a2f9a0 (unreliable) [c000000004a2f980] [000d34c100000000] 0xd34c100000000 [c000000004a2f9a0] [c00000000206ba98] pud_advanced_tests+0x118/0x334 [c000000004a2fa40] [c00000000206db34] debug_vm_pgtable+0xcbc/0x1c48 [c000000004a2fc10] [c00000000000fd28] do_one_initcall+0x60/0x388 Also kernel BUG at arch/powerpc/mm/book3s64/pgtable.c:202! .... NIP [c000000000096510] pudp_huge_get_and_clear_full+0x98/0x174 LR [c00000000206bb34] pud_advanced_tests+0x1b4/0x334 Call Trace: [c000000004a2f950] [000d34c100000000] 0xd34c100000000 (unreliable) [c000000004a2f9a0] [c00000000206bb34] pud_advanced_tests+0x1b4/0x334 [c000000004a2fa40] [c00000000206db34] debug_vm_pgtable+0xcbc/0x1c48 [c000000004a2fc10] [c00000000000fd28] do_one_initcall+0x60/0x388 Link: https://lkml.kernel.org/r/20240129060022.68044-1-aneesh.kumar@kernel.org Fixes: 27af67f35631 ("powerpc/book3s64/mm: enable transparent pud hugepage") Signed-off-by: Aneesh Kumar K.V (IBM) <aneesh.kumar@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23	mm: cachestat: fix folio read-after-free in cache walk	Nhat Pham
	In cachestat, we access the folio from the page cache's xarray to compute its page offset, and check for its dirty and writeback flags. However, we do not hold a reference to the folio before performing these actions, which means the folio can concurrently be released and reused as another folio/page/slab. Get around this altogether by just using xarray's existing machinery for the folio page offsets and dirty/writeback states. This changes behavior for tmpfs files to now always report zeroes in their dirty and writeback counters. This is okay as tmpfs doesn't follow conventional writeback cache behavior: its pages get "cleaned" during swapout, after which they're no longer resident etc. Link: https://lkml.kernel.org/r/20240220153409.GA216065@cmpxchg.org Fixes: cf264e1329fb ("cachestat: implement cachestat syscall") Reported-by: Jann Horn <jannh@google.com> Suggested-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Nhat Pham <nphamcs@gmail.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Tested-by: Jann Horn <jannh@google.com> Cc: <stable@vger.kernel.org> [6.4+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23	MAINTAINERS: add memory mapping entry with reviewers	Lorenzo Stoakes
	Recently there have been a number of patches which have affected various aspects of the memory mapping logic as implemented in mm/mmap.c where it would have been useful for regular contributors to have been notified. Add an entry for this part of mm in particular with regular contributors tagged as reviewers. Link: https://lkml.kernel.org/r/20240220064410.4639-1-lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23	mm/vmscan: fix a bug calling wakeup_kswapd() with a wrong zone index	Byungchul Park
	With numa balancing on, when a numa system is running where a numa node doesn't have its local memory so it has no managed zones, the following oops has been observed. It's because wakeup_kswapd() is called with a wrong zone index, -1. Fixed it by checking the index before calling wakeup_kswapd(). > BUG: unable to handle page fault for address: 00000000000033f3 > #PF: supervisor read access in kernel mode > #PF: error_code(0x0000) - not-present page > PGD 0 P4D 0 > Oops: 0000 [#1] PREEMPT SMP NOPTI > CPU: 2 PID: 895 Comm: masim Not tainted 6.6.0-dirty #255 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 > RIP: 0010:wakeup_kswapd (./linux/mm/vmscan.c:7812) > Code: (omitted) > RSP: 0000:ffffc90004257d58 EFLAGS: 00010286 > RAX: ffffffffffffffff RBX: ffff88883fff0480 RCX: 0000000000000003 > RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88883fff0480 > RBP: ffffffffffffffff R08: ff0003ffffffffff R09: ffffffffffffffff > R10: ffff888106c95540 R11: 0000000055555554 R12: 0000000000000003 > R13: 0000000000000000 R14: 0000000000000000 R15: ffff88883fff0940 > FS: 00007fc4b8124740(0000) GS:ffff888827c00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00000000000033f3 CR3: 000000026cc08004 CR4: 0000000000770ee0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > PKRU: 55555554 > Call Trace: > <TASK> > ? __die > ? page_fault_oops > ? __pte_offset_map_lock > ? exc_page_fault > ? asm_exc_page_fault > ? wakeup_kswapd > migrate_misplaced_page > __handle_mm_fault > handle_mm_fault > do_user_addr_fault > exc_page_fault > asm_exc_page_fault > RIP: 0033:0x55b897ba0808 > Code: (omitted) > RSP: 002b:00007ffeefa821a0 EFLAGS: 00010287 > RAX: 000055b89983acd0 RBX: 00007ffeefa823f8 RCX: 000055b89983acd0 > RDX: 00007fc2f8122010 RSI: 0000000000020000 RDI: 000055b89983acd0 > RBP: 00007ffeefa821a0 R08: 0000000000000037 R09: 0000000000000075 > R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000 > R13: 00007ffeefa82410 R14: 000055b897ba5dd8 R15: 00007fc4b8340000 > </TASK> Link: https://lkml.kernel.org/r/20240216111502.79759-1-byungchul@sk.com Signed-off-by: Byungchul Park <byungchul@sk.com> Reported-by: Hyeongtak Ji <hyeongtak.ji@sk.com> Fixes: c574bbe917036 ("NUMA balancing: optimize page placement for memory tiering system") Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23	kasan: revert eviction of stack traces in generic mode	Marco Elver
	This partially reverts commits cc478e0b6bdf, 63b85ac56a64, 08d7c94d9635, a414d4286f34, and 773688a6cb24 to make use of variable-sized stack depot records, since eviction of stack entries from stack depot forces fixed- sized stack records. Care was taken to retain the code cleanups by the above commits. Eviction was added to generic KASAN as a response to alleviating the additional memory usage from fixed-sized stack records, but this still uses more memory than previously. With the re-introduction of variable-sized records for stack depot, we can just switch back to non-evictable stack records again, and return back to the previous performance and memory usage baseline. Before (observed after a KASAN kernel boot): pools: 597 refcounted_allocations: 17547 refcounted_frees: 6477 refcounted_in_use: 11070 freelist_size: 3497 persistent_count: 12163 persistent_bytes: 1717008 After: pools: 319 refcounted_allocations: 0 refcounted_frees: 0 refcounted_in_use: 0 freelist_size: 0 persistent_count: 29397 persistent_bytes: 5183536 As can be seen from the counters, with a generic KASAN config, refcounted allocations and evictions are no longer used. Due to using variable-sized records, I observe a reduction of 278 stack depot pools (saving 4448 KiB) with my test setup. Link: https://lkml.kernel.org/r/20240129100708.39460-2-elver@google.com Fixes: cc478e0b6bdf ("kasan: avoid resetting aux_lock") Fixes: 63b85ac56a64 ("kasan: stop leaking stack trace handles") Fixes: 08d7c94d9635 ("kasan: memset free track in qlink_free") Fixes: a414d4286f34 ("kasan: handle concurrent kasan_record_aux_stack calls") Fixes: 773688a6cb24 ("kasan: use stack_depot_put for Generic mode") Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-23	stackdepot: use variable size records for non-evictable entries	Marco Elver
	With the introduction of stack depot evictions, each stack record is now fixed size, so that future reuse after an eviction can safely store differently sized stack traces. In all cases that do not make use of evictions, this wastes lots of space. Fix it by re-introducing variable size stack records (up to the max allowed size) for entries that will never be evicted. We know if an entry will never be evicted if the flag STACK_DEPOT_FLAG_GET is not provided, since a later stack_depot_put() attempt is undefined behavior. With my current kernel config that enables KASAN and also SLUB owner tracking, I observe (after a kernel boot) a whopping reduction of 296 stack depot pools, which translates into 4736 KiB saved. The savings here are from SLUB owner tracking only, because KASAN generic mode still uses refcounting. Before: pools: 893 allocations: 29841 frees: 6524 in_use: 23317 freelist_size: 3454 After: pools: 597 refcounted_allocations: 17547 refcounted_frees: 6477 refcounted_in_use: 11070 freelist_size: 3497 persistent_count: 12163 persistent_bytes: 1717008 [elver@google.com: fix -Wstringop-overflow warning] Link: https://lore.kernel.org/all/20240201135747.18eca98e@canb.auug.org.au/ Link: https://lkml.kernel.org/r/20240201090434.1762340-1-elver@google.com Link: https://lore.kernel.org/all/CABXGCsOzpRPZGg23QqJAzKnqkZPKzvieeg=W7sgjgi3q0pBo0g@mail.gmail.com/ Link: https://lkml.kernel.org/r/20240129100708.39460-1-elver@google.com Link: https://lore.kernel.org/all/CABXGCsOzpRPZGg23QqJAzKnqkZPKzvieeg=W7sgjgi3q0pBo0g@mail.gmail.com/ Fixes: 108be8def46e ("lib/stackdepot: allow users to evict stack traces") Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-24	crypto: arm64/neonbs - fix out-of-bounds access on short input	Ard Biesheuvel
	The bit-sliced implementation of AES-CTR operates on blocks of 128 bytes, and will fall back to the plain NEON version for tail blocks or inputs that are shorter than 128 bytes to begin with. It will call straight into the plain NEON asm helper, which performs all memory accesses in granules of 16 bytes (the size of a NEON register). For this reason, the associated plain NEON glue code will copy inputs shorter than 16 bytes into a temporary buffer, given that this is a rare occurrence and it is not worth the effort to work around this in the asm code. The fallback from the bit-sliced NEON version fails to take this into account, potentially resulting in out-of-bounds accesses. So clone the same workaround, and use a temp buffer for short in/outputs. Fixes: fc074e130051 ("crypto: arm64/aes-neonbs-ctr - fallback to plain NEON for final chunk") Cc: <stable@vger.kernel.org> Reported-by: syzbot+f1ceaa1a09ab891e1934@syzkaller.appspotmail.com Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-24	crypto: lskcipher - Copy IV in lskcipher glue code always	Herbert Xu
	The lskcipher glue code for skcipher needs to copy the IV every time rather than only on the first and last request. Otherwise those algorithms that use IV to perform chaining may break, e.g., CBC. This is because crypto_skcipher_import/export do not include the IV as part of the saved state. Reported-by: syzbot+b90b904ef6bdfdafec1d@syzkaller.appspotmail.com Fixes: 662ea18d089b ("crypto: skcipher - Make use of internal state") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-02-23	tun: Fix xdp_rxq_info's queue_index when detaching	Yunjian Wang
	When a queue(tfile) is detached, we only update tfile's queue_index, but do not update xdp_rxq_info's queue_index. This patch fixes it. Fixes: 8bf5c4ee1889 ("tun: setup xdp_rxq_info") Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Link: https://lore.kernel.org/r/1708398727-46308-1-git-send-email-wangyunjian@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-23	i2c: imx: when being a target, mark the last read as processed	Corey Minyard
	When being a target, NAK from the controller means that all bytes have been transferred. So, the last byte needs also to be marked as 'processed'. Otherwise index registers of backends may not increase. Fixes: f7414cd6923f ("i2c: imx: support slave mode for imx I2C driver") Signed-off-by: Corey Minyard <minyard@acm.org> Tested-by: Andrew Manley <andrew.manley@sealingtech.com> Reviewed-by: Andrew Manley <andrew.manley@sealingtech.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> [wsa: fixed comment and commit message to properly describe the case] Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-02-23	apparmor: fix lsm_get_self_attr()	Mickaël Salaün
	In apparmor_getselfattr() when an invalid AppArmor attribute is requested, or a value hasn't been explicitly set for the requested attribute, the label passed to aa_put_label() is not properly initialized which can cause problems when the pointer value is non-NULL and AppArmor attempts to drop a reference on the bogus label object. Cc: Casey Schaufler <casey@schaufler-ca.com> Cc: John Johansen <john.johansen@canonical.com> Fixes: 223981db9baf ("AppArmor: Add selfattr hooks") Signed-off-by: Mickaël Salaün <mic@digikod.net> Reviewed-by: Paul Moore <paul@paul-moore.com> [PM: description changes as discussed with MS] Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-02-23	selinux: fix lsm_get_self_attr()	Mickaël Salaün
	selinux_getselfattr() doesn't properly initialize the string pointer it passes to selinux_lsm_getattr() which can cause a problem when an attribute hasn't been explicitly set; selinux_lsm_getattr() returns 0/success, but does not set or initialize the string label/attribute. Failure to properly initialize the string causes problems later in selinux_getselfattr() when the function attempts to kfree() the string. Cc: Casey Schaufler <casey@schaufler-ca.com> Fixes: 762c934317e6 ("SELinux: Add selfattr hooks") Suggested-by: Paul Moore <paul@paul-moore.com> [PM: description changes as discussed in the thread] Signed-off-by: Mickaël Salaün <mic@digikod.net> Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-02-23	Merge tag 'parisc-for-6.8-rc6' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc architecture fixes from Helge Deller: "Fixes CPU hotplug, the parisc stack unwinder and two possible build errors in kprobes and ftrace area: - Fix CPU hotplug - Fix unaligned accesses and faults in stack unwinder - Fix potential build errors by always including asm-generic/kprobes.h - Fix build bug by add missing CONFIG_DYNAMIC_FTRACE check" * tag 'parisc-for-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Fix stack unwinder parisc/kprobes: always include asm-generic/kprobes.h parisc/ftrace: add missing CONFIG_DYNAMIC_FTRACE check Revert "parisc: Only list existing CPUs in cpu_possible_mask"
2024-02-23	Merge tag 'arm-fixes-6.8-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull arm and RISC-V SoC fixes from Arnd Bergmann: "The Rockchip and IMX8 platforms get a number of fixes for dts files in order to address some misconfigurations, including a regression for USB-C support on some boards. The other dts fixes are part of a series by Rob Herring to clean up another class of dtc compiler warnings across all platforms, with a few others helping out as well. With this, we can enable the warning for the coming merge window without introducing regressions. Conor Dooley has collected fixes for RISC-V platforms, both for the dts files and for platofrm specific drivers. The ep93xx platform gets a regression for for its gpio descriptors" * tag 'arm-fixes-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (28 commits) ARM: dts: renesas: rcar-gen2: Add missing #interrupt-cells to DA9063 nodes cache: ax45mp_cache: Align end size to cache boundary in ax45mp_dma_cache_wback() arm64: dts: qcom: Fix interrupt-map cell sizes arm: dts: Fix dtc interrupt_map warnings arm64: dts: Fix dtc interrupt_provider warnings arm: dts: Fix dtc interrupt_provider warnings arm64: dts: freescale: Disable interrupt_map check ARM: ep93xx: Add terminator to gpiod_lookup_table riscv: dts: sifive: add missing #interrupt-cells to pmic arm64: dts: rockchip: Correct Indiedroid Nova GPIO Names arm64: dts: rockchip: Drop interrupts property from rk3328 pwm-rockchip node arm64: dts: rockchip: set num-cs property for spi on px30 arm64: dts: rockchip: minor rk3588 whitespace cleanup riscv: dts: starfive: replace underscores in node names bus: imx-weim: fix valid range check Revert "arm64: dts: imx8mn-var-som-symphony: Describe the USB-C connector" Revert "arm64: dts: imx8mp-dhcom-pdk3: Describe the USB-C connector" arm64: dts: tqma8mpql: fix audio codec iov-supply arm64: dts: rockchip: drop unneeded status from rk3588-jaguar gpio-leds ARM: dts: rockchip: Drop interrupts property from pwm-rockchip nodes ...
2024-02-23	Merge tag 'arm64-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "A simple fix to a definition in the CXL PMU driver, a couple of patches to restore SME control registers on the resume path (since Arm's fast model now clears them) and a revert for our jump label asm constraints after Geert noticed they broke the build with GCC 5.5. There was then the ensuing discussion about raising the minimum GCC (and corresponding binutils) versions at [1], but for now we'll keep things working as they were until that goes ahead. - Revert fix to jump label asm constraints, as it regresses the build with some GCC 5.5 toolchains. - Restore SME control registers when resuming from suspend - Fix incorrect filter definition in CXL PMU driver" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64/sme: Restore SMCR_EL1.EZT0 on exit from suspend arm64/sme: Restore SME registers on exit from suspend Revert "arm64: jump_label: use constraints "Si" instead of "i"" perf: CXL: fix CPMU filter value mask length