linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message (Collapse)	Author
2024-07-03	USB: core: Fix duplicate endpoint bug by clearing reserved bits in the ↵	Alan Stern
	descriptor Syzbot has identified a bug in usbcore (see the Closes: tag below) caused by our assumption that the reserved bits in an endpoint descriptor's bEndpointAddress field will always be 0. As a result of the bug, the endpoint_is_duplicate() routine in config.c (and possibly other routines as well) may believe that two descriptors are for distinct endpoints, even though they have the same direction and endpoint number. This can lead to confusion, including the bug identified by syzbot (two descriptors with matching endpoint numbers and directions, where one was interrupt and the other was bulk). To fix the bug, we will clear the reserved bits in bEndpointAddress when we parse the descriptor. (Note that both the USB-2.0 and USB-3.1 specs say these bits are "Reserved, reset to zero".) This requires us to make a copy of the descriptor earlier in usb_parse_endpoint() and use the copy instead of the original when checking for duplicates. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-and-tested-by: syzbot+8693a0bb9c10b554272a@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-usb/0000000000003d868e061bc0f554@google.com/ Fixes: 0a8fd1346254 ("USB: fix problems with duplicate endpoint addresses") CC: Oliver Neukum <oneukum@suse.com> CC: stable@vger.kernel.org Link: https://lore.kernel.org/r/205a5edc-7fef-4159-b64a-80374b6b101a@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-03	xhci: always resume roothubs if xHC was reset during resume	Mathias Nyman
	Usb device connect may not be detected after runtime resume if xHC is reset during resume. In runtime resume cases xhci_resume() will only resume roothubs if there are pending port events. If the xHC host is reset during runtime resume due to a Save/Restore Error (SRE) then these pending port events won't be detected as PORTSC change bits are not immediately set by host after reset. Unconditionally resume roothubs if xHC is reset during resume to ensure device connections are detected. Also return early with error code if starting xHC fails after reset. Issue was debugged and a similar solution suggested by Remi Pommarel. Using this instead as it simplifies future refactoring. Reported-by: Remi Pommarel <repk@triplefau.lt> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218987 Suggested-by: Remi Pommarel <repk@triplefau.lt> Tested-by: Remi Pommarel <repk@triplefau.lt> Cc: stable@vger.kernel.org Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240627145523.1453155-2-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-03	serial: imx: only set receiver level if it is zero	Stefan Eichenberger
	With commit a81dbd0463ec ("serial: imx: set receiver level before starting uart") we set the receiver level to its default value. This caused a regression when using SDMA, where the receiver level is 9 instead of 8 (default). This change will first check if the receiver level is zero and only then set it to the default. This still avoids the interrupt storm when the receiver level is zero. Fixes: a81dbd0463ec ("serial: imx: set receiver level before starting uart") Cc: stable <stable@kernel.org> Signed-off-by: Stefan Eichenberger <stefan.eichenberger@toradex.com> Link: https://lore.kernel.org/r/20240703112543.148304-1-eichest@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-03	wifi: wilc1000: fix ies_len type in connect path	Jozef Hopko
	Commit 205c50306acf ("wifi: wilc1000: fix RCU usage in connect path") made sure that the IEs data was manipulated under the relevant RCU section. Unfortunately, while doing so, the commit brought a faulty implicit cast from int to u8 on the ies_len variable, making the parsing fail to be performed correctly if the IEs block is larger than 255 bytes. This failure can be observed with Access Points appending a lot of IEs TLVs in their beacon frames (reproduced with a Pixel phone acting as an Access Point, which brough 273 bytes of IE data in my testing environment). Fix IEs parsing by removing this undesired implicit cast. Fixes: 205c50306acf ("wifi: wilc1000: fix RCU usage in connect path") Signed-off-by: Jozef Hopko <jozef.hopko@altana.com> Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com> Acked-by: Ajay Singh <ajay.kathat@microchip.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://patch.msgid.link/20240701-wilc_fix_ies_data-v1-1-7486cbacf98a@bootlin.com
2024-07-03	Merge tag 'usb-serial-6.10-rc6' of ↵	Greg Kroah-Hartman
	ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus Johan writes: USB-serial device ids for 6.10-rc6 Here are some new modem device ids. All have been in linux-next with no reported issues. * tag 'usb-serial-6.10-rc6' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial: USB: serial: option: add Telit generic core-dump composition USB: serial: option: add Fibocom FM350-GL USB: serial: option: add Telit FN912 rmnet compositions
2024-07-03	gpio: mmio: do not calculate bgpio_bits via "ngpios"	Shiji Yang
	bgpio_bits must be aligned with the data bus width. For example, on a 32 bit big endian system and we only have 16 GPIOs. If we only assume bgpio_bits=16 we can never control the GPIO because the base address is the lowest address. low address high address ------------------------------------------------- \| byte3 \| byte2 \| byte1 \| byte0 \| ------------------------------------------------- \| NaN \| NaN \| gpio8-15 \| gpio0-7 \| ------------------------------------------------- Fixes: 55b2395e4e92 ("gpio: mmio: handle "ngpios" properly in bgpio_init()") Fixes: https://github.com/openwrt/openwrt/issues/15739 Reported-by: Mark Mentovai <mark@mentovai.com> Signed-off-by: Shiji Yang <yangshiji66@outlook.com> Suggested-By: Mark Mentovai <mark@mentovai.com> Reviewed-by: Jonas Gorski <jonas.gorski@gmail.com> Tested-by: Lóránd Horváth <lorand.horvath82@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/TYCP286MB089577B47D70F0AB25ABA6F5BCD52@TYCP286MB0895.JPNP286.PROD.OUTLOOK.COM Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-07-03	perf: arm_pmuv3: Include asm/arm_pmuv3.h from linux/perf/arm_pmuv3.h	Rob Herring (Arm)
	The arm64 asm/arm_pmuv3.h depends on defines from linux/perf/arm_pmuv3.h. Rather than depend on include order, follow the usual pattern of "linux" headers including "asm" headers of the same name. With this change, the include of linux/kvm_host.h is problematic due to circular includes: In file included from ../arch/arm64/include/asm/arm_pmuv3.h:9, from ../include/linux/perf/arm_pmuv3.h:312, from ../include/kvm/arm_pmu.h:11, from ../arch/arm64/include/asm/kvm_host.h:38, from ../arch/arm64/mm/init.c:41: ../include/linux/kvm_host.h:383:30: error: field 'arch' has incomplete type Switching to asm/kvm_host.h solves the issue. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Link: https://lore.kernel.org/r/20240626-arm-pmu-3-9-icntr-v2-5-c9784b4f4065@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2024-07-03	perf: arm_v6/7_pmu: Drop non-DT probe support	Rob Herring (Arm)
	There are no non-DT based PMU users for v6 or v7, so drop the custom non-DT probe table. Unfortunately XScale still needs non-DT probing. Note that this drops support for arm1156 PMU, but there are no arm1156 based systems supported in the kernel. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Link: https://lore.kernel.org/r/20240626-arm-pmu-3-9-icntr-v2-4-c9784b4f4065@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2024-07-03	perf/arm: Move 32-bit PMU drivers to drivers/perf/	Rob Herring (Arm)
	It is preferred to put drivers under drivers/ rather than under arch/. The PMU drivers also depend on arm_pmu.c, so it's better to place them all together. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Link: https://lore.kernel.org/r/20240626-arm-pmu-3-9-icntr-v2-3-c9784b4f4065@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2024-07-03	perf: arm_pmuv3: Drop unnecessary IS_ENABLED(CONFIG_ARM64) check	Rob Herring (Arm)
	The IS_ENABLED(CONFIG_ARM64) check for threshold support is unnecessary. The purpose is to not enable thresholds on arm32, but if threshold is non-zero, the check against threshold_max() just above here will have errored out because threshold_max() is always 0 on arm32. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Acked-by: Mark rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20240626-arm-pmu-3-9-icntr-v2-2-c9784b4f4065@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2024-07-03	perf: arm_pmuv3: Avoid assigning fixed cycle counter with threshold	Rob Herring (Arm)
	If the user has requested a counting threshold for the CPU cycles event, then the fixed cycle counter can't be assigned as it lacks threshold support. Currently, the thresholds will work or not randomly depending on which counter the event is assigned. While using thresholds for CPU cycles doesn't make much sense, it can be useful for testing purposes. Fixes: 816c26754447 ("arm64: perf: Add support for event counting threshold") Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20240626-arm-pmu-3-9-icntr-v2-1-c9784b4f4065@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2024-07-03	x86/bhi: Avoid warning in #DB handler due to BHI mitigation	Alexandre Chartre
	When BHI mitigation is enabled, if SYSENTER is invoked with the TF flag set then entry_SYSENTER_compat() uses CLEAR_BRANCH_HISTORY and calls the clear_bhb_loop() before the TF flag is cleared. This causes the #DB handler (exc_debug_kernel()) to issue a warning because single-step is used outside the entry_SYSENTER_compat() function. To address this issue, entry_SYSENTER_compat() should use CLEAR_BRANCH_HISTORY after making sure the TF flag is cleared. The problem can be reproduced with the following sequence: $ cat sysenter_step.c int main() { asm("pushf; pop %ax; bts $8,%ax; push %ax; popf; sysenter"); } $ gcc -o sysenter_step sysenter_step.c $ ./sysenter_step Segmentation fault (core dumped) The program is expected to crash, and the #DB handler will issue a warning. Kernel log: WARNING: CPU: 27 PID: 7000 at arch/x86/kernel/traps.c:1009 exc_debug_kernel+0xd2/0x160 ... RIP: 0010:exc_debug_kernel+0xd2/0x160 ... Call Trace: <#DB> ? show_regs+0x68/0x80 ? __warn+0x8c/0x140 ? exc_debug_kernel+0xd2/0x160 ? report_bug+0x175/0x1a0 ? handle_bug+0x44/0x90 ? exc_invalid_op+0x1c/0x70 ? asm_exc_invalid_op+0x1f/0x30 ? exc_debug_kernel+0xd2/0x160 exc_debug+0x43/0x50 asm_exc_debug+0x1e/0x40 RIP: 0010:clear_bhb_loop+0x0/0xb0 ... </#DB> <TASK> ? entry_SYSENTER_compat_after_hwframe+0x6e/0x8d </TASK> [ bp: Massage commit message. ] Fixes: 7390db8aea0d ("x86/bhi: Add support for clearing branch history at syscall entry") Reported-by: Suman Maity <suman.m.maity@oracle.com> Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/20240524070459.3674025-1-alexandre.chartre@oracle.com
2024-07-03	s390/vfio_ccw: Fix target addresses of TIC CCWs	Eric Farman
	The processing of a Transfer-In-Channel (TIC) CCW requires locating the target of the CCW in the channel program, and updating the address to reflect what will actually be sent to hardware. An error exists where the 64-bit virtual address is truncated to 32-bits (variable "cda") when performing this math. Since s390 addresses of that size are 31-bits, this leaves that additional bit enabled such that the resulting I/O triggers a channel program check. This shows up occasionally when booting a KVM guest from a passthrough DASD device: ..snip... Interrupt Response Block Data: : 0x0000000000003990 Function Ctrl : [Start] Activity Ctrl : Status Ctrl : [Alert] [Primary] [Secondary] [Status-Pending] Device Status : Channel Status : [Program-Check] cpa=: 0x00000000008d0018 prev_ccw=: 0x0000000000000000 this_ccw=: 0x0000000000000000 ...snip... dasd-ipl: Failed to run IPL1 channel program The channel program address of "0x008d0018" in the IRB doesn't look wrong, but tracing the CCWs shows the offending bit enabled: ccw=0x0000012e808d0000 cda=00a0b030 ccw=0x0000012e808d0008 cda=00a0b038 ccw=0x0000012e808d0010 cda=808d0008 ccw=0x0000012e808d0018 cda=00a0b040 Fix the calculation of the TIC CCW's data address such that it points to a valid 31-bit address regardless of the input address. Fixes: bd36cfbbb9e1 ("s390/vfio_ccw_cp: use new address translation helpers") Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20240628163738.3643513-1-farman@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-07-03	power: sequencing: simplify returning pointer without cleanup	Krzysztof Kozlowski
	Use 'return_ptr' helper for returning a pointer without cleanup for shorter code. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240703083038.95777-1-krzysztof.kozlowski@linaro.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-07-03	fat: Convert to new uid/gid option parsing helpers	Eric Sandeen
	Convert to new uid/gid option parsing helpers Signed-off-by: Eric Sandeen <sandeen@redhat.com> Link: https://lore.kernel.org/r/1a67d2a8-0aae-42a2-9c0f-21cd4cd87d13@redhat.com Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03	fat: Convert to new mount api	Eric Sandeen
	vfat and msdos share a common set of options, with additional, unique options for each filesystem. Each filesystem calls common fc initialization and parsing routines, with an "is_vfat" parameter. For parsing, if the option is not found in the common parameter_spec, parsing is retried with the fs-specific parameter_spec. This patch leaves nls loading to fill_super, so the codepage and charset options are not validated as they are requested. This matches current behavior. It would be possible to test-load as each option is parsed, but that would make i.e. mount -o "iocharset=nope,iocharset=iso8859-1" fail, where it does not fail today because only the last iocharset option is considered. The obsolete "conv=" option is set up with an enum of acceptable values; currently invalid "conv=" options are rejected as such, even though the option is obsolete, so this patch preserves that behavior. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Link: https://lore.kernel.org/r/a9411b02-5f8e-4e1e-90aa-0c032d66c312@redhat.com Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03	fat: move debug into fat_mount_options	Eric Sandeen
	Move the debug variable into fat_mount_options for consistency and to facilitate conversion to new mount API. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Link: https://lore.kernel.org/r/f6155247-32ee-4cfe-b808-9102b17f7cd1@redhat.com Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03	cachefiles: add missing lock protection when polling	Jingbo Xu
	Add missing lock protection in poll routine when iterating xarray, otherwise: Even with RCU read lock held, only the slot of the radix tree is ensured to be pinned there, while the data structure (e.g. struct cachefiles_req) stored in the slot has no such guarantee. The poll routine will iterate the radix tree and dereference cachefiles_req accordingly. Thus RCU read lock is not adequate in this case and spinlock is needed here. Fixes: b817e22b2e91 ("cachefiles: narrow the scope of triggering EPOLLIN events in ondemand mode") Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://lore.kernel.org/r/20240628062930.2467993-10-libaokun@huaweicloud.com Acked-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03	cachefiles: cyclic allocation of msg_id to avoid reuse	Baokun Li
	Reusing the msg_id after a maliciously completed reopen request may cause a read request to remain unprocessed and result in a hung, as shown below: t1 \| t2 \| t3 ------------------------------------------------- cachefiles_ondemand_select_req cachefiles_ondemand_object_is_close(A) cachefiles_ondemand_set_object_reopening(A) queue_work(fscache_object_wq, &info->work) ondemand_object_worker cachefiles_ondemand_init_object(A) cachefiles_ondemand_send_req(OPEN) // get msg_id 6 wait_for_completion(&req_A->done) cachefiles_ondemand_daemon_read // read msg_id 6 req_A cachefiles_ondemand_get_fd copy_to_user // Malicious completion msg_id 6 copen 6,-1 cachefiles_ondemand_copen complete(&req_A->done) // will not set the object to close // because ondemand_id && fd is valid. // ondemand_object_worker() is done // but the object is still reopening. // new open req_B cachefiles_ondemand_init_object(B) cachefiles_ondemand_send_req(OPEN) // reuse msg_id 6 process_open_req copen 6,A.size // The expected failed copen was executed successfully Expect copen to fail, and when it does, it closes fd, which sets the object to close, and then close triggers reopen again. However, due to msg_id reuse resulting in a successful copen, the anonymous fd is not closed until the daemon exits. Therefore read requests waiting for reopen to complete may trigger hung task. To avoid this issue, allocate the msg_id cyclically to avoid reusing the msg_id for a very short duration of time. Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie") Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://lore.kernel.org/r/20240628062930.2467993-9-libaokun@huaweicloud.com Acked-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03	cachefiles: wait for ondemand_object_worker to finish when dropping object	Hou Tao
	When queuing ondemand_object_worker() to re-open the object, cachefiles_object is not pinned. The cachefiles_object may be freed when the pending read request is completed intentionally and the related erofs is umounted. If ondemand_object_worker() runs after the object is freed, it will incur use-after-free problem as shown below. process A processs B process C process D cachefiles_ondemand_send_req() // send a read req X // wait for its completion // close ondemand fd cachefiles_ondemand_fd_release() // set object as CLOSE cachefiles_ondemand_daemon_read() // set object as REOPENING queue_work(fscache_wq, &info->ondemand_work) // close /dev/cachefiles cachefiles_daemon_release cachefiles_flush_reqs complete(&req->done) // read req X is completed // umount the erofs fs cachefiles_put_object() // object will be freed cachefiles_ondemand_deinit_obj_info() kmem_cache_free(object) // both info and object are freed ondemand_object_worker() When dropping an object, it is no longer necessary to reopen the object, so use cancel_work_sync() to cancel or wait for ondemand_object_worker() to finish. Fixes: 0a7e54c1959c ("cachefiles: resend an open request if the read request's object is closed") Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://lore.kernel.org/r/20240628062930.2467993-8-libaokun@huaweicloud.com Acked-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03	cachefiles: cancel all requests for the object that is being dropped	Baokun Li
	Because after an object is dropped, requests for that object are useless, cancel them to avoid causing other problems. This prepares for the later addition of cancel_work_sync(). After the reopen requests is generated, cancel it to avoid cancel_work_sync() blocking by waiting for daemon to complete the reopen requests. Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://lore.kernel.org/r/20240628062930.2467993-7-libaokun@huaweicloud.com Acked-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03	cachefiles: stop sending new request when dropping object	Baokun Li
	Added CACHEFILES_ONDEMAND_OBJSTATE_DROPPING indicates that the cachefiles object is being dropped, and is set after the close request for the dropped object completes, and no new requests are allowed to be sent after this state. This prepares for the later addition of cancel_work_sync(). It prevents leftover reopen requests from being sent, to avoid processing unnecessary requests and to avoid cancel_work_sync() blocking by waiting for daemon to complete the reopen requests. Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://lore.kernel.org/r/20240628062930.2467993-6-libaokun@huaweicloud.com Acked-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03	cachefiles: propagate errors from vfs_getxattr() to avoid infinite loop	Baokun Li
	In cachefiles_check_volume_xattr(), the error returned by vfs_getxattr() is not passed to ret, so it ends up returning -ESTALE, which leads to an endless loop as follows: cachefiles_acquire_volume retry: ret = cachefiles_check_volume_xattr ret = -ESTALE xlen = vfs_getxattr // return -EIO // The ret is not updated when xlen < 0, so -ESTALE is returned. return ret // Supposed to jump out of the loop at this judgement. if (ret != -ESTALE) goto error_dir; cachefiles_bury_object // EIO causes rename failure goto retry; Hence propagate the error returned by vfs_getxattr() to avoid the above issue. Do the same in cachefiles_check_auxdata(). Fixes: 32e150037dce ("fscache, cachefiles: Store the volume coherency data") Fixes: 72b957856b0c ("cachefiles: Implement metadata/coherency data storage in xattrs") Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://lore.kernel.org/r/20240628062930.2467993-5-libaokun@huaweicloud.com Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03	cachefiles: fix slab-use-after-free in cachefiles_withdraw_cookie()	Baokun Li
	We got the following issue in our fault injection stress test: ================================================================== BUG: KASAN: slab-use-after-free in cachefiles_withdraw_cookie+0x4d9/0x600 Read of size 8 at addr ffff888118efc000 by task kworker/u78:0/109 CPU: 13 PID: 109 Comm: kworker/u78:0 Not tainted 6.8.0-dirty #566 Call Trace: <TASK> kasan_report+0x93/0xc0 cachefiles_withdraw_cookie+0x4d9/0x600 fscache_cookie_state_machine+0x5c8/0x1230 fscache_cookie_worker+0x91/0x1c0 process_one_work+0x7fa/0x1800 [...] Allocated by task 117: kmalloc_trace+0x1b3/0x3c0 cachefiles_acquire_volume+0xf3/0x9c0 fscache_create_volume_work+0x97/0x150 process_one_work+0x7fa/0x1800 [...] Freed by task 120301: kfree+0xf1/0x2c0 cachefiles_withdraw_cache+0x3fa/0x920 cachefiles_put_unbind_pincount+0x1f6/0x250 cachefiles_daemon_release+0x13b/0x290 __fput+0x204/0xa00 task_work_run+0x139/0x230 do_exit+0x87a/0x29b0 [...] ================================================================== Following is the process that triggers the issue: p1 \| p2 ------------------------------------------------------------ fscache_begin_lookup fscache_begin_volume_access fscache_cache_is_live(fscache_cache) cachefiles_daemon_release cachefiles_put_unbind_pincount cachefiles_daemon_unbind cachefiles_withdraw_cache fscache_withdraw_cache fscache_set_cache_state(cache, FSCACHE_CACHE_IS_WITHDRAWN); cachefiles_withdraw_objects(cache) fscache_wait_for_objects(fscache) atomic_read(&fscache_cache->object_count) == 0 fscache_perform_lookup cachefiles_lookup_cookie cachefiles_alloc_object refcount_set(&object->ref, 1); object->volume = volume fscache_count_object(vcookie->cache); atomic_inc(&fscache_cache->object_count) cachefiles_withdraw_volumes cachefiles_withdraw_volume fscache_withdraw_volume __cachefiles_free_volume kfree(cachefiles_volume) fscache_cookie_state_machine cachefiles_withdraw_cookie cache = object->volume->cache; // cachefiles_volume UAF !!! After setting FSCACHE_CACHE_IS_WITHDRAWN, wait for all the cookie lookups to complete first, and then wait for fscache_cache->object_count == 0 to avoid the cookie exiting after the volume has been freed and triggering the above issue. Therefore call fscache_withdraw_volume() before calling cachefiles_withdraw_objects(). This way, after setting FSCACHE_CACHE_IS_WITHDRAWN, only the following two cases will occur: 1) fscache_begin_lookup fails in fscache_begin_volume_access(). 2) fscache_withdraw_volume() will ensure that fscache_count_object() has been executed before calling fscache_wait_for_objects(). Fixes: fe2140e2f57f ("cachefiles: Implement volume support") Suggested-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://lore.kernel.org/r/20240628062930.2467993-4-libaokun@huaweicloud.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03	cachefiles: fix slab-use-after-free in fscache_withdraw_volume()	Baokun Li
	We got the following issue in our fault injection stress test: ================================================================== BUG: KASAN: slab-use-after-free in fscache_withdraw_volume+0x2e1/0x370 Read of size 4 at addr ffff88810680be08 by task ondemand-04-dae/5798 CPU: 0 PID: 5798 Comm: ondemand-04-dae Not tainted 6.8.0-dirty #565 Call Trace: kasan_check_range+0xf6/0x1b0 fscache_withdraw_volume+0x2e1/0x370 cachefiles_withdraw_volume+0x31/0x50 cachefiles_withdraw_cache+0x3ad/0x900 cachefiles_put_unbind_pincount+0x1f6/0x250 cachefiles_daemon_release+0x13b/0x290 __fput+0x204/0xa00 task_work_run+0x139/0x230 Allocated by task 5820: __kmalloc+0x1df/0x4b0 fscache_alloc_volume+0x70/0x600 __fscache_acquire_volume+0x1c/0x610 erofs_fscache_register_volume+0x96/0x1a0 erofs_fscache_register_fs+0x49a/0x690 erofs_fc_fill_super+0x6c0/0xcc0 vfs_get_super+0xa9/0x140 vfs_get_tree+0x8e/0x300 do_new_mount+0x28c/0x580 [...] Freed by task 5820: kfree+0xf1/0x2c0 fscache_put_volume.part.0+0x5cb/0x9e0 erofs_fscache_unregister_fs+0x157/0x1b0 erofs_kill_sb+0xd9/0x1c0 deactivate_locked_super+0xa3/0x100 vfs_get_super+0x105/0x140 vfs_get_tree+0x8e/0x300 do_new_mount+0x28c/0x580 [...] ================================================================== Following is the process that triggers the issue: mount failed \| daemon exit ------------------------------------------------------------ deactivate_locked_super cachefiles_daemon_release erofs_kill_sb erofs_fscache_unregister_fs fscache_relinquish_volume __fscache_relinquish_volume fscache_put_volume(fscache_volume, fscache_volume_put_relinquish) zero = __refcount_dec_and_test(&fscache_volume->ref, &ref); cachefiles_put_unbind_pincount cachefiles_daemon_unbind cachefiles_withdraw_cache cachefiles_withdraw_volumes list_del_init(&volume->cache_link) fscache_free_volume(fscache_volume) cache->ops->free_volume cachefiles_free_volume list_del_init(&cachefiles_volume->cache_link); kfree(fscache_volume) cachefiles_withdraw_volume fscache_withdraw_volume fscache_volume->n_accesses // fscache_volume UAF !!! The fscache_volume in cache->volumes must not have been freed yet, but its reference count may be 0. So use the new fscache_try_get_volume() helper function try to get its reference count. If the reference count of fscache_volume is 0, fscache_put_volume() is freeing it, so wait for it to be removed from cache->volumes. If its reference count is not 0, call cachefiles_withdraw_volume() with reference count protection to avoid the above issue. Fixes: fe2140e2f57f ("cachefiles: Implement volume support") Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://lore.kernel.org/r/20240628062930.2467993-3-libaokun@huaweicloud.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03	netfs, fscache: export fscache_put_volume() and add fscache_try_get_volume()	Baokun Li
	Export fscache_put_volume() and add fscache_try_get_volume() helper function to allow cachefiles to get/put fscache_volume via linux/fscache-cache.h. Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://lore.kernel.org/r/20240628062930.2467993-2-libaokun@huaweicloud.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03	fs: fix dentry size	Christian Brauner
	On CONFIG_SMP=y and on 32bit we need to decrease DNAME_INLINE_LEN to 36 btyes to end up with 128 bytes in total. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Links: https://lore.kernel.org/r/CAHk-=whtoqTSCcAvV-X-KPqoDWxS4vxmWpuKLB+Vv8=FtUd5vA@mail.gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03	vfs: move d_lockref out of the area used by RCU lookup	Mateusz Guzik
	Stock kernel scales worse than FreeBSD when doing a 20-way stat(2) on the same tmpfs-backed file. According to perf top: 38.09% lockref_put_return 26.08% lockref_get_not_dead 25.60% __d_lookup_rcu 0.89% clear_bhb_loop __d_lookup_rcu is participating in cacheline ping pong due to the embedded name sharing a cacheline with lockref. Moving it out resolves the problem: 41.50% lockref_put_return 41.03% lockref_get_not_dead 1.54% clear_bhb_loop benchmark (will-it-scale, Sapphire Rapids, tmpfs, ops/s): FreeBSD:7219334 before: 5038006 after: 7842883 (+55%) One minor remark: the 'after' result is unstable, fluctuating in the range ~7.8 mln to ~9 mln during different runs. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20240613001215.648829-3-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03	net: stmmac: enable HW-accelerated VLAN stripping for gmac4 only	Furong Xu
	Commit 750011e239a5 ("net: stmmac: Add support for HW-accelerated VLAN stripping") enables MAC level VLAN tag stripping for all MAC cores, but leaves set_hw_vlan_mode() and rx_hw_vlan() un-implemented for both gmac and xgmac. On gmac and xgmac, ethtool reports rx-vlan-offload is on, both MAC and driver do nothing about VLAN packets actually, although VLAN works well. Driver level stripping should be used on gmac and xgmac for now. Fixes: 750011e239a5 ("net: stmmac: Add support for HW-accelerated VLAN stripping") Signed-off-by: Furong Xu <0x1207@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-03	drm/fbdev-generic: Fix framebuffer on big endian devices	Thomas Huth
	Starting with kernel 6.7, the framebuffer text console is not working anymore with the virtio-gpu device on s390x hosts. Such big endian fb devices are usinga different pixel ordering than little endian devices, e.g. DRM_FORMAT_BGRX8888 instead of DRM_FORMAT_XRGB8888. This used to work fine as long as drm_client_buffer_addfb() was still calling drm_mode_addfb() which called drm_driver_legacy_fb_format() internally to get the right format. But drm_client_buffer_addfb() has recently been reworked to call drm_mode_addfb2() instead with the format value that has been passed to it as a parameter (see commit 6ae2ff23aa43 ("drm/client: Convert drm_client_buffer_addfb() to drm_mode_addfb2()"). That format parameter is determined in drm_fbdev_generic_helper_fb_probe() via the drm_mode_legacy_fb_format() function - which only generates formats suitable for little endian devices. So to fix this issue switch to drm_driver_legacy_fb_format() here instead to take the device endianness into consideration. Fixes: 6ae2ff23aa43 ("drm/client: Convert drm_client_buffer_addfb() to drm_mode_addfb2()") Closes: https://issues.redhat.com/browse/RHEL-45158 Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20240627173530.460615-1-thuth@redhat.com
2024-07-03	drm/panthor: Fix sync-only jobs	Boris Brezillon
	A sync-only job is meant to provide a synchronization point on a queue, so we can't return a NULL fence there, we have to add a signal operation to the command stream which executes after all other previously submitted jobs are done. v2: - Fixed a UAF bug - Added R-bs Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block") Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240703071640.231278-3-boris.brezillon@collabora.com
2024-07-03	drm/panthor: Don't check the array stride on empty uobj arrays	Boris Brezillon
	The user is likely to leave all the drm_panthor_obj_array fields to zero when the array is empty, which will cause an EINVAL failure. v2: - Added R-bs Fixes: 4bdca1150792 ("drm/panthor: Add the driver frontend block") Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240703071640.231278-2-boris.brezillon@collabora.com
2024-07-03	power: supply: cros_charge-control: Avoid accessing attributes out of bounds	Nathan Chancellor
	Clang warns (or errors with CONFIG_WERROR=y): drivers/power/supply/cros_charge-control.c:319:2: error: array index 3 is past the end of the array (that has type 'struct attribute [3]') [-Werror,-Warray-bounds] 319 \| priv->attributes[_CROS_CHCTL_ATTR_COUNT] = NULL; \| ^ ~~~~~~~~~~~~~~~~~~~~~~ drivers/power/supply/cros_charge-control.c:49:2: note: array 'attributes' declared here 49 \| struct attribute attributes[_CROS_CHCTL_ATTR_COUNT]; \| ^ 1 error generated. In earlier revisions of the driver, the attributes array in cros_chctl_priv had four elements with four distinct assignments but during review, the number of elements was changed to three through use of an enum and the assignments became a for loop, except for this one, which is now out of bounds. This assignment is no longer necessary because the size of the attributes array no longer accounts for it, so just remove it to clear up the warning. Fixes: c6ed48ef5259 ("power: supply: add ChromeOS EC based charge control driver") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20240702-cros_charge-control-fix-clang-array-bounds-warning-v1-1-ae04d995cd1d@kernel.org Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
2024-07-02	cifs: Fix read-performance regression by dropping readahead expansion	David Howells
	cifs_expand_read() is causing a performance regression of around 30% by causing extra pagecache to be allocated for an inode in the readahead path before we begin actually dispatching RPC requests, thereby delaying the actual I/O. The expansion is sized according to the rsize parameter, which seems to be 4MiB on my test system; this is a big step up from the first requests made by the fio test program. Simple repro (look at read bandwidth number): fio --name=writetest --filename=/xfstest.test/foo --time_based --runtime=60 --size=16M --numjobs=1 --rw=read Fix this by removing cifs_expand_readahead(). Readahead expansion is mostly useful for when we're using the local cache if the local cache has a block size greater than PAGE_SIZE, so we can dispense with it when not caching. Fixes: 69c3c023af25 ("cifs: Implement netfslib hooks") Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: Matthew Wilcox <willy@infradead.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org Signed-off-by: Steve French <stfrench@microsoft.com>
2024-07-02	net: ntb_netdev: Move ntb_netdev_rx_handler() to call netif_rx() from ↵	Dave Jiang
	__netif_rx() The following is emitted when using idxd (DSA) dmanegine as the data mover for ntb_transport that ntb_netdev uses. [74412.546922] BUG: using smp_processor_id() in preemptible [00000000] code: irq/52-idxd-por/14526 [74412.556784] caller is netif_rx_internal+0x42/0x130 [74412.562282] CPU: 6 PID: 14526 Comm: irq/52-idxd-por Not tainted 6.9.5 #5 [74412.569870] Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.E9I.1752.P05.2402080856 02/08/2024 [74412.581699] Call Trace: [74412.584514] <TASK> [74412.586933] dump_stack_lvl+0x55/0x70 [74412.591129] check_preemption_disabled+0xc8/0xf0 [74412.596374] netif_rx_internal+0x42/0x130 [74412.600957] __netif_rx+0x20/0xd0 [74412.604743] ntb_netdev_rx_handler+0x66/0x150 [ntb_netdev] [74412.610985] ntb_complete_rxc+0xed/0x140 [ntb_transport] [74412.617010] ntb_rx_copy_callback+0x53/0x80 [ntb_transport] [74412.623332] idxd_dma_complete_txd+0xe3/0x160 [idxd] [74412.628963] idxd_wq_thread+0x1a6/0x2b0 [idxd] [74412.634046] irq_thread_fn+0x21/0x60 [74412.638134] ? irq_thread+0xa8/0x290 [74412.642218] irq_thread+0x1a0/0x290 [74412.646212] ? __pfx_irq_thread_fn+0x10/0x10 [74412.651071] ? __pfx_irq_thread_dtor+0x10/0x10 [74412.656117] ? __pfx_irq_thread+0x10/0x10 [74412.660686] kthread+0x100/0x130 [74412.664384] ? __pfx_kthread+0x10/0x10 [74412.668639] ret_from_fork+0x31/0x50 [74412.672716] ? __pfx_kthread+0x10/0x10 [74412.676978] ret_from_fork_asm+0x1a/0x30 [74412.681457] </TASK> The cause is due to the idxd driver interrupt completion handler uses threaded interrupt and the threaded handler is not hard or soft interrupt context. However __netif_rx() can only be called from interrupt context. Change the call to netif_rx() in order to allow completion via normal context for dmaengine drivers that utilize threaded irq handling. While the following commit changed from netif_rx() to __netif_rx(), baebdf48c360 ("net: dev: Makes sure netif_rx() can be invoked in any context."), the change should've been a noop instead. However, the code precedes this fix should've been using netif_rx_ni() or netif_rx_any_context(). Fixes: 548c237c0a99 ("net: Add support for NTB virtual ethernet device") Reported-by: Jerry Dai <jerry.dai@intel.com> Tested-by: Jerry Dai <jerry.dai@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://patch.msgid.link/20240701181538.3799546-1-dave.jiang@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-02	net: phy: aquantia: add missing include guards	Bartosz Golaszewski
	The header is missing the include guards so add them. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Fixes: fb470f70fea7 ("net: phy: aquantia: add hwmon support") Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://patch.msgid.link/20240701080322.9569-1-brgl@bgdev.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-02	drm/amdgpu/atomfirmware: silence UBSAN warning	Alex Deucher
	This is a variable sized array. Link: https://lists.freedesktop.org/archives/amd-gfx/2024-June/110420.html Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2024-07-02	Merge tag 'linux_kselftest-fixes-6.10-rc7' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fixes from Shuah Khan: "One single patch to fix the non-contiguous CBM resctrl: - AMD supports non-contiguous CBM but does not report it via CPUID. This test should not use CPUID on AMD to detect non-contiguous CBM support. Fix the problem so the test uses CPUID to discover non-contiguous CBM support only on Intel" * tag 'linux_kselftest-fixes-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/resctrl: Fix non-contiguous CBM for AMD
2024-07-02	Merge tag 'vfs-6.10-rc7.fixes.2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "VFS: - Improve handling of deep ancestor chains in is_subdir() - Release locks cleanly when fctnl_setlk() races with close(). When setting a file lock fails the VFS tries to cleanup the already created lock. The helper used for this calls back into the LSM layer which may cause it to fail, leaving the stale lock accessible via /proc/locks. AFS: - Fix a comma/semicolon typo" * tag 'vfs-6.10-rc7.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: afs: Convert comma to semicolon fs: better handle deep ancestor chains in is_subdir() filelock: Remove locks reliably when fcntl/close race is detected
2024-07-02	afs: Convert comma to semicolon	Chen Ni
	Replace a comma between expression statements by a semicolon. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Link: https://lore.kernel.org/r/20240702024055.1411407-1-nichen@iscas.ac.cn/ Link: https://lore.kernel.org/r/20240702024055.1411407-1-nichen@iscas.ac.cn Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-02	fs: better handle deep ancestor chains in is_subdir()	Christian Brauner
	Jan reported that 'cd ..' may take a long time in deep directory hierarchies under a bind-mount. If concurrent renames happen it is possible to livelock in is_subdir() because it will keep retrying. Change is_subdir() from simply retrying over and over to retry once and then acquire the rename lock to handle deep ancestor chains better. The list of alternatives to this approach were less then pleasant. Change the scope of rcu lock to cover the whole walk while at it. A big thanks to Jan and Linus. Both Jan and Linus had proposed effectively the same thing just that one version ended up being slightly more elegant. Reported-by: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-02	Merge tag 'qcom-clk-fixes-for-6.10' of ↵	Stephen Boyd
	https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into clk-fixes Pull Qualcomm clk driver fixes from Bjorn Andersson: - Correct the Stromer Plus PLL set_rate to explicitly set ALPHA_EN bit and remove unnecessary upper parts of CONFIG_CTL values. - Mark the recently added IPQ9574 GCC crypto clocks BRANCH_HALT_VOTED, to address stuck clock warnings. - Fix the GPLL6 and GPLL7 parents on SM6350 to avoid issues with these reportedly running at ~25GHz. * tag 'qcom-clk-fixes-for-6.10' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: clk: qcom: gcc-ipq9574: Add BRANCH_HALT_VOTED flag clk: qcom: apss-ipq-pll: remove 'config_ctl_hi_val' from Stromer pll configs clk: qcom: clk-alpha-pll: set ALPHA_EN bit for Stromer Plus PLLs clk: qcom: gcc-sm6350: Fix gpll6* & gpll7 parents
2024-07-02	Merge tag 'erofs-for-6.10-rc7-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: "The most important one fixes possible infinite loops reported by a smartphone vendor OPPO recently due to some unexpected zero-sized compressed pcluster out of interrupted I/Os, storage failures, etc. Another patch fixes global buffer memory leak on unloading, and the remaining one switches to use super_set_uuid() to keep with the other filesystems. Summary: - Fix possible global buffer memory leak when unloading EROFS module - Fix FS_IOC_GETFSUUID ioctl by using super_set_uuid() - Reset m_llen to 0 so then it can retry if metadata is invalid" * tag 'erofs-for-6.10-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: ensure m_llen is reset to 0 if metadata is invalid erofs: convert to use super_set_uuid to support for FS_IOC_GETFSUUID erofs: fix possible memory leak in z_erofs_gbuf_exit()
2024-07-02	filelock: Remove locks reliably when fcntl/close race is detected	Jann Horn
	When fcntl_setlk() races with close(), it removes the created lock with do_lock_file_wait(). However, LSMs can allow the first do_lock_file_wait() that created the lock while denying the second do_lock_file_wait() that tries to remove the lock. In theory (but AFAIK not in practice), posix_lock_file() could also fail to remove a lock due to GFP_KERNEL allocation failure (when splitting a range in the middle). After the bug has been triggered, use-after-free reads will occur in lock_get_status() when userspace reads /proc/locks. This can likely be used to read arbitrary kernel memory, but can't corrupt kernel memory. This only affects systems with SELinux / Smack / AppArmor / BPF-LSM in enforcing mode and only works from some security contexts. Fix it by calling locks_remove_posix() instead, which is designed to reliably get rid of POSIX locks associated with the given file and files_struct and is also used by filp_flush(). Fixes: c293621bbf67 ("[PATCH] stale POSIX lock handling") Cc: stable@kernel.org Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=2563 Signed-off-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20240702-fs-lock-recover-2-v1-1-edd456f63789@google.com Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-02	ACPI: processor_idle: Fix invalid comparison with insertion sort for latency	Kuan-Wei Chiu
	The acpi_cst_latency_cmp() comparison function currently used for sorting C-state latencies does not satisfy transitivity, causing incorrect sorting results. Specifically, if there are two valid acpi_processor_cx elements A and B and one invalid element C, it may occur that A < B, A = C, and B = C. Sorting algorithms assume that if A < B and A = C, then C < B, leading to incorrect ordering. Given the small size of the array (<=8), we replace the library sort function with a simple insertion sort that properly ignores invalid elements and sorts valid ones based on latency. This change ensures correct ordering of the C-state latencies. Fixes: 65ea8f2c6e23 ("ACPI: processor idle: Fix up C-state latency if not ordered") Reported-by: Julian Sikorski <belegdol@gmail.com> Closes: https://lore.kernel.org/lkml/70674dc7-5586-4183-8953-8095567e73df@gmail.com Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Tested-by: Julian Sikorski <belegdol@gmail.com> Cc: All applicable <stable@vger.kernel.org> Link: https://patch.msgid.link/20240701205639.117194-1-visitorckw@gmail.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-02	workqueue: Simplify goto statement	Lai Jiangshan
	Use a simple if-statement to replace the cumbersome goto-statement in workqueue_set_unbound_cpumask(). Cc: Waiman Long <longman@redhat.com> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-02	btrfs: fix uninitialized return value in the ref-verify tool	Filipe Manana
	In the ref-verify tool, when processing the inline references of an extent item, we may end up returning with uninitialized return value, because: 1) The 'ret' variable is not initialized if there are no inline extent references ('ptr' == 'end' before the while loop starts); 2) If we find an extent owner inline reference we don't initialize 'ret'. So fix these cases by initializing 'ret' to 0 when declaring the variable and set it to -EINVAL if we find an extent owner inline references and simple quotas are not enabled (as well as print an error message). Reported-by: Mirsad Todorovac <mtodorovac69@gmail.com> Link: https://lore.kernel.org/linux-btrfs/59b40ebe-c824-457d-8b24-0bbca69d472b@gmail.com/ Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-02	btrfs: always do the basic checks for btrfs_qgroup_inherit structure	Qu Wenruo
	[BUG] Syzbot reports the following regression detected by KASAN: BUG: KASAN: slab-out-of-bounds in btrfs_qgroup_inherit+0x42e/0x2e20 fs/btrfs/qgroup.c:3277 Read of size 8 at addr ffff88814628ca50 by task syz-executor318/5171 CPU: 0 PID: 5171 Comm: syz-executor318 Not tainted 6.10.0-rc2-syzkaller-00010-g2ab795141095 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 btrfs_qgroup_inherit+0x42e/0x2e20 fs/btrfs/qgroup.c:3277 create_pending_snapshot+0x1359/0x29b0 fs/btrfs/transaction.c:1854 create_pending_snapshots+0x195/0x1d0 fs/btrfs/transaction.c:1922 btrfs_commit_transaction+0xf20/0x3740 fs/btrfs/transaction.c:2382 create_snapshot+0x6a1/0x9e0 fs/btrfs/ioctl.c:875 btrfs_mksubvol+0x58f/0x710 fs/btrfs/ioctl.c:1029 btrfs_mksnapshot+0xb5/0xf0 fs/btrfs/ioctl.c:1075 __btrfs_ioctl_snap_create+0x387/0x4b0 fs/btrfs/ioctl.c:1340 btrfs_ioctl_snap_create_v2+0x1f2/0x3a0 fs/btrfs/ioctl.c:1422 btrfs_ioctl+0x99e/0xc60 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:893 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fcbf1992509 RSP: 002b:00007fcbf1928218 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007fcbf1a1f618 RCX: 00007fcbf1992509 RDX: 0000000020000280 RSI: 0000000050009417 RDI: 0000000000000003 RBP: 00007fcbf1a1f610 R08: 00007ffea1298e97 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007fcbf19eb660 R13: 00000000200002b8 R14: 00007fcbf19e60c0 R15: 0030656c69662f2e </TASK> And it also pinned it down to commit b5357cb268c4 ("btrfs: qgroup: do not check qgroup inherit if qgroup is disabled"). [CAUSE] That offending commit skips the whole qgroup inherit check if qgroup is not enabled. But that also skips the very basic checks like num_ref_copies/num_excl_copies and the structure size checks. Meaning if a qgroup enable/disable race is happening at the background, and we pass a btrfs_qgroup_inherit structure when the qgroup is disabled, the check would be completely skipped. Then at the time of transaction commitment, qgroup is re-enabled and btrfs_qgroup_inherit() is going to use the incorrect structure and causing the above KASAN error. [FIX] Make btrfs_qgroup_check_inherit() only skip the source qgroup checks. So that even if invalid btrfs_qgroup_inherit structure is passed in, we can still reject invalid ones no matter if qgroup is enabled or not. Furthermore we do already have an extra safety inside btrfs_qgroup_inherit(), which would just ignore invalid qgroup sources, so even if we only skip the qgroup source check we're still safe. Reported-by: syzbot+a0d1f7e26910be4dc171@syzkaller.appspotmail.com Fixes: b5357cb268c4 ("btrfs: qgroup: do not check qgroup inherit if qgroup is disabled") Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Jeongjun Park <aha310510@gmail.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-02	workqueue: Update cpumasks after only applying it successfully	Lai Jiangshan
	Make workqueue_unbound_exclude_cpumask() and workqueue_set_unbound_cpumask() only update wq_isolated_cpumask and wq_requested_unbound_cpumask when workqueue_apply_unbound_cpumask() returns successfully. Fixes: fe28f631fa94("workqueue: Add workqueue_unbound_exclude_cpumask() to exclude CPUs from wq_unbound_cpumask") Cc: Waiman Long <longman@redhat.com> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-02	btrfs: zoned: fix calc_available_free_space() for zoned mode	Naohiro Aota
	calc_available_free_space() returns the total size of metadata (or system) block groups, which can be allocated from unallocated disk space. The logic is wrong on zoned mode in two places. First, the calculation of data_chunk_size is wrong. We always allocate one zone as one chunk, and no partial allocation of a zone. So, we should use zone_size (= data_sinfo->chunk_size) as it is. Second, the result "avail" may not be zone aligned. Since we always allocate one zone as one chunk on zoned mode, returning non-zone size aligned bytes will result in less pressure on the async metadata reclaim process. This is serious for the nearly full state with a large zone size device. Allowing over-commit too much will result in less async reclaim work and end up in ENOSPC. We can align down to the zone size to avoid that. Fixes: cb6cbab79055 ("btrfs: adjust overcommit logic when very close to full") CC: stable@vger.kernel.org # 6.9 Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>