summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-06-05locking/atomic: remove fallback commentsMark Rutland
Currently a subset of the fallback templates have kerneldoc comments, resulting in a haphazard set of generated kerneldoc comments as only some operations have fallback templates to begin with. We'd like to generate more consistent kerneldoc comments, and to do so we'll need to restructure the way the fallback code is generated. To minimize churn and to make it easier to restructure the fallback code, this patch removes the existing kerneldoc comments from the fallback templates. We can add new kerneldoc comments in subsequent patches. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230605070124.3741859-3-mark.rutland@arm.com
2023-06-05locking/atomic: arm: fix sync opsMark Rutland
The sync_*() ops on arch/arm are defined in terms of the regular bitops with no special handling. This is not correct, as UP kernels elide barriers for the fully-ordered operations, and so the required ordering is lost when such UP kernels are run under a hypervsior on an SMP system. Fix this by defining sync ops with the required barriers. Note: On 32-bit arm, the sync_*() ops are currently only used by Xen, which requires ARMv7, but the semantics can be implemented for ARMv6+. Fixes: e54d2f61528165bb ("xen/arm: sync_bitops") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230605070124.3741859-2-mark.rutland@arm.com
2023-06-05MAINTAINERS: Add myself as I2C host drivers maintainerAndi Shyti
I will help Wolfram out with the i2c controllers patches. Signed-off-by: Andi Shyti <andi.shyti@kernel.org> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Wolfram Sang <wsa@kernel.org>
2023-06-05ALSA: hda/realtek: Add "Intel Reference board" and "NUC 13" SSID in the ALC256Sayed, Karimuddin
Add "Intel Reference boad" and "Intel NUC 13" SSID in the alc256. Enable jack headset volume buttons Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Signed-off-by: Sayed, Karimuddin <karimuddin.sayed@intel.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20230602193812.66768-1-pierre-louis.bossart@linux.intel.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-06-05ALSA: hda/realtek: Add Lenovo P3 Tower platformRenHai
Headset microphone on this platform does not work without ALC897_FIXUP_HEADSET_MIC_PIN fixup. Signed-off-by: RenHai <kean0048@gmail.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20230602003604.975892-1-kean0048@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-06-05s390/cpum_sf: Convert to cmpxchg128()Peter Zijlstra
Now that there is a cross arch u128 and cmpxchg128(), use those instead of the custom CDSG helper. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230531132324.058821078@infradead.org
2023-06-05arch: Remove cmpxchg_doublePeter Zijlstra
No moar users, remove the monster. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230531132323.991907085@infradead.org
2023-06-05slub: Replace cmpxchg_double()Peter Zijlstra
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230531132323.924677086@infradead.org
2023-06-05x86,intel_iommu: Replace cmpxchg_double()Peter Zijlstra
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230531132323.855976804@infradead.org
2023-06-05x86,amd_iommu: Replace cmpxchg_double()Peter Zijlstra
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Vasant Hegde <vasant.hegde@amd.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230531132323.788955257@infradead.org
2023-06-05parisc: Raise minimal GCC versionPeter Zijlstra
64-bit targets need the __int128 type, which for pa-risc means raising the minimum gcc version to 11. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Helge Deller <deller@gmx.de> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lkml.kernel.org/r/20230602143912.GI620383%40hirez.programming.kicks-ass.net
2023-06-05percpu: Wire up cmpxchg128Peter Zijlstra
In order to replace cmpxchg_double() with the newly minted cmpxchg128() family of functions, wire it up in this_cpu_cmpxchg(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230531132323.654945124@infradead.org
2023-06-05percpu: Add {raw,this}_cpu_try_cmpxchg()Peter Zijlstra
Add the try_cmpxchg() form to the per-cpu ops. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230531132323.587480729@infradead.org
2023-06-05instrumentation: Wire up cmpxchg128()Peter Zijlstra
Wire up the cmpxchg128 family in the atomic wrapper scripts. These provide the generic cmpxchg128 family of functions from the arch_ prefixed version, adding explicit instrumentation where needed. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230531132323.519237070@infradead.org
2023-06-05arch: Introduce arch_{,try_}_cmpxchg128{,_local}()Peter Zijlstra
For all architectures that currently support cmpxchg_double() implement the cmpxchg128() family of functions that is basically the same but with a saner interface. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230531132323.452120708@infradead.org
2023-06-05types: Introduce [us]128Peter Zijlstra
Introduce [us]128 (when available). Unlike [us]64, ensure they are always naturally aligned. This also enables 128bit wide atomics (which require natural alignment) such as cmpxchg128(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230531132323.385005581@infradead.org
2023-06-05cyrpto/b128ops: Remove struct u128Peter Zijlstra
Per git-grep u128_xor() and its related struct u128 are unused except to implement {be,le}128_xor(). Remove them to free up the namespace. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230531132323.314826687@infradead.org
2023-06-05ALSA: hda/realtek: Add a quirk for HP Slim Desktop S01Ai Chao
Add a quirk for HP Slim Desktop S01 to fixup headset MIC no presence. Signed-off-by: Ai Chao <aichao@kylinos.cn> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20230526094704.14597-1-aichao@kylinos.cn Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-06-05selftests: alsa: pcm-test: Fix compiler warnings about the formatMirsad Goran Todorovac
GCC 11.3.0 issues warnings in this module about wrong sizes of format specifiers: pcm-test.c: In function ‘test_pcm_time’: pcm-test.c:384:68: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 5 \ has type ‘unsigned int’ [-Wformat=] 384 | snprintf(msg, sizeof(msg), "rate mismatch %ld != %ld", rate, rrate); pcm-test.c:455:53: warning: format ‘%d’ expects argument of type ‘int’, but argument 4 has \ type ‘long int’ [-Wformat=] 455 | "expected %d, wrote %li", rate, frames); pcm-test.c:462:53: warning: format ‘%d’ expects argument of type ‘int’, but argument 4 has \ type ‘long int’ [-Wformat=] 462 | "expected %d, wrote %li", rate, frames); pcm-test.c:467:53: warning: format ‘%d’ expects argument of type ‘int’, but argument 4 has \ type ‘long int’ [-Wformat=] 467 | "expected %d, wrote %li", rate, frames); Simple fix according to compiler's suggestion removed the warnings. Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20230524191528.13203-1-mirsad.todorovac@alu.unizg.hr Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-06-05Merge patch series "can: j1939: avoid possible use-after-free when ↵Marc Kleine-Budde
j1939_can_rx_register fails" Fedor Pchelkin <pchelkin@ispras.ru> says: The patch series fixes a possible racy use-after-free scenario described in 2/2: if j1939_can_rx_register() fails then the concurrent thread may have already read the invalid priv structure. The 1/2 makes j1939_netdev_lock a mutex so that access to j1939_can_rx_register() can be serialized without changing GFP_KERNEL to GFP_ATOMIC inside can_rx_register(). This seems to be safe. Note that the patch series has been tested only via Syzkaller and not with a real device. Link: https://lore.kernel.org/r/20230526171910.227615-1-pchelkin@ispras.ru Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-06-05can: j1939: avoid possible use-after-free when j1939_can_rx_register failsFedor Pchelkin
Syzkaller reports the following failure: BUG: KASAN: use-after-free in kref_put include/linux/kref.h:64 [inline] BUG: KASAN: use-after-free in j1939_priv_put+0x25/0xa0 net/can/j1939/main.c:172 Write of size 4 at addr ffff888141c15058 by task swapper/3/0 CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.10.144-syzkaller #0 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x107/0x167 lib/dump_stack.c:118 print_address_description.constprop.0+0x1c/0x220 mm/kasan/report.c:385 __kasan_report mm/kasan/report.c:545 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:562 check_memory_region_inline mm/kasan/generic.c:186 [inline] check_memory_region+0x145/0x190 mm/kasan/generic.c:192 instrument_atomic_read_write include/linux/instrumented.h:101 [inline] atomic_fetch_sub_release include/asm-generic/atomic-instrumented.h:220 [inline] __refcount_sub_and_test include/linux/refcount.h:272 [inline] __refcount_dec_and_test include/linux/refcount.h:315 [inline] refcount_dec_and_test include/linux/refcount.h:333 [inline] kref_put include/linux/kref.h:64 [inline] j1939_priv_put+0x25/0xa0 net/can/j1939/main.c:172 j1939_sk_sock_destruct+0x44/0x90 net/can/j1939/socket.c:374 __sk_destruct+0x4e/0x820 net/core/sock.c:1784 rcu_do_batch kernel/rcu/tree.c:2485 [inline] rcu_core+0xb35/0x1a30 kernel/rcu/tree.c:2726 __do_softirq+0x289/0x9a3 kernel/softirq.c:298 asm_call_irq_on_stack+0x12/0x20 </IRQ> __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline] run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline] do_softirq_own_stack+0xaa/0xe0 arch/x86/kernel/irq_64.c:77 invoke_softirq kernel/softirq.c:393 [inline] __irq_exit_rcu kernel/softirq.c:423 [inline] irq_exit_rcu+0x136/0x200 kernel/softirq.c:435 sysvec_apic_timer_interrupt+0x4d/0x100 arch/x86/kernel/apic/apic.c:1095 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:635 Allocated by task 1141: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc.constprop.0+0xc9/0xd0 mm/kasan/common.c:461 kmalloc include/linux/slab.h:552 [inline] kzalloc include/linux/slab.h:664 [inline] j1939_priv_create net/can/j1939/main.c:131 [inline] j1939_netdev_start+0x111/0x860 net/can/j1939/main.c:268 j1939_sk_bind+0x8ea/0xd30 net/can/j1939/socket.c:485 __sys_bind+0x1f2/0x260 net/socket.c:1645 __do_sys_bind net/socket.c:1656 [inline] __se_sys_bind net/socket.c:1654 [inline] __x64_sys_bind+0x6f/0xb0 net/socket.c:1654 do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x61/0xc6 Freed by task 1141: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48 kasan_set_track+0x1c/0x30 mm/kasan/common.c:56 kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355 __kasan_slab_free+0x112/0x170 mm/kasan/common.c:422 slab_free_hook mm/slub.c:1542 [inline] slab_free_freelist_hook+0xad/0x190 mm/slub.c:1576 slab_free mm/slub.c:3149 [inline] kfree+0xd9/0x3b0 mm/slub.c:4125 j1939_netdev_start+0x5ee/0x860 net/can/j1939/main.c:300 j1939_sk_bind+0x8ea/0xd30 net/can/j1939/socket.c:485 __sys_bind+0x1f2/0x260 net/socket.c:1645 __do_sys_bind net/socket.c:1656 [inline] __se_sys_bind net/socket.c:1654 [inline] __x64_sys_bind+0x6f/0xb0 net/socket.c:1654 do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x61/0xc6 It can be caused by this scenario: CPU0 CPU1 j1939_sk_bind(socket0, ndev0, ...) j1939_netdev_start() j1939_sk_bind(socket1, ndev0, ...) j1939_netdev_start() mutex_lock(&j1939_netdev_lock) j1939_priv_set(ndev0, priv) mutex_unlock(&j1939_netdev_lock) if (priv_new) kref_get(&priv_new->rx_kref) return priv_new; /* inside j1939_sk_bind() */ jsk->priv = priv j1939_can_rx_register(priv) // fails j1939_priv_set(ndev, NULL) kfree(priv) j1939_sk_sock_destruct() j1939_priv_put() // <- uaf To avoid this, call j1939_can_rx_register() under j1939_netdev_lock so that a concurrent thread cannot process j1939_priv before j1939_can_rx_register() returns. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20230526171910.227615-3-pchelkin@ispras.ru Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-06-05can: j1939: change j1939_netdev_lock type to mutexFedor Pchelkin
It turns out access to j1939_can_rx_register() needs to be serialized, otherwise j1939_priv can be corrupted when parallel threads call j1939_netdev_start() and j1939_can_rx_register() fails. This issue is thoroughly covered in other commit which serializes access to j1939_can_rx_register(). Change j1939_netdev_lock type to mutex so that we do not need to remove GFP_KERNEL from can_rx_register(). j1939_netdev_lock seems to be used in normal contexts where mutex usage is not prohibited. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Suggested-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20230526171910.227615-2-pchelkin@ispras.ru Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-06-05can: j1939: j1939_sk_send_loop_abort(): improved error queue handling in ↵Oleksij Rempel
J1939 Socket This patch addresses an issue within the j1939_sk_send_loop_abort() function in the j1939/socket.c file, specifically in the context of Transport Protocol (TP) sessions. Without this patch, when a TP session is initiated and a Clear To Send (CTS) frame is received from the remote side requesting one data packet, the kernel dispatches the first Data Transport (DT) frame and then waits for the next CTS. If the remote side doesn't respond with another CTS, the kernel aborts due to a timeout. This leads to the user-space receiving an EPOLLERR on the socket, and the socket becomes active. However, when trying to read the error queue from the socket with sock.recvmsg(, , socket.MSG_ERRQUEUE), it returns -EAGAIN, given that the socket is non-blocking. This situation results in an infinite loop: the user-space repeatedly calls epoll(), epoll() returns the socket file descriptor with EPOLLERR, but the socket then blocks on the recv() of ERRQUEUE. This patch introduces an additional check for the J1939_SOCK_ERRQUEUE flag within the j1939_sk_send_loop_abort() function. If the flag is set, it indicates that the application has subscribed to receive error queue messages. In such cases, the kernel can communicate the current transfer state via the error queue. This allows for the function to return early, preventing the unnecessary setting of the socket into an error state, and breaking the infinite loop. It is crucial to note that a socket error is only needed if the application isn't using the error queue, as, without it, the application wouldn't be aware of transfer issues. Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Reported-by: David Jander <david@protonic.nl> Tested-by: David Jander <david@protonic.nl> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20230526081946.715190-1-o.rempel@pengutronix.de Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-06-05xfs: collect errors from inodegc for unlinked inode recoveryDave Chinner
Unlinked list recovery requires errors removing the inode the from the unlinked list get fed back to the main recovery loop. Now that we offload the unlinking to the inodegc work, we don't get errors being fed back when we trip over a corruption that prevents the inode from being removed from the unlinked list. This means we never clear the corrupt unlinked list bucket, resulting in runtime operations eventually tripping over it and shutting down. Fix this by collecting inodegc worker errors and feed them back to the flush caller. This is largely best effort - the only context that really cares is log recovery, and it only flushes a single inode at a time so we don't need complex synchronised handling. Essentially the inodegc workers will capture the first error that occurs and the next flush will gather them and clear them. The flush itself will only report the first gathered error. In the cases where callers can return errors, propagate the collected inodegc flush error up the error handling chain. In the case of inode unlinked list recovery, there are several superfluous calls to flush queued unlinked inodes - xlog_recover_iunlink_bucket() guarantees that it has flushed the inodegc and collected errors before it returns. Hence nothing in the calling path needs to run a flush, even when an error is returned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-06-05xfs: validate block number being freed before adding to xefiDave Chinner
Bad things happen in defered extent freeing operations if it is passed a bad block number in the xefi. This can come from a bogus agno/agbno pair from deferred agfl freeing, or just a bad fsbno being passed to __xfs_free_extent_later(). Either way, it's very difficult to diagnose where a null perag oops in EFI creation is coming from when the operation that queued the xefi has already been completed and there's no longer any trace of it around.... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-06-05xfs: validity check agbnos on the AGFLDave Chinner
If the agfl or the indexing in the AGF has been corrupted, getting a block form the AGFL could return an invalid block number. If this happens, bad things happen. Check the agbno we pull off the AGFL and return -EFSCORRUPTED if we find somethign bad. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-06-05xfs: fix agf/agfl verification on v4 filesystemsDave Chinner
When a v4 filesystem has fl_last - fl_first != fl_count, we do not not detect the corruption and allow the AGF to be used as it if was fully valid. On V5 filesystems, we reset the AGFL to empty in these cases and avoid the corruption at a small cost of leaked blocks. If we don't catch the corruption on V4 filesystems, bad things happen later when an allocation attempts to trim the free list and either double-frees stale entries in the AGFl or tries to free NULLAGBNO entries. Either way, this is bad. Prevent this from happening by using the AGFL_NEED_RESET logic for v4 filesysetms, too. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-06-05xfs: fix double xfs_perag_rele() in xfs_filestream_pick_ag()Dave Chinner
xfs_bmap_longest_free_extent() can return an error when accessing the AGF fails. In this case, the behaviour of xfs_filestream_pick_ag() is conditional on the error. We may continue the loop, or break out of it. The error handling after the loop cleans up the perag reference held when the break occurs. If we continue, the next loop iteration handles cleaning up the perag reference. EIther way, we don't need to release the active perag reference when xfs_bmap_longest_free_extent() fails. Doing so means we do a double decrement on the active reference count, and this causes tha active reference count to fall to zero. At this point, new active references will fail. This leads to unmount hanging because it tries to grab active references to that perag, only for it to fail. This happens inside a loop that retries until a inode tree radix tree tag is cleared, which cannot happen because we can't get an active reference to the perag. The unmount livelocks in this path: xfs_reclaim_inodes+0x80/0xc0 xfs_unmount_flush_inodes+0x5b/0x70 xfs_unmountfs+0x5b/0x1a0 xfs_fs_put_super+0x49/0x110 generic_shutdown_super+0x7c/0x1a0 kill_block_super+0x27/0x50 deactivate_locked_super+0x30/0x90 deactivate_super+0x3c/0x50 cleanup_mnt+0xc2/0x160 __cleanup_mnt+0x12/0x20 task_work_run+0x5e/0xa0 exit_to_user_mode_prepare+0x1bc/0x1c0 syscall_exit_to_user_mode+0x16/0x40 do_syscall_64+0x40/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Reported-by: Pengfei Xu <pengfei.xu@intel.com> Fixes: eb70aa2d8ed9 ("xfs: use for_each_perag_wrap in xfs_filestream_pick_ag") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-06-05xfs: fix broken logic when detecting mergeable bmap recordsDarrick J. Wong
Commit 6bc6c99a944c was a well-intentioned effort to initiate consolidation of adjacent bmbt mapping records by setting the PREEN flag. Consolidation can only happen if the length of the combined record doesn't overflow the 21-bit blockcount field of the bmbt recordset. Unfortunately, the length test is inverted, leading to it triggering on data forks like these: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL 0: [0..16777207]: 76110848..92888055 0 (76110848..92888055) 16777208 1: [16777208..20639743]: 92888056..96750591 0 (92888056..96750591) 3862536 Note that record 0 has a length of 16777208 512b blocks. This corresponds to 2097151 4k fsblocks, which is the maximum. Hence the two records cannot be merged. However, the logic is still wrong even if we change the in-loop comparison, because the scope of our examination isn't broad enough inside the loop to detect mappings like this: 0: [0..9]: 76110838..76110847 0 (76110838..76110847) 10 1: [10..16777217]: 76110848..92888055 0 (76110848..92888055) 16777208 2: [16777218..20639753]: 92888056..96750591 0 (92888056..96750591) 3862536 These three records could be merged into two, but one cannot determine this purely from looking at records 0-1 or 1-2 in isolation. Hoist the mergability detection outside the loop, and base its decision making on whether or not a merged mapping could be expressed in fewer bmbt records. While we're at it, fix the incorrect return type of the iter function. Fixes: 336642f79283 ("xfs: alert the user about data/attr fork mappings that could be merged") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-06-05xfs: Fix undefined behavior of shift into sign bitGeert Uytterhoeven
With gcc-5: In file included from ./include/trace/define_trace.h:102:0, from ./fs/xfs/scrub/trace.h:988, from fs/xfs/scrub/trace.c:40: ./fs/xfs/./scrub/trace.h: In function ‘trace_raw_output_xchk_fsgate_class’: ./fs/xfs/scrub/scrub.h:111:28: error: initializer element is not constant #define XREP_ALREADY_FIXED (1 << 31) /* checking our repair work */ ^ Shifting the (signed) value 1 into the sign bit is undefined behavior. Fix this for all definitions in the file by shifting "1U" instead of "1". This was exposed by the first user added in commit 466c525d6d35e691 ("xfs: minimize overhead of drain wakeups by using jump labels"). Fixes: 160b5a784525e8a4 ("xfs: hoist the already_fixed variable to the scrub context") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-06-05xfs: fix AGF vs inode cluster buffer deadlockDave Chinner
Lock order in XFS is AGI -> AGF, hence for operations involving inode unlinked list operations we always lock the AGI first. Inode unlinked list operations operate on the inode cluster buffer, so the lock order there is AGI -> inode cluster buffer. For O_TMPFILE operations, this now means the lock order set down in xfs_rename and xfs_link is AGI -> inode cluster buffer -> AGF as the unlinked ops are done before the directory modifications that may allocate space and lock the AGF. Unfortunately, we also now lock the inode cluster buffer when logging an inode so that we can attach the inode to the cluster buffer and pin it in memory. This creates a lock order of AGF -> inode cluster buffer in directory operations as we have to log the inode after we've allocated new space for it. This creates a lock inversion between the AGF and the inode cluster buffer. Because the inode cluster buffer is shared across multiple inodes, the inversion is not specific to individual inodes but can occur when inodes in the same cluster buffer are accessed in different orders. To fix this we need move all the inode log item cluster buffer interactions to the end of the current transaction. Unfortunately, xfs_trans_log_inode() calls are littered throughout the transactions with no thought to ordering against other items or locking. This makes it difficult to do anything that involves changing the call sites of xfs_trans_log_inode() to change locking orders. However, we do now have a mechanism that allows is to postpone dirty item processing to just before we commit the transaction: the ->iop_precommit method. This will be called after all the modifications are done and high level objects like AGI and AGF buffers have been locked and modified, thereby providing a mechanism that guarantees we don't lock the inode cluster buffer before those high level objects are locked. This change is largely moving the guts of xfs_trans_log_inode() to xfs_inode_item_precommit() and providing an extra flag context in the inode log item to track the dirty state of the inode in the current transaction. This also means we do a lot less repeated work in xfs_trans_log_inode() by only doing it once per transaction when all the work is done. Fixes: 298f7bec503f ("xfs: pin inode backing buffer to the inode log item") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-06-05xfs: defered work could create precommitsDave Chinner
To fix a AGI-AGF-inode cluster buffer deadlock, we need to move inode cluster buffer operations to the ->iop_precommit() method. However, this means that deferred operations can require precommits to be run on the final transaction that the deferred ops pass back to xfs_trans_commit() context. This will be exposed by attribute handling, in that the last changes to the inode in the attr set state machine "disappear" because the precommit operation is not run. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-06-05xfs: restore allocation trylock iterationDave Chinner
It was accidentally dropped when refactoring the allocation code, resulting in the AG iteration always doing blocking AG iteration. This results in a small performance regression for a specific fsmark test that runs more user data writer threads than there are AGs. Reported-by: kernel test robot <oliver.sang@intel.com> Fixes: 2edf06a50f5b ("xfs: factor xfs_alloc_vextent_this_ag() for _iterate_ags()") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-06-05xfs: buffer pins need to hold a buffer referenceDave Chinner
When a buffer is unpinned by xfs_buf_item_unpin(), we need to access the buffer after we've dropped the buffer log item reference count. This opens a window where we can have two racing unpins for the buffer item (e.g. shutdown checkpoint context callback processing racing with journal IO iclog completion processing) and both attempt to access the buffer after dropping the BLI reference count. If we are unlucky, the "BLI freed" context wins the race and frees the buffer before the "BLI still active" case checks the buffer pin count. This results in a use after free that can only be triggered in active filesystem shutdown situations. To fix this, we need to ensure that buffer existence extends beyond the BLI reference count checks and until the unpin processing is complete. This implies that a buffer pin operation must also take a buffer reference to ensure that the buffer cannot be freed until the buffer unpin processing is complete. Reported-by: yangerkun <yangerkun@huawei.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-06-04Linux 6.4-rc5v6.4-rc5Linus Torvalds
2023-06-04KVM: arm64: PMU: Don't overwrite PMUSERENR with vcpu loadedReiji Watanabe
Currently, with VHE, KVM sets ER, CR, SW and EN bits of PMUSERENR_EL0 to 1 on vcpu_load(), and saves and restores the register value for the host on vcpu_load() and vcpu_put(). If the value of those bits are cleared on a pCPU with a vCPU loaded (armv8pmu_start() would do that when PMU counters are programmed for the guest), PMU access from the guest EL0 might be trapped to the guest EL1 directly regardless of the current PMUSERENR_EL0 value of the vCPU. Fix this by not letting armv8pmu_start() overwrite PMUSERENR_EL0 on the pCPU where PMUSERENR_EL0 for the guest is loaded, and instead updating the saved shadow register value for the host so that the value can be restored on vcpu_put() later. While vcpu_{put,load}() are manipulating PMUSERENR_EL0, disable IRQs to prevent a race condition between these processes and IPIs that attempt to update PMUSERENR_EL0 for the host EL0. Suggested-by: Mark Rutland <mark.rutland@arm.com> Suggested-by: Marc Zyngier <maz@kernel.org> Fixes: 83a7a4d643d3 ("arm64: perf: Enable PMU counter userspace access for perf event") Signed-off-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230603025035.3781797-3-reijiw@google.com
2023-06-04KVM: arm64: PMU: Restore the host's PMUSERENR_EL0Reiji Watanabe
Restore the host's PMUSERENR_EL0 value instead of clearing it, before returning back to userspace, as the host's EL0 might have a direct access to PMU registers (some bits of PMUSERENR_EL0 for might not be zero for the host EL0). Fixes: 83a7a4d643d3 ("arm64: perf: Enable PMU counter userspace access for perf event") Signed-off-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230603025035.3781797-2-reijiw@google.com
2023-06-04Merge tag 'irq_urgent_for_v6.4_rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Borislav Petkov: - Fix open firmware quirks validation so that they don't get applied wrongly * tag 'irq_urgent_for_v6.4_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/gic: Correctly validate OF quirk descriptors
2023-06-04net: sched: wrap tc_skip_wrapper with CONFIG_RETPOLINEMin-Hua Chen
This patch fixes the following sparse warning: net/sched/sch_api.c:2305:1: sparse: warning: symbol 'tc_skip_wrapper' was not declared. Should it be static? No functional change intended. Signed-off-by: Min-Hua Chen <minhuadotchen@gmail.com> Acked-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-04Merge branch 'enetc-fixes'David S. Miller
Wei Fang says: ==================== net: enetc: correct the statistics of rx bytes The purpose of this patch set is to fix the issue of rx bytes statistics. The first patch corrects the rx bytes statistics of normal kernel protocol stack path, and the second patch is used to correct the rx bytes statistics of XDP. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-04net: enetc: correct rx_bytes statistics of XDPWei Fang
The rx_bytes statistics of XDP are always zero, because rx_byte_cnt is not updated after it is initialized to 0. So fix it. Fixes: d1b15102dd16 ("net: enetc: add support for XDP_DROP and XDP_PASS") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-04net: enetc: correct the statistics of rx bytesWei Fang
The rx_bytes of struct net_device_stats should count the length of ethernet frames excluding the FCS. However, there are two problems with the rx_bytes statistics of the current enetc driver. one is that the length of VLAN header is not counted if the VLAN extraction feature is enabled. The other is that the length of L2 header is not counted, because eth_type_trans() is invoked before updating rx_bytes which will subtract the length of L2 header from skb->len. BTW, the rx_bytes statistics of XDP path also have similar problem, I will fix it in another patch. Fixes: a800abd3ecb9 ("net: enetc: move skb creation into enetc_build_skb") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-04ublk: add control command of UBLK_U_CMD_GET_FEATURESMing Lei
Add control command of UBLK_U_CMD_GET_FEATURES for returning driver's feature set or capability. This way can simplify userspace for maintaining compatibility because userspace doesn't need to send command to one device for querying driver feature set any more. Such as, with the queried feature set, userspace can choose to use: - UBLK_CMD_GET_DEV_INFO2 or UBLK_CMD_GET_DEV_INFO, - UBLK_U_CMD_* or UBLK_CMD_* Userspace code: https://github.com/ming1/ubdsrv/commits/features-cmd Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230603040601.775227-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-04Merge tag 'media/v6.4-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: "Some driver fixes: - a regression fix for the verisilicon driver - uvcvideo: don't expose unsupported video formats to userspace - camss-video: don't zero subdev format after init - mediatek: some fixes for 4K decoder formats - fix a Sphinx build warning (missing doc for client_caps) - some fixes for imx and atomisp staging drivers And two CEC core fixes: - don't set last_initiator if TX in progress - disable adapter in cec_devnode_unregister" * tag 'media/v6.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: uvcvideo: Don't expose unsupported formats to userspace media: v4l2-subdev: Fix missing kerneldoc for client_caps media: staging: media: imx: initialize hs_settle to avoid warning media: v4l2-mc: Drop subdev check in v4l2_create_fwnode_links_to_pad() media: staging: media: atomisp: init high & low vars media: cec: core: don't set last_initiator if tx in progress media: cec: core: disable adapter in cec_devnode_unregister media: mediatek: vcodec: Only apply 4K frame sizes on decoder formats media: camss: camss-video: Don't zero subdev format again after initialization media: verisilicon: Additional fix for the crash when opening the driver
2023-06-04fsverity: simplify error handling in verify_data_block()Eric Biggers
Clean up the error handling in verify_data_block() to (a) eliminate the 'err' variable which has caused some confusion because the function actually returns a bool, (b) reduce the compiled code size slightly, and (c) execute one fewer branch in the success case. Link: https://lore.kernel.org/r/20230604022312.48532-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-06-04fsverity: don't use bio_first_page_all() in fsverity_verify_bio()Eric Biggers
bio_first_page_all(bio)->mapping->host is not compatible with large folios, since the first page of the bio is not necessarily the head page of the folio, and therefore it might not have the mapping pointer set. Therefore, move the dereference of ->mapping->host into verify_data_blocks(), which works with a folio. (Like the commit that this Fixes, this hasn't actually been tested with large folios yet, since the filesystems that use fs/verity/ don't support that yet. But based on code review, I think this is needed.) Fixes: 5d0f0e57ed90 ("fsverity: support verifying data from large folios") Link: https://lore.kernel.org/r/20230604022101.48342-1-ebiggers@kernel.org Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-06-04fsverity: constify fsverity_hash_algEric Biggers
Now that fsverity_hash_alg doesn't have an embedded mempool, it can be 'const' almost everywhere. Add it. Link: https://lore.kernel.org/r/20230604022348.48658-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-06-04fsverity: use shash API instead of ahash APIEric Biggers
The "ahash" API, like the other scatterlist-based crypto APIs such as "skcipher", comes with some well-known limitations. First, it can't easily be used with vmalloc addresses. Second, the request struct can't be allocated on the stack. This adds complexity and a possible failure point that needs to be worked around, e.g. using a mempool. The only benefit of ahash over "shash" is that ahash is needed to access traditional memory-to-memory crypto accelerators, i.e. drivers/crypto/. However, this style of crypto acceleration has largely fallen out of favor and been superseded by CPU-based acceleration or inline crypto engines. Also, ahash needs to be used asynchronously to take full advantage of such hardware, but fs/verity/ has never done this. On all systems that aren't actually using one of these ahash-only crypto accelerators, ahash just adds unnecessary overhead as it sits between the user and the underlying shash algorithms. Also, XFS is planned to cache fsverity Merkle tree blocks in the existing XFS buffer cache. As a result, it will be possible for a single Merkle tree block to be split across discontiguous pages (https://lore.kernel.org/r/20230405233753.GU3223426@dread.disaster.area). This data will need to be hashed. It is easiest to work with a vmapped address in this case. However, ahash is incompatible with this. Therefore, let's convert fs/verity/ from ahash to shash. This simplifies the code, and it should also slightly improve performance for everyone who wasn't actually using one of these ahash-only crypto accelerators, i.e. almost everyone (or maybe even everyone)! Link: https://lore.kernel.org/r/20230516052306.99600-1-ebiggers@kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-06-04Merge tag 'v6.4-rc4' into v4l_for_linusMauro Carvalho Chehab
Linux 6.4-rc4 * tag 'v6.4-rc4': (606 commits) Linux 6.4-rc4 cxl: Explicitly initialize resources when media is not ready x86: re-introduce support for ERMS copies for user space accesses NVMe: Add MAXIO 1602 to bogus nid list. module: error out early on concurrent load of the same module file x86/topology: Fix erroneous smp_num_siblings on Intel Hybrid platforms cpufreq: amd-pstate: Update policy->cur in amd_pstate_adjust_perf() io_uring: unlock sqd->lock before sq thread release CPU MAINTAINERS: update arm64 Microchip entries udplite: Fix NULL pointer dereference in __sk_mem_raise_allocated(). net: phy: mscc: enable VSC8501/2 RGMII RX clock net: phy: mscc: remove unnecessary phydev locking net: phy: mscc: add support for VSC8501 net: phy: mscc: add VSC8502 to MODULE_DEVICE_TABLE net/handshake: Enable the SNI extension to work properly net/handshake: Unpin sock->file if a handshake is cancelled net/handshake: handshake_genl_notify() shouldn't ignore @flags net/handshake: Fix uninitialized local variable net/handshake: Fix handshake_dup() ref counting net/handshake: Remove unneeded check from handshake_dup() ...
2023-06-04Merge tag 'char-misc-6.4-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are a bunch of tiny char/misc/other driver fixes for 6.4-rc5 that resolve a number of reported issues. Included in here are: - iio driver fixes - fpga driver fixes - test_firmware bugfixes - fastrpc driver tiny bugfixes - MAINTAINERS file updates for some subsystems All of these have been in linux-next this past week with no reported issues" * tag 'char-misc-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (34 commits) test_firmware: fix the memory leak of the allocated firmware buffer test_firmware: fix a memory leak with reqs buffer test_firmware: prevent race conditions by a correct implementation of locking firmware_loader: Fix a NULL vs IS_ERR() check MAINTAINERS: Vaibhav Gupta is the new ipack maintainer dt-bindings: fpga: replace Ivan Bornyakov maintainership MAINTAINERS: update Microchip MPF FPGA reviewers misc: fastrpc: reject new invocations during device removal misc: fastrpc: return -EPIPE to invocations on device removal misc: fastrpc: Reassign memory ownership only for remote heap misc: fastrpc: Pass proper scm arguments for secure map request iio: imu: inv_icm42600: fix timestamp reset iio: adc: ad_sigma_delta: Fix IRQ issue by setting IRQ_DISABLE_UNLAZY flag dt-bindings: iio: adc: renesas,rcar-gyroadc: Fix adi,ad7476 compatible value iio: dac: mcp4725: Fix i2c_master_send() return value handling iio: accel: kx022a fix irq getting iio: bu27034: Ensure reset is written iio: dac: build ad5758 driver when AD5758 is selected iio: addac: ad74413: fix resistance input processing iio: light: vcnl4035: fixed chip ID check ...