summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-02-28SUNRPC: Merge svc_do_enqueue_xprt() into svc_enqueue_xprt()Chuck Lever
Neil says: "These functions were separated in commit 0971374e2818 ("SUNRPC: Reduce contention in svc_xprt_enqueue()") so that the XPT_BUSY check happened before taking any spinlocks. We have since moved or removed the spinlocks so the extra test is fairly pointless." I've made this a separate patch in case the XPT_BUSY change has unexpected consequences and needs to be reverted. Suggested-by: Neil Brown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28SUNRPC: Remove the .svo_enqueue_xprt methodChuck Lever
We have never been able to track down and address the underlying cause of the performance issues with workqueue-based service support. svo_enqueue_xprt is called multiple times per RPC, so it adds instruction path length, but always ends up at the same function: svc_xprt_do_enqueue(). We do not anticipate needing this flexibility for dynamic nfsd thread management support. As a micro-optimization, remove .svo_enqueue_xprt because Spectre/Meltdown makes virtual function calls more costly. This change essentially reverts commit b9e13cdfac70 ("nfsd/sunrpc: turn enqueueing a svc_xprt into a svc_serv operation"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28SUNRPC: Record endpoint information in trace logChuck Lever
To make server-side trace events more useful in container-ized environments, capture not just the remote's IP address, but the local IP address and network namespace as well. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28SUNRPC: Same as SVC_RQST_ENDPOINT, but without the xidChuck Lever
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28SUNRPC: Improve sockaddr handling in the svc_xprt_create_error trace pointChuck Lever
Clean up: Use the new __sockaddr field to record the socket address. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28NFSD: Remove NFSD_PROC_ARGS_* macrosChuck Lever
Clean up. The PROC_ARGS macros were added when I thought that NFSD tracepoints would be reporting endpoint information. However, tracepoints in the RPC server now report transport endpoint information, so in general there's no need for the upper layers to do that any more, and these macros can be retired. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28NFSD: Use __sockaddr field to store socket addressesChuck Lever
As an example usage of the new __sockaddr field, convert some NFSD trace points to use it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28tracing: Update print fmt check to handle new __get_sockaddr() macroSteven Rostedt (Google)
A helper macro was added to make reading socket addresses easier in trace events. It pairs %pISpc with __get_sockaddr() that reads the socket address from the ring buffer into a human readable format. The boot up check that makes sure that trace events do not reference pointers to memory that can later be freed when the trace event is read, incorrectly flagged this as a delayed reference. Update the check to handle "__get_sockaddr" and not report an error on it. Link: https://lore.kernel.org/all/20220125160505.068dbb52@canb.auug.org.au/ Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28tracing: Introduce helpers to safely handle dynamic-sized sockaddrsChuck Lever
Enable a struct sockaddr to be stored in a trace record as a dynamically-sized field. The common cases are AF_INET and AF_INET6 which are different sizes, and are vastly smaller than a struct sockaddr_storage. These are safer because, when used properly, the size of the sockaddr destination field in each trace record is now guaranteed to be the same as the source address that is being copied into it. Link: https://lore.kernel.org/all/164182978641.8391.8277203495236105391.stgit@bazille.1015granger.net/ Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28NFSD: Streamline the rare "found" caseChuck Lever
Move a rarely called function call site out of the hot path. This is an exceptionally small improvement because the compiler inlines most of the functions that nfsd_cache_lookup() calls. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28NFSD: Skip extra computation for RC_NOCACHE caseChuck Lever
Force the compiler to skip unneeded initialization for cases that don't need those values. For example, NFSv4 COMPOUND operations are RC_NOCACHE. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28NFSD: De-duplicate hash bucket indexingChuck Lever
Clean up: The details of finding the right hash bucket are exactly the same in both nfsd_cache_lookup() and nfsd_cache_update(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28nfsd: Add support for the birth time attributeOndrej Valousek
For filesystems that supports "btime" timestamp (i.e. most modern filesystems do) we share it via kernel nfsd. Btime support for NFS client has already been added by Trond recently. Suggested-by: Bruce Fields <bfields@fieldses.org> Signed-off-by: Ondrej Valousek <ondrej.valousek.xm@renesas.com> [ cel: addressed some whitespace/checkpatch nits ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28spi: dt-bindings: renesas,rspi: Document RZ/V2L SoCLad Prabhakar
Add RSPI binding documentation for Renesas RZ/V2L SoC. RSPI block is identical to one found on RZ/A, so no driver changes are required. The fallback compatible string "renesas,rspi-rz" will be used on RZ/V2L. Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Biju Das <biju.das.jz@bp.renesas.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20220227225956.29570-1-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-02-28random: do crng pre-init loading in worker rather than irqJason A. Donenfeld
Taking spinlocks from IRQ context is generally problematic for PREEMPT_RT. That is, in part, why we take trylocks instead. However, a spin_try_lock() is also problematic since another spin_lock() invocation can potentially PI-boost the wrong task, as the spin_try_lock() is invoked from an IRQ-context, so the task on CPU (random task or idle) is not the actual owner. Additionally, by deferring the crng pre-init loading to the worker, we can use the cryptographic hash function rather than xor, which is perhaps a meaningful difference when considering this data has only been through the relatively weak fast_mix() function. The biggest downside of this approach is that the pre-init loading is now deferred until later, which means things that need random numbers after interrupts are enabled, but before workqueues are running -- or before this particular worker manages to run -- are going to get into trouble. Hopefully in the real world, this window is rather small, especially since this code won't run until 64 interrupts had occurred. Cc: Sultan Alsawaf <sultan@kerneltoast.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Theodore Ts'o <tytso@mit.edu> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-28random: unify cycles_t and jiffies usage and typesJason A. Donenfeld
random_get_entropy() returns a cycles_t, not an unsigned long, which is sometimes 64 bits on various 32-bit platforms, including x86. Conversely, jiffies is always unsigned long. This commit fixes things to use cycles_t for fields that use random_get_entropy(), named "cycles", and unsigned long for fields that use jiffies, named "now". It's also good to mix in a cycles_t and a jiffies in the same way for both add_device_randomness and add_timer_randomness, rather than using xor in one case. Finally, we unify the order of these volatile reads, always reading the more precise cycles counter, and then jiffies, so that the cycle counter is as close to the event as possible. Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-28random: cleanup UUID handlingJason A. Donenfeld
Rather than hard coding various lengths, we can use the right constants. Strings should be `char *` while buffers should be `u8 *`. Rather than have a nonsensical and unused maxlength, just remove it. Finally, use snprintf instead of sprintf, just out of good hygiene. As well, remove the old comment about returning a binary UUID via the binary sysctl syscall. That syscall was removed from the kernel in 5.5, and actually, the "uuid_strategy" function and related infrastructure for even serving it via the binary sysctl syscall was removed with 894d2491153a ("sysctl drivers: Remove dead binary sysctl support") back in 2.6.33. Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-02-28ARM: 9181/1: vdso: remove -nostdlib compiler flagMasahiro Yamada
The -nostdlib option requests the compiler to not use the standard system startup files or libraries when linking. It is effective only when $(CC) is used as a linker driver. Since commit fe00e50b2db8 ("ARM: 8858/1: vdso: use $(LD) instead of $(CC) to link VDSO"), $(LD) is directly used, hence -nostdlib is unneeded. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-02-28ARM: 9175/1: Convert to reserve_initrd_mem()Wang Kefeng
Covert to the generic reserve_initrd_mem() function. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-02-28ARM: 9174/1: amba: Move EXPORT_SYMBOL() closer to definitionWang Kefeng
Some EXPORT_SYMBOL() is at the end of the function, but some is at the end of file. For reader sanity and be consistent, move all EXPORT_SYMBOL() declarations just after the end of the function. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-02-28ARM: 9173/1: amba: kill amba_find_match()Wang Kefeng
There is no one use amba_find_match(), kill it. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-02-28ARM: 9172/1: amba: Cleanup amba pclk operationWang Kefeng
There is no user about amba_pclk_[un]prepare() besides pl330.c, directly use clk_[un]prepare(). After this, all the function about amba pclk operation, enable, disable, [un]prepare could be killed. Acked-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-02-28ARM: 9182/1: mmu: fix returns from early_param() and __setup() functionsRandy Dunlap
early_param() handlers should return 0 on success. __setup() handlers should return 1 on success, i.e., the parameter has been handled. A return of 0 would cause the "option=value" string to be added to init's environment strings, polluting it. ../arch/arm/mm/mmu.c: In function 'test_early_cachepolicy': ../arch/arm/mm/mmu.c:215:1: error: no return statement in function returning non-void [-Werror=return-type] ../arch/arm/mm/mmu.c: In function 'test_noalign_setup': ../arch/arm/mm/mmu.c:221:1: error: no return statement in function returning non-void [-Werror=return-type] Fixes: b849a60e0903 ("ARM: make cr_alignment read-only #ifndef CONFIG_CPU_CP15") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: linux-arm-kernel@lists.infradead.org Cc: patches@armlinux.org.uk Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-02-28blk-crypto: show crypto capabilities in sysfsEric Biggers
Add sysfs files that expose the inline encryption capabilities of request queues: /sys/block/$disk/queue/crypto/max_dun_bits /sys/block/$disk/queue/crypto/modes/$mode /sys/block/$disk/queue/crypto/num_keyslots Userspace can use these new files to decide what encryption settings to use, or whether to use inline encryption at all. This also brings the crypto capabilities in line with the other queue properties, which are already discoverable via the queue directory in sysfs. Design notes: - Place the new files in a new subdirectory "crypto" to group them together and to avoid complicating the main "queue" directory. This also makes it possible to replace "crypto" with a symlink later if we ever make the blk_crypto_profiles into real kobjects (see below). - It was necessary to define a new kobject that corresponds to the crypto subdirectory. For now, this kobject just contains a pointer to the blk_crypto_profile. Note that multiple queues (and hence multiple such kobjects) may refer to the same blk_crypto_profile. An alternative design would more closely match the current kernel data structures: the blk_crypto_profile could be a kobject itself, located directly under the host controller device's kobject, while /sys/block/$disk/queue/crypto would be a symlink to it. I decided not to do that for now because it would require a lot more changes, such as no longer embedding blk_crypto_profile in other structures, and also because I'm not sure we can rule out moving the crypto capabilities into 'struct queue_limits' in the future. (Even if multiple queues share the same crypto engine, maybe the supported data unit sizes could differ due to other queue properties.) It would also still be possible to switch to that design later without breaking userspace, by replacing the directory with a symlink. - Use "max_dun_bits" instead of "max_dun_bytes". Currently, the kernel internally stores this value in bytes, but that's an implementation detail. It probably makes more sense to talk about this value in bits, and choosing bits is more future-proof. - "modes" is a sub-subdirectory, since there may be multiple supported crypto modes, sysfs is supposed to have one value per file, and it makes sense to group all the mode files together. - Each mode had to be named. The crypto API names like "xts(aes)" are not appropriate because they don't specify the key size. Therefore, I assigned new names. The exact names chosen are arbitrary, but they happen to match the names used in log messages in fs/crypto/. - The "num_keyslots" file is a bit different from the others in that it is only useful to know for performance reasons. However, it's included as it can still be useful. For example, a user might not want to use inline encryption if there aren't very many keyslots. Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20220124215938.2769-4-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-28block: don't delete queue kobject before its childrenEric Biggers
kobjects aren't supposed to be deleted before their child kobjects are deleted. Apparently this is usually benign; however, a WARN will be triggered if one of the child kobjects has a named attribute group: sysfs group 'modes' not found for kobject 'crypto' WARNING: CPU: 0 PID: 1 at fs/sysfs/group.c:278 sysfs_remove_group+0x72/0x80 ... Call Trace: sysfs_remove_groups+0x29/0x40 fs/sysfs/group.c:312 __kobject_del+0x20/0x80 lib/kobject.c:611 kobject_cleanup+0xa4/0x140 lib/kobject.c:696 kobject_release lib/kobject.c:736 [inline] kref_put include/linux/kref.h:65 [inline] kobject_put+0x53/0x70 lib/kobject.c:753 blk_crypto_sysfs_unregister+0x10/0x20 block/blk-crypto-sysfs.c:159 blk_unregister_queue+0xb0/0x110 block/blk-sysfs.c:962 del_gendisk+0x117/0x250 block/genhd.c:610 Fix this by moving the kobject_del() and the corresponding kobject_uevent() to the correct place. Fixes: 2c2086afc2b8 ("block: Protect less code with sysfs_lock in blk_{un,}register_queue()") Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220124215938.2769-3-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-28block: simplify calling convention of elv_unregister_queue()Eric Biggers
Make elv_unregister_queue() a no-op if q->elevator is NULL or is not registered. This simplifies the existing callers, as well as the future caller in the error path of blk_register_queue(). Also don't bother checking whether q is NULL, since it never is. Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220124215938.2769-2-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-28blktrace: fix use after free for struct blk_traceYu Kuai
When tracing the whole disk, 'dropped' and 'msg' will be created under 'q->debugfs_dir' and 'bt->dir' is NULL, thus blk_trace_free() won't remove those files. What's worse, the following UAF can be triggered because of accessing stale 'dropped' and 'msg': ================================================================== BUG: KASAN: use-after-free in blk_dropped_read+0x89/0x100 Read of size 4 at addr ffff88816912f3d8 by task blktrace/1188 CPU: 27 PID: 1188 Comm: blktrace Not tainted 5.17.0-rc4-next-20220217+ #469 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-4 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_address_description.constprop.0.cold+0xab/0x381 ? blk_dropped_read+0x89/0x100 ? blk_dropped_read+0x89/0x100 kasan_report.cold+0x83/0xdf ? blk_dropped_read+0x89/0x100 kasan_check_range+0x140/0x1b0 blk_dropped_read+0x89/0x100 ? blk_create_buf_file_callback+0x20/0x20 ? kmem_cache_free+0xa1/0x500 ? do_sys_openat2+0x258/0x460 full_proxy_read+0x8f/0xc0 vfs_read+0xc6/0x260 ksys_read+0xb9/0x150 ? vfs_write+0x3d0/0x3d0 ? fpregs_assert_state_consistent+0x55/0x60 ? exit_to_user_mode_prepare+0x39/0x1e0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fbc080d92fd Code: ce 20 00 00 75 10 b8 00 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 1 RSP: 002b:00007fbb95ff9cb0 EFLAGS: 00000293 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: 00007fbb95ff9dc0 RCX: 00007fbc080d92fd RDX: 0000000000000100 RSI: 00007fbb95ff9cc0 RDI: 0000000000000045 RBP: 0000000000000045 R08: 0000000000406299 R09: 00000000fffffffd R10: 000000000153afa0 R11: 0000000000000293 R12: 00007fbb780008c0 R13: 00007fbb78000938 R14: 0000000000608b30 R15: 00007fbb780029c8 </TASK> Allocated by task 1050: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x81/0xa0 do_blk_trace_setup+0xcb/0x410 __blk_trace_setup+0xac/0x130 blk_trace_ioctl+0xe9/0x1c0 blkdev_ioctl+0xf1/0x390 __x64_sys_ioctl+0xa5/0xe0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Freed by task 1050: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_set_free_info+0x20/0x30 __kasan_slab_free+0x103/0x180 kfree+0x9a/0x4c0 __blk_trace_remove+0x53/0x70 blk_trace_ioctl+0x199/0x1c0 blkdev_common_ioctl+0x5e9/0xb30 blkdev_ioctl+0x1a5/0x390 __x64_sys_ioctl+0xa5/0xe0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae The buggy address belongs to the object at ffff88816912f380 which belongs to the cache kmalloc-96 of size 96 The buggy address is located 88 bytes inside of 96-byte region [ffff88816912f380, ffff88816912f3e0) The buggy address belongs to the page: page:000000009a1b4e7c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0f flags: 0x17ffffc0000200(slab|node=0|zone=2|lastcpupid=0x1fffff) raw: 0017ffffc0000200 ffffea00044f1100 dead000000000002 ffff88810004c780 raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88816912f280: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc ffff88816912f300: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc >ffff88816912f380: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc ^ ffff88816912f400: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc ffff88816912f480: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc ================================================================== Fixes: c0ea57608b69 ("blktrace: remove debugfs file dentries from struct blk_trace") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20220228034354.4047385-1-yukuai3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-28spi: use specific last_cs instead of last_cs_enableYun Zhou
Commit d40f0b6f2e21 instroduced last_cs_enable to avoid setting chipselect if it's not necessary, but it also introduces a bug. The chipselect may not be set correctly on multi-device SPI busses. The reason is that we can't judge the chipselect by bool last_cs_enable, since chipselect may be modified after other devices were accessed. So we should record the specific state of chipselect in case of confusion. Signed-off-by: Yun Zhou <yun.zhou@windriver.com> Link: https://lore.kernel.org/r/20220217141234.72737-1-yun.zhou@windriver.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-02-28iommu/tegra-smmu: Fix missing put_device() call in tegra_smmu_findMiaoqian Lin
The reference taken by 'of_find_device_by_node()' must be released when not needed anymore. Add the corresponding 'put_device()' in the error handling path. Fixes: 765a9d1d02b2 ("iommu/tegra-smmu: Fix mc errors on tegra124-nyan") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Acked-by: Thierry Reding <treding@nvidia.com> Link: https://lore.kernel.org/r/20220107080915.12686-1-linmq006@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-02-28iommu/vt-d: Fix double list_add when enabling VMD in scalable modeAdrian Huang
When enabling VMD and IOMMU scalable mode, the following kernel panic call trace/kernel log is shown in Eagle Stream platform (Sapphire Rapids CPU) during booting: pci 0000:59:00.5: Adding to iommu group 42 ... vmd 0000:59:00.5: PCI host bridge to bus 10000:80 pci 10000:80:01.0: [8086:352a] type 01 class 0x060400 pci 10000:80:01.0: reg 0x10: [mem 0x00000000-0x0001ffff 64bit] pci 10000:80:01.0: enabling Extended Tags pci 10000:80:01.0: PME# supported from D0 D3hot D3cold pci 10000:80:01.0: DMAR: Setup RID2PASID failed pci 10000:80:01.0: Failed to add to iommu group 42: -16 pci 10000:80:03.0: [8086:352b] type 01 class 0x060400 pci 10000:80:03.0: reg 0x10: [mem 0x00000000-0x0001ffff 64bit] pci 10000:80:03.0: enabling Extended Tags pci 10000:80:03.0: PME# supported from D0 D3hot D3cold ------------[ cut here ]------------ kernel BUG at lib/list_debug.c:29! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 7 Comm: kworker/0:1 Not tainted 5.17.0-rc3+ #7 Hardware name: Lenovo ThinkSystem SR650V3/SB27A86647, BIOS ESE101Y-1.00 01/13/2022 Workqueue: events work_for_cpu_fn RIP: 0010:__list_add_valid.cold+0x26/0x3f Code: 9a 4a ab ff 4c 89 c1 48 c7 c7 40 0c d9 9e e8 b9 b1 fe ff 0f 0b 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 f0 0c d9 9e e8 a2 b1 fe ff <0f> 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 98 0c d9 9e e8 8b b1 fe RSP: 0000:ff5ad434865b3a40 EFLAGS: 00010246 RAX: 0000000000000058 RBX: ff4d61160b74b880 RCX: ff4d61255e1fffa8 RDX: 0000000000000000 RSI: 00000000fffeffff RDI: ffffffff9fd34f20 RBP: ff4d611d8e245c00 R08: 0000000000000000 R09: ff5ad434865b3888 R10: ff5ad434865b3880 R11: ff4d61257fdc6fe8 R12: ff4d61160b74b8a0 R13: ff4d61160b74b8a0 R14: ff4d611d8e245c10 R15: ff4d611d8001ba70 FS: 0000000000000000(0000) GS:ff4d611d5ea00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ff4d611fa1401000 CR3: 0000000aa0210001 CR4: 0000000000771ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> intel_pasid_alloc_table+0x9c/0x1d0 dmar_insert_one_dev_info+0x423/0x540 ? device_to_iommu+0x12d/0x2f0 intel_iommu_attach_device+0x116/0x290 __iommu_attach_device+0x1a/0x90 iommu_group_add_device+0x190/0x2c0 __iommu_probe_device+0x13e/0x250 iommu_probe_device+0x24/0x150 iommu_bus_notifier+0x69/0x90 blocking_notifier_call_chain+0x5a/0x80 device_add+0x3db/0x7b0 ? arch_memremap_can_ram_remap+0x19/0x50 ? memremap+0x75/0x140 pci_device_add+0x193/0x1d0 pci_scan_single_device+0xb9/0xf0 pci_scan_slot+0x4c/0x110 pci_scan_child_bus_extend+0x3a/0x290 vmd_enable_domain.constprop.0+0x63e/0x820 vmd_probe+0x163/0x190 local_pci_probe+0x42/0x80 work_for_cpu_fn+0x13/0x20 process_one_work+0x1e2/0x3b0 worker_thread+0x1c4/0x3a0 ? rescuer_thread+0x370/0x370 kthread+0xc7/0xf0 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- ... Kernel panic - not syncing: Fatal exception Kernel Offset: 0x1ca00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) ---[ end Kernel panic - not syncing: Fatal exception ]--- The following 'lspci' output shows devices '10000:80:*' are subdevices of the VMD device 0000:59:00.5: $ lspci ... 0000:59:00.5 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller (rev 20) ... 10000:80:01.0 PCI bridge: Intel Corporation Device 352a (rev 03) 10000:80:03.0 PCI bridge: Intel Corporation Device 352b (rev 03) 10000:80:05.0 PCI bridge: Intel Corporation Device 352c (rev 03) 10000:80:07.0 PCI bridge: Intel Corporation Device 352d (rev 03) 10000:81:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller] 10000:82:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller] The symptom 'list_add double add' is caused by the following failure message: pci 10000:80:01.0: DMAR: Setup RID2PASID failed pci 10000:80:01.0: Failed to add to iommu group 42: -16 pci 10000:80:03.0: [8086:352b] type 01 class 0x060400 Device 10000:80:01.0 is the subdevice of the VMD device 0000:59:00.5, so invoking intel_pasid_alloc_table() gets the pasid_table of the VMD device 0000:59:00.5. Here is call path: intel_pasid_alloc_table pci_for_each_dma_alias get_alias_pasid_table search_pasid_table pci_real_dma_dev() in pci_for_each_dma_alias() gets the real dma device which is the VMD device 0000:59:00.5. However, pte of the VMD device 0000:59:00.5 has been configured during this message "pci 0000:59:00.5: Adding to iommu group 42". So, the status -EBUSY is returned when configuring pasid entry for device 10000:80:01.0. It then invokes dmar_remove_one_dev_info() to release 'struct device_domain_info *' from iommu_devinfo_cache. But, the pasid table is not released because of the following statement in __dmar_remove_one_dev_info(): if (info->dev && !dev_is_real_dma_subdevice(info->dev)) { ... intel_pasid_free_table(info->dev); } The subsequent dmar_insert_one_dev_info() operation of device 10000:80:03.0 allocates 'struct device_domain_info *' from iommu_devinfo_cache. The allocated address is the same address that is released previously for device 10000:80:01.0. Finally, invoking device_attach_pasid_table() causes the issue. `git bisect` points to the offending commit 474dd1c65064 ("iommu/vt-d: Fix clearing real DMA device's scalable-mode context entries"), which releases the pasid table if the device is not the subdevice by checking the returned status of dev_is_real_dma_subdevice(). Reverting the offending commit can work around the issue. The solution is to prevent from allocating pasid table if those devices are subdevices of the VMD device. Fixes: 474dd1c65064 ("iommu/vt-d: Fix clearing real DMA device's scalable-mode context entries") Cc: stable@vger.kernel.org # v5.14+ Signed-off-by: Adrian Huang <ahuang12@lenovo.com> Link: https://lore.kernel.org/r/20220216091307.703-1-adrianhuang0701@gmail.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20220221053348.262724-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-02-28mmc: meson: Fix usage of meson_mmc_post_req()Rong Chen
Currently meson_mmc_post_req() is called in meson_mmc_request() right after meson_mmc_start_cmd(). This could lead to DMA unmapping before the request is actually finished. To fix, don't call meson_mmc_post_req() until meson_mmc_request_done(). Signed-off-by: Rong Chen <rong.chen@amlogic.com> Reviewed-by: Kevin Hilman <khilman@baylibre.com> Fixes: 79ed05e329c3 ("mmc: meson-gx: add support for descriptor chain mode") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220216124239.4007667-1-rong.chen@amlogic.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2022-02-28spi: pxa2xx-pci: Constify struct pxa_spi_info variablesAndy Shevchenko
Now when there are no dynamical changes required, we may constify struct pxa_spi_info variables. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20220225172350.69797-11-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-02-28spi: pxa2xx-pci: Drop temporary storage use for a handful of membersAndy Shevchenko
Instead of using temporary storage, assign the values directly to the corresponding struct pxa2xx_spi_controller members. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20220225172350.69797-10-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-02-28spi: pxa2xx-pci: Extract pxa2xx_spi_pci_clk_register()Andy Shevchenko
Extract pxa2xx_spi_pci_clk_register() from ->probe() in order to reuse it later on for getting rid of max_clk_rate temporary storage. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20220225172350.69797-9-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-02-28spi: pxa2xx-pci: Drop unneeded checks in lpss_spi_setup()Andy Shevchenko
All of the LPSS devices are using DMA and set the parameters up, hence no need to test for that. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20220225172350.69797-8-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-02-28spi: pxa2xx-pci: Replace enum with direct use of PCI IDsAndy Shevchenko
Instead of creating an abstraction on top of PCI IDs, just use them directly. The corresponding enum can be dropped. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20220225172350.69797-7-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-02-28spi: pxa2xx-pci: Move max_clk_rate assignment to ->setup()Andy Shevchenko
Move max_clk_rate to the corresponding ->setup() function to unify with the rest. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20220225172350.69797-6-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-02-28spi: pxa2xx-pci: Move dma_burst_size assignment to ->setup()Andy Shevchenko
Instead of using conditional, move dma_burst_size to the corresponding ->setup() function. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20220225172350.69797-5-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-02-28spi: pxa2xx-pci: Move port_id assignment to ->setup()Andy Shevchenko
Instead of using conditional, move port_id to the corresponding ->setup() functions. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20220225172350.69797-4-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-02-28spi: pxa2xx-pci: Drop redundant NULL check in ->probe()Andy Shevchenko
Since all platforms are using ->setup() function, drop unneeded check. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20220225172350.69797-3-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-02-28spi: pxa2xx-pci: Refactor Quark X1000 to use ->setup()Andy Shevchenko
Refactor Quark X1000 handling code to use ->setup() instead of using the configuration data structure directly. It will allow to refactor further to avoid intermediate storage for the used configuration parameters. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20220225172350.69797-2-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-02-28spi: pxa2xx-pci: Refactor CE4100 to use ->setup()Andy Shevchenko
Refactor CE4100 handling code to use ->setup() instead of spreading potentially confusing conditional. Besides that, it will allow to refactor further to avoid intermediate storage for the used configuration parameters. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20220225172350.69797-1-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-02-28drm/i915: s/JSP2/ICP2/ PCHVille Syrjälä
This JSP2 PCH actually seems to be some special Apple specific ICP variant rather than a JSP. Make it so. Or at least all the references to it seem to be some Apple ICL machines. Didn't manage to find these PCI IDs in any public chipset docs unfortunately. The only thing we're losing here with this JSP->ICP change is Wa_14011294188, but based on the HSD that isn't actually needed on any ICP based design (including JSP), only TGP based stuff (including MCC) really need it. The documented w/a just never made that distinction because Windows didn't want to differentiate between JSP and MCC (not sure how they handle hpd/ddc/etc. then though...). Cc: stable@vger.kernel.org Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Vivek Kasireddy <vivek.kasireddy@intel.com> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/4226 Fixes: 943682e3bd19 ("drm/i915: Introduce Jasper Lake PCH") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220224132142.12927-1-ville.syrjala@linux.intel.com Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Tested-by: Tomas Bzatek <bugs@bzatek.net> (cherry picked from commit 53581504a8e216d435f114a4f2596ad0dfd902fc) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2022-02-28drm/i915/guc/slpc: Correct the param count for unset paramVinay Belgaumkar
SLPC unset param H2G only needs one parameter - the id of the param. Fixes: 025cb07bebfa ("drm/i915/guc/slpc: Cache platform frequency limits") Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Ramalingam C <ramalingam.c@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220216181504.7155-1-vinay.belgaumkar@intel.com (cherry picked from commit 9648f1c3739505557d94ff749a4f32192ea81fe3) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2022-02-28nvme: check that EUI/GUID/UUID are globally uniqueChristoph Hellwig
Add a check to verify that the unique identifiers are unique globally in addition to the existing check that verifies that they are unique inside a single subsystem. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-02-28nvme: check for duplicate identifiers earlierChristoph Hellwig
Lift the check for duplicate identifiers into nvme_init_ns_head, which avoids pointless error unwinding in case they don't match, and also matches where we check identifier validity for the multipath case. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-02-28nvme: fix the check for duplicate unique identifiersChristoph Hellwig
nvme_subsys_check_duplicate_ids should needs to return an error if any of the identifiers matches, not just if all of them match. But it does not need to and should not look at the CSI value for this sanity check. Rewrite the logic to be separate from nvme_ns_ids_equal and optimize it by reducing duplicate checks for non-present identifiers. Fixes: ed754e5deeb1 ("nvme: track shared namespaces") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-02-28nvme: cleanup __nvme_check_idsChristoph Hellwig
Pass the actual nvme_ns_ids used for the comparison instead of the ns_head that isn't needed and use a more descriptive function name. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-02-28nvme: remove nssa from struct nvme_ctrlKeith Busch
The reported number of streams is not used outside the function that gets it, so no need to stash it in the controller structure. Use a local variable instead. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-28nvme: explicitly set non-error for directivesKeith Busch
Stream directives is an optional feature. It is not an error if a controller doesn't support as many as the kernel can optionally use. Explicitly set the non-error return value on this condition with a comment explaining why. Note, the return value was already 0 in this condition, so the setting is redundant. This patch should just silence bots that falsely believe the condition contains an error omission. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>