Age | Commit message | Author |
|
If an AF_ALG socket bound to a hashing algorithm is sent a zero-length
message with MSG_MORE set, and recvmsg() is then called without first
sending another message without MSG_MORE set to end the operation, an oops
will occur. The crypto context and result no longer get set up in advance
because hash_sendmsg() now defers that as long as possible in the hope that
it can use crypto_ahash_digest() - and because the message is zero-length,
the data wrangling loop is skipped.
Fix this by handling zero-length sends at the top of the hash_sendmsg()
function. If we're not continuing the previous sendmsg(), then just ignore
the send (hash_recvmsg() will invent something when called); if we are
continuing, then we finalise the request at this point if MSG_MORE is not
set, so that any error is obtained here; otherwise the send has no effect
and can be ignored.
Whilst we're at it, remove the code to create a kvmalloc'd scatterlist if
we get more than ALG_MAX_PAGES - this shouldn't happen.
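A minimal sketch of the early zero-length handling described above
(illustrative only; the labels, fields and finalisation call are
assumptions, not the actual patch):

	/* At the top of hash_sendmsg(), before any page wrangling: */
	if (!size) {
		if (!ctx->more)
			goto unlock;	/* hash_recvmsg() will invent a result */
		if (!(msg->msg_flags & MSG_MORE)) {
			/* Finalise the pending request so any error shows up here */
			err = crypto_wait_req(crypto_ahash_final(&ctx->req),
					      &ctx->wait);
		}
		goto unlock;
	}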
Fixes: c662b043cdca ("crypto: af_alg/hash: Support MSG_SPLICE_PAGES")
Reported-by: syzbot+13a08c0bf4d212766c3c@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/000000000000b928f705fdeb873a@google.com/
Reported-by: syzbot+14234ccf6d0ef629ec1a@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/000000000000c047db05fdeb8790@google.com/
Reported-by: syzbot+4e2e47f32607d0f72d43@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/000000000000bcca3205fdeb87fb@google.com/
Reported-by: syzbot+472626bb5e7c59fb768f@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/000000000000b55d8805fdeb8385@google.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Reported-and-tested-by: syzbot+6efc50cc1f8d718d6cb7@syzkaller.appspotmail.com
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/427646.1686913832@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
devm_clk_notifier_register() allocates a devres resource for the clk
notifier but doesn't register it with the device, so the notifier doesn't
get unregistered on device detach and the allocated resource is leaked.
Fix the issue by registering the resource through devres_add().
This issue was found with kmemleak on a Chromebook.
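The missing step is roughly the following (sketch; the devres struct, its
fields and the release callback name are illustrative assumptions):

	devres = devres_alloc(devm_clk_notifier_release,
			      sizeof(*devres), GFP_KERNEL);
	if (!devres)
		return -ENOMEM;

	ret = clk_notifier_register(clk, nb);
	if (!ret) {
		devres->clk = clk;
		devres->nb = nb;
		devres_add(dev, devres);	/* this call was missing */
	} else {
		devres_free(devres);
	}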
Fixes: 6d30d50d037d ("clk: add devm variant of clk_notifier_register")
Signed-off-by: Fei Shao <fshao@chromium.org>
Link: https://lore.kernel.org/r/20230619112253.v2.1.I13f060c10549ef181603e921291bdea95f83033c@changeid
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
The new phy driver attempts to select a driver from another subsystem,
but that fails when the NVMEM subsystem is disabled:
WARNING: unmet direct dependencies detected for NVMEM_MTK_EFUSE
Depends on [n]: NVMEM [=n] && (ARCH_MEDIATEK [=n] || COMPILE_TEST [=y]) && HAS_IOMEM [=y]
Selected by [y]:
- MEDIATEK_GE_SOC_PHY [=y] && NETDEVICES [=y] && PHYLIB [=y] && (ARM64 && ARCH_MEDIATEK [=n] || COMPILE_TEST [=y])
I could not see an actual compile time dependency, so presumably this
is only needed for working correctly and is not technically a dependency
on that particular nvmem driver implementation, so it would likely
be safe to remove the select for compile testing.
To keep the spirit of the original 'select', just replace this with a
'depends on' that ensures that the driver will work but does not get in
the way of build testing.
Fixes: 98c485eaf509b ("net: phy: add driver for MediaTek SoC built-in GE PHYs")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Link: https://lore.kernel.org/r/20230616093009.3511692-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Rework iterating over DT CPU nodes to iterate over possible CPUs
instead. There's no need to walk the DT CPU nodes again. The number of
possible CPUs is equal to the number of CPUs defined in the DT. Using the
"reg" value for an array index is fragile as it assumes "reg" is 0-N,
which often is not the case.
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230327-mvebu-clk-fixes-v2-3-8333729ee45d@kernel.org
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
Use of_get_cpu_hwid() rather than open coding the reading of the CPU
nodes' "reg" property. The existing code is in fact wrong, as the "reg"
address cells size is 2 cells for arm64. The existing code happens to
work because the DTS files are wrong as well.
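A minimal sketch of the intended usage, assuming the iteration over
possible CPUs from the previous patch (variable names are illustrative):

	for_each_possible_cpu(cpu) {
		struct device_node *np = of_get_cpu_node(cpu, NULL);
		u64 hwid = of_get_cpu_hwid(np, 0);

		/* use hwid instead of open coding a "reg" read */
		of_node_put(np);
	}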
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230327-mvebu-clk-fixes-v2-2-8333729ee45d@kernel.org
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
drivers/clk/mvebu/ is missing a maintainers entry. Add it to the
existing entry for the Marvell mvebu platforms.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230327-mvebu-clk-fixes-v2-1-8333729ee45d@kernel.org
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
In bdev_add_partition(), there is no check that the start and end sectors
do not exceed the size of the disk before calling add_partition(). When we
call the block device's ioctl interface directly to add a partition, and
the capacity of the disk has been set to 0 by the driver, the command will
still continue to execute.
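The missing bounds check could look roughly like this (sketch; the error
code and exact placement are assumptions):

	sector_t capacity = get_capacity(disk);

	if (!capacity || start >= capacity || start + length > capacity)
		return -EINVAL;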
Signed-off-by: Min Li <min15.li@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20230619091214.31615-1-min15.li@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull smb server fixes from Steve French:
"Four smb3 server fixes, all also for stable:
- fix potential oops in parsing compounded requests
- fix various paths (mkdir, create etc) where mnt_want_write was not
checked first
- fix slab out of bounds in check_message and write"
* tag '6.4-rc6-smb3-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: validate session id and tree id in the compound request
ksmbd: fix out-of-bound read in smb2_write
ksmbd: add mnt_want_write to ksmbd vfs functions
ksmbd: validate command payload size
|
|
Allow unprivileged Persistent Reservation operations on devices
if the write permission check on the device node has passed.
brw-rw---- 1 root disk 259, 0 Jun 13 07:09 /dev/nvme0n1
In the example above, the "disk" group of nvme0n1 is also allowed to
make reservations on the device even without CAP_SYS_ADMIN.
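A minimal sketch of the relaxed check (illustrative; the helper signature
and the use of BLK_OPEN_WRITE are assumptions):

	/* in blkdev_pr_allowed(): write permission on the node is enough */
	if (mode & BLK_OPEN_WRITE)
		return true;
	return capable(CAP_SYS_ADMIN);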
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230613084008.93795-3-jefflexu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Refuse Persistent Reservation operations on partitions as reservation
on partitions doesn't make sense.
Besides, introduce a blkdev_pr_allowed() helper, where more policies can
be added later.
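Such a helper could look roughly like this (sketch; the CAP_SYS_ADMIN check
reflects the pre-existing policy and the exact contents are assumptions):

	static bool blkdev_pr_allowed(struct block_device *bdev)
	{
		/* Reservations on a partition make no sense */
		if (bdev_is_partition(bdev))
			return false;
		return capable(CAP_SYS_ADMIN);
	}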
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230613084008.93795-2-jefflexu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The device /dev/hwctr was introduced to access complete
CPU Measurement facility counter sets via an ioctl system call.
Access to the device is limited to privileged processes
running as root or superuser, and the capability CAP_SYS_ADMIN
is required. The device permissions are read/write for the
device owner root. There is no need for this restriction.
Make the device access permission read/write for all and
reduce the required capability to CAP_PERFMON.
Any user space program with the CAP_PERFMON capability assigned to it
can now read and display the CPU Measurement facility counter sets.
For more details on perf tool usage and security, see the Linux
documentation in Documentation/admin-guide/perf-security.rst.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
During module load, module layout allocation occurs by initially
allowing the architecture to frob the sections. This is performed via
module_frob_arch_sections().
However, the sizes of the module memory types (text, data, rodata, etc.)
are updated correctly only after layout_sections().
After the required module memory sizes for each type have been calculated,
move_module() is responsible for allocating the module memory for each
type from the modules vaddr range.
Given the sequence above, module_frob_arch_sections() updates the
mod_arch_specific got_offset before the module memory text type size
is fully updated in layout_sections(). Hence mod_arch_specific
got_offset currently points to zero.
As per s390 ABI,
R_390_GOTENT : (G + O + A - P) >> 1
where
G=me->mem[MOD_TEXT].base+me->arch.got_offset
O=info->got_offset
A=rela->r_addend
P=loc
Fix the R_390_GOTENT calculation in apply_rela() accordingly.
Note: currently this doesn't break anything because me->arch.got_offset
is zero. However, reordering of functions in the future could break it.
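Expressed as a sketch, the corrected computation follows the ABI formula
above (illustrative; the real code feeds the value into the relocation
helpers):

	/* R_390_GOTENT: (G + O + A - P) >> 1 */
	Elf_Addr G = (Elf_Addr)me->mem[MOD_TEXT].base + me->arch.got_offset;
	Elf_Addr O = info->got_offset;
	Elf_Addr A = rela->r_addend;
	Elf_Addr P = loc;
	Elf_Addr val = (G + O + A - P) >> 1;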
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Fix virtual vs physical address confusion (they are currently the same).
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
The Kernel Address Sanitizer uses one shadow byte to encode each 8 bytes
of memory; that is, the start and end address of a memory range are
shifted right by 3 bits when the corresponding shadow memory is created
for that memory range.
The memory mapping routine used expects page-aligned addresses, while the
above described 3-bit shift might turn the shadow memory range start and
end boundaries into non-page-aligned ones in case the size of the original
memory range is less than (PAGE_SIZE << 3). As a result, the resulting
shadow memory range could be short by one page.
Align the start and end addresses on a page boundary when mapping a
shadow memory range to avoid the described issue in the future.
Note that this does not fix a real problem, since currently no virtual
regions of size less than (PAGE_SIZE << 3) exist.
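A minimal sketch of the alignment (illustrative; the actual mapping routine
and variable names in the decompressor differ):

	unsigned long shadow_start, shadow_end;

	shadow_start = round_down((unsigned long)kasan_mem_to_shadow(start),
				  PAGE_SIZE);
	shadow_end = round_up((unsigned long)kasan_mem_to_shadow(end),
			      PAGE_SIZE);
	/* map [shadow_start, shadow_end) */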
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Since commit 3b5c3f000c2e ("s390/kasan: move shadow mapping
to decompressor") the decompressor establishes mappings for
the shadow memory and sets initial protection attributes to
RWX. The decompressed kernel resets protection to RW+NX
later on.
In case a shadow memory range is not aligned on a page boundary
(e.g. as a result of mem= kernel command line parameter use),
the "Checked W+X mappings: FAILED, 1 W+X pages found" warning
hits.
Reported-by: Vasily Gorbik <gor@linux.ibm.com>
Fixes: 557b19709da9 ("s390/kasan: move shadow mapping to decompressor")
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
get_elfcorehdr_size() returns a size_t, so there is no real point to
store it in a u32.
Turn 'alloc_size' into a size_t.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/0756118c9058338f3040edb91971d0bfd100027b.1686688212.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
[BUG]
David reported an ASSERT() get triggered during fio load on 8 devices
with data/raid6 and metadata/raid1c3:
fio --rw=randrw --randrepeat=1 --size=3000m \
--bsrange=512b-64k --bs_unaligned \
--ioengine=libaio --fsync=1024 \
--name=job0 --name=job1 \
The ASSERT() is from rbio_add_bio() of raid56.c:
	ASSERT(orig_logical >= full_stripe_start &&
	       orig_logical + orig_len <= full_stripe_start +
	       rbio->nr_data * BTRFS_STRIPE_LEN);
This checks whether the target rbio crosses the full stripe boundary.
[100.789] assertion failed: orig_logical >= full_stripe_start && orig_logical + orig_len <= full_stripe_start + rbio->nr_data * BTRFS_STRIPE_LEN, in fs/btrfs/raid56.c:1622
[100.795] ------------[ cut here ]------------
[100.796] kernel BUG at fs/btrfs/raid56.c:1622!
[100.797] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
[100.798] CPU: 1 PID: 100 Comm: kworker/u8:4 Not tainted 6.4.0-rc6-default+ #124
[100.799] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
[100.802] Workqueue: writeback wb_workfn (flush-btrfs-1)
[100.803] RIP: 0010:rbio_add_bio+0x204/0x210 [btrfs]
[100.806] RSP: 0018:ffff888104a8f300 EFLAGS: 00010246
[100.808] RAX: 00000000000000a1 RBX: ffff8881075907e0 RCX: ffffed1020951e01
[100.809] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000001
[100.811] RBP: 0000000141d20000 R08: 0000000000000001 R09: ffff888104a8f04f
[100.813] R10: ffffed1020951e09 R11: 0000000000000003 R12: ffff88810e87f400
[100.815] R13: 0000000041d20000 R14: 0000000144529000 R15: ffff888101524000
[100.817] FS: 0000000000000000(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000
[100.821] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[100.822] CR2: 000055d54e44c270 CR3: 000000010a9a1006 CR4: 00000000003706a0
[100.824] Call Trace:
[100.825] <TASK>
[100.825] ? die+0x32/0x80
[100.826] ? do_trap+0x12d/0x160
[100.827] ? rbio_add_bio+0x204/0x210 [btrfs]
[100.827] ? rbio_add_bio+0x204/0x210 [btrfs]
[100.829] ? do_error_trap+0x90/0x130
[100.830] ? rbio_add_bio+0x204/0x210 [btrfs]
[100.831] ? handle_invalid_op+0x2c/0x30
[100.833] ? rbio_add_bio+0x204/0x210 [btrfs]
[100.835] ? exc_invalid_op+0x29/0x40
[100.836] ? asm_exc_invalid_op+0x16/0x20
[100.837] ? rbio_add_bio+0x204/0x210 [btrfs]
[100.837] raid56_parity_write+0x64/0x270 [btrfs]
[100.838] btrfs_submit_chunk+0x26e/0x800 [btrfs]
[100.840] ? btrfs_bio_init+0x80/0x80 [btrfs]
[100.841] ? release_pages+0x503/0x6d0
[100.842] ? folio_unlock+0x2f/0x60
[100.844] ? __folio_put+0x60/0x60
[100.845] ? btrfs_do_readpage+0xae0/0xae0 [btrfs]
[100.847] btrfs_submit_bio+0x21/0x60 [btrfs]
[100.847] submit_one_bio+0x6a/0xb0 [btrfs]
[100.849] extent_write_cache_pages+0x395/0x680 [btrfs]
[100.850] ? __extent_writepage+0x520/0x520 [btrfs]
[100.851] ? mark_usage+0x190/0x190
[100.852] extent_writepages+0xdb/0x130 [btrfs]
[100.853] ? extent_write_locked_range+0x480/0x480 [btrfs]
[100.854] ? mark_usage+0x190/0x190
[100.854] ? attach_extent_buffer_page+0x220/0x220 [btrfs]
[100.855] ? reacquire_held_locks+0x178/0x280
[100.856] ? writeback_sb_inodes+0x245/0x7f0
[100.857] do_writepages+0x102/0x2e0
[100.858] ? page_writeback_cpu_online+0x10/0x10
[100.859] ? __lock_release.isra.0+0x14a/0x4d0
[100.860] ? reacquire_held_locks+0x280/0x280
[100.861] ? __lock_acquired+0x1e9/0x3d0
[100.862] ? do_raw_spin_lock+0x1b0/0x1b0
[100.863] __writeback_single_inode+0x94/0x450
[100.864] writeback_sb_inodes+0x372/0x7f0
[100.864] ? lock_sync+0xd0/0xd0
[100.865] ? do_raw_spin_unlock+0x93/0xf0
[100.866] ? sync_inode_metadata+0xc0/0xc0
[100.867] ? rwsem_optimistic_spin+0x340/0x340
[100.868] __writeback_inodes_wb+0x70/0x130
[100.869] wb_writeback+0x2d1/0x530
[100.869] ? __writeback_inodes_wb+0x130/0x130
[100.870] ? lockdep_hardirqs_on_prepare.part.0+0xf1/0x1c0
[100.870] wb_do_writeback+0x3eb/0x480
[100.871] ? wb_writeback+0x530/0x530
[100.871] ? mark_lock_irq+0xcd0/0xcd0
[100.872] wb_workfn+0xe0/0x3f0<
[CAUSE]
Commit a97699d1d610 ("btrfs: replace map_lookup->stripe_len by
BTRFS_STRIPE_LEN") changed how we calculate the map length, to reduce
u64 divisions.
Function btrfs_max_io_len() returns the length to the stripe boundary.
It calculates the full stripe start offset (inside the chunk) with the
following code:
	*full_stripe_start =
		rounddown(*stripe_nr, nr_data_stripes(map)) <<
		BTRFS_STRIPE_LEN_SHIFT;
The calculation itself is fine, but the value returned by rounddown() is
dependent on both @stripe_nr (which is u32) and nr_data_stripes() (which
returns int).
Thus the result is also u32; then we do the left shift, which can
overflow u32.
If such an overflow happens, @full_stripe_start will be a value way smaller
than @offset, causing the later "full_stripe_len - (offset -
*full_stripe_start)" to underflow, so the later length calculation has no
stripe boundary limit, resulting in a write bio that exceeds the stripe
boundary.
There are some other locations like this, where a u32 @stripe_nr gets left
shifted, which can lead to a similar overflow.
[FIX]
Fix all left shifts of @stripe_nr by adding a type cast to u64 before the
shift.
The involved @stripe_nr (or similar) variables record the stripe number
inside the chunk, which is small enough to be contained in a u32, but
their byte offset inside the chunk can not fit into a u32.
Thus for those specific left shifts a type cast to u64 is necessary; this
patch does not change the variables themselves, keeping the fix minimal,
and the code will be cleaned up in the future.
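For example, the computation quoted above becomes (sketch):

	*full_stripe_start =
		(u64)rounddown(*stripe_nr, nr_data_stripes(map)) <<
		BTRFS_STRIPE_LEN_SHIFT;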
Reported-by: David Sterba <dsterba@suse.com>
Fixes: a97699d1d610 ("btrfs: replace map_lookup->stripe_len by BTRFS_STRIPE_LEN")
Tested-by: David Sterba <dsterba@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Remove all the open coded magic on slot->file_ptr by introducing two
helpers that return the file pointer and the flags instead.
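The two helpers could look roughly like this (sketch; names and the FFS
flag encoding follow the existing fixed-file table conventions, but the
exact definitions are assumptions):

	static inline struct file *io_slot_file(struct io_fixed_file *slot)
	{
		return (struct file *)(slot->file_ptr & FFS_MASK);
	}

	static inline unsigned long io_slot_flags(struct io_fixed_file *slot)
	{
		return slot->file_ptr & ~FFS_MASK;
	}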
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230620113235.920399-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Use io_file_from_index instead of open coding it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230620113235.920399-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Use io_file_from_index instead of open coding it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230620113235.920399-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Two of the three callers want them, so return the more usual format,
and shift into the FFS_ form only for the fixed file table.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230620113235.920399-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Just checking the flag directly makes it a lot more obvious what is
going on here.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230620113235.920399-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The SCM inflight mechanism has nothing to do with the fact that a file
might be a regular file or not and if it supports non-blocking
operations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230620113235.920399-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The variable is only used once now, so don't bother with it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230620113235.920399-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Now that this only checks O_NONBLOCK and FMODE_NOWAIT, the helper is
complete overkill, and the comments are confusing, bordering on wrong.
Just inline the check into the caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230620113235.920399-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
smatch warning:
drivers/accel/qaic/qaic_data.c:620 qaic_free_object() error:
dereferencing freed memory 'obj->import_attach'
obj->import_attach is detached and freed using dma_buf_detach(), but is
then used after free to decrease the dmabuf refcount using dma_buf_put().
drm_prime_gem_destroy() handles this issue and performs the proper clean
up instead of open coding it in the driver.
Fixes: ff13be830333 ("accel/qaic: Add datapath")
Reported-by: Sukrut Bellary <sukrut.bellary@linux.com>
Closes: https://lore.kernel.org/all/20230610021200.377452-1-sukrut.bellary@linux.com/
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230614161528.11710-1-quic_jhugo@quicinc.com
|
|
In journal_init_dev(), if the super bdev is used as 'j_dev_bd', then
blkdev_get_by_dev() is called with a NULL holder; otherwise, the holder
will be the journal. However, later in release_journal_dev(), blkdev_put()
is called with the journal unconditionally, causing the following warning:
WARNING: CPU: 1 PID: 5034 at block/bdev.c:617 bd_end_claim block/bdev.c:617 [inline]
WARNING: CPU: 1 PID: 5034 at block/bdev.c:617 blkdev_put+0x562/0x8a0 block/bdev.c:901
RIP: 0010:blkdev_put+0x562/0x8a0 block/bdev.c:901
Call Trace:
<TASK>
release_journal_dev fs/reiserfs/journal.c:2592 [inline]
free_journal_ram+0x421/0x5c0 fs/reiserfs/journal.c:1896
do_journal_release fs/reiserfs/journal.c:1960 [inline]
journal_release+0x276/0x630 fs/reiserfs/journal.c:1971
reiserfs_put_super+0xe4/0x5c0 fs/reiserfs/super.c:616
generic_shutdown_super+0x158/0x480 fs/super.c:499
kill_block_super+0x64/0xb0 fs/super.c:1422
deactivate_locked_super+0x98/0x160 fs/super.c:330
deactivate_super+0xb1/0xd0 fs/super.c:361
cleanup_mnt+0x2ae/0x3d0 fs/namespace.c:1247
task_work_run+0x16f/0x270 kernel/task_work.c:179
exit_task_work include/linux/task_work.h:38 [inline]
do_exit+0xadc/0x2a30 kernel/exit.c:874
do_group_exit+0xd4/0x2a0 kernel/exit.c:1024
__do_sys_exit_group kernel/exit.c:1035 [inline]
__se_sys_exit_group kernel/exit.c:1033 [inline]
__x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1033
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fix this problem by passing in NULL holder in this case.
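A minimal sketch of the idea (illustrative; assumes the super block is
reachable from the journal release path):

	void *holder = journal->j_dev_bd == sb->s_bdev ? NULL : journal;

	blkdev_put(journal->j_dev_bd, holder);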
Reported-by: syzbot+04625c80899f4555de39@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=04625c80899f4555de39
Fixes: 2736e8eeb0cc ("block: use the holder as indication for exclusive opens")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230620111322.1014775-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
After commit 2736e8eeb0cc ("block: use the holder as indication for
exclusive opens"), blkdev_get_by_dev() will warn if holder is NULL and
mode contains 'FMODE_EXCL'.
The holder passed to blkdev_get_by_dev() from disk_scan_partitions() is
always NULL, hence it should not use 'FMODE_EXCL', which is broken by the
commit. As a consequence, WARN_ON_ONCE() will be triggered from
blkdev_get_by_dev() if a user scans partitions while the device is opened
exclusively.
Fix this problem by removing 'FMODE_EXCL' from disk_scan_partitions(),
as it used to be.
Reported-by: syzbot+00cd27751f78817f167b@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=00cd27751f78817f167b
Fixes: 2736e8eeb0cc ("block: use the holder as indication for exclusive opens")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230618140402.7556-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230620043536.707249-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Currently, associating a loop device with a different file descriptor
does not increment its diskseq. This allows the following race
condition:
1. Program X opens a loop device
2. Program X gets the diskseq of the loop device.
3. Program X associates a file with the loop device.
4. Program X passes the loop device major, minor, and diskseq to
something.
5. Program X exits.
6. Program Y detaches the file from the loop device.
7. Program Y attaches a different file to the loop device.
8. The opener finally gets around to opening the loop device and checks
that the diskseq is what it expects it to be. Even though the
diskseq is the expected value, the result is that the opener is
accessing the wrong file.
From discussions with Christoph Hellwig, it appears that
disk_force_media_change() was supposed to call inc_diskseq(), but in
fact it does not. Adding a Fixes: tag to indicate this. Christoph's
Reported-by is because he stated that disk_force_media_change()
calls inc_diskseq(), which is what led me to discover that it should but
does not.
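The implied one-line change (sketch):

	/* in disk_force_media_change(), alongside invalidating the media: */
	inc_diskseq(disk);	/* the missing diskseq bump */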
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Fixes: e6138dc12de9 ("block: add a helper to raise a media changed event")
Cc: stable@vger.kernel.org # 5.15+
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230607170837.1559-1-demi@invisiblethingslab.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Fix a missing conversion to the new BLK_OPEN constant in swim.
Fixes: 05bdb9965305 ("block: replace fmode_t with a block-specific type for block open flags")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230620043051.707196-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Parking CPUs in a HLT loop is not completely safe vs. kexec() as HLT can
resume execution due to NMI, SMI and MCE, which has the same issue as the
MWAIT loop.
Kicking the secondary CPUs into INIT makes this safe against NMI and SMI.
A broadcast MCE will take the machine down, but a broadcast MCE which makes
HLT resume and execute overwritten text, pagetables or data will end up in
a disaster too.
So choose the lesser of two evils and kick the secondary CPUs into INIT
unless the system has installed special wakeup mechanisms which are not
using INIT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230615193330.608657211@linutronix.de
|
|
Putting CPUs into INIT is a safer way to park CPUs during kexec().
Split the INIT assert/deassert sequence out so it can be reused.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Link: https://lore.kernel.org/r/20230615193330.551157083@linutronix.de
|
|
TLDR: It's a mess.
When kexec() is executed on a system with offline CPUs, which are parked in
mwait_play_dead() it can end up in a triple fault during the bootup of the
kexec kernel or cause hard to diagnose data corruption.
The reason is that kexec() eventually overwrites the previous kernel's text,
page tables, data and stack. If it writes to the cache line which is
monitored by a previously offlined CPU, MWAIT resumes execution and ends
up executing the wrong text, dereferencing overwritten page tables or
corrupting the kexec kernels data.
Cure this by bringing the offlined CPUs out of MWAIT into HLT.
Write to the monitored cache line of each offline CPU, which makes MWAIT
resume execution. The written control word tells the offlined CPUs to issue
HLT, which does not have the MWAIT problem.
That does not help if a stray NMI, MCE or SMI hits the offlined CPUs, as
those make them come out of HLT.
A follow up change will put them into INIT, which protects at least against
NMI and SMI.
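The kick-out-of-MWAIT step might look roughly like this (sketch; the
per-CPU structure, mask and control word names are assumptions):

	for_each_cpu(cpu, &cpus_in_mwait) {
		struct mwait_cpu_dead *md = per_cpu_ptr(&mwait_cpu_dead, cpu);

		/* Writing the monitored line wakes MWAIT; tell it to HLT */
		WRITE_ONCE(md->control, CPUDEAD_MWAIT_KEXEC_HLT);
	}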
Fixes: ea53069231f9 ("x86, hotplug: Use mwait to offline a processor, fix the legacy case")
Reported-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230615193330.492257119@linutronix.de
|
|
Monitoring idletask::thread_info::flags in mwait_play_dead() has been an
obvious choice, as all that is needed is a cache line which is not written
by other CPUs.
But there is a use case where a "dead" CPU needs to be brought out of
MWAIT: kexec().
This is required as kexec() can overwrite text, pagetables, stacks and the
monitored cacheline of the original kernel. The latter causes MWAIT to
resume execution which obviously causes havoc on the kexec kernel which
results usually in triple faults.
Use a dedicated per CPU storage to prepare for that.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230615193330.434553750@linutronix.de
|
|
The wmb()s before sending the IPIs are not synchronizing anything.
If at all then the apic IPI functions have to provide or act as appropriate
barriers.
Remove these cargo cult barriers which have no explanation of what they are
synchronizing.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230615193330.378358382@linutronix.de
|
|
stop_this_cpu() tests CPUID leaf 0x8000001f::EAX unconditionally. Intel
CPUs return the content of the highest supported leaf when a non-existent
leaf is read, while AMD CPUs return all zeros for unsupported leafs.
So the result of the test on Intel CPUs is a lottery.
While harmless, it's incorrect and causes the conditional wbinvd() to be
issued where it is not required.
Check whether the leaf is supported before reading it.
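The guard could look like this (sketch; the exact predicate used in the
patch may differ):

	/* Only read leaf 0x8000001f if it actually exists */
	if (cpuid_eax(0x80000000) >= 0x8000001f &&
	    (cpuid_eax(0x8000001f) & BIT(0)))	/* SME/SEV supported */
		native_wbinvd();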
[ tglx: Adjusted changelog ]
Fixes: 08f253ec3767 ("x86/cpu: Clear SME feature flag when not in use")
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/3817d810-e0f1-8ef8-0bbd-663b919ca49b@cybernetics.com
Link: https://lore.kernel.org/r/20230615193330.322186388@linutronix.de
|
|
Tony reported intermittent lockups on poweroff. His analysis identified the
wbinvd() in stop_this_cpu() as the culprit. This was added to ensure that
on SME enabled machines a kexec() does not leave any stale data in the
caches when switching from encrypted to non-encrypted mode or vice versa.
That wbinvd() is conditional on the SME feature bit which is read directly
from CPUID. But that readout does not check whether the CPUID leaf is
available or not. If it's not available the CPU will return the value of
the highest supported leaf instead. Depending on the content the "SME" bit
might be set or not.
That's incorrect but harmless. Making the CPUID readout conditional makes
the observed hangs go away, but it does not fix the underlying problem:
    CPU0                                  CPU1

    stop_other_cpus()
      send_IPIs(REBOOT);                  stop_this_cpu()
      while (num_online_cpus() > 1);        set_online(false);
      proceed... -> hang
                                            wbinvd()
WBINVD is an expensive operation and if multiple CPUs issue it at the same
time the resulting delays are even larger.
But CPU0 already observed num_online_cpus() going down to 1 and proceeds
which causes the system to hang.
This issue exists independent of WBINVD, but the delays caused by WBINVD
make it more prominent.
Make this more robust by adding a cpumask which is initialized to the
online CPU mask before sending the IPIs; CPUs clear their bit in
stop_this_cpu() after the WBINVD completed. Check for that cpumask to
become empty in stop_other_cpus() instead of watching num_online_cpus().
The cpumask cannot plug all holes either, but it's better than a raw
counter and allows restricting the NMI fallback IPI to be sent only to the
CPUs which have not reported within the timeout window.
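Roughly, the scheme is (sketch, not the exact patch; cpus_stop_mask is the
new mask and timed_out() stands in for the real timeout check):

	/* stop_other_cpus(), before sending the REBOOT IPIs */
	cpumask_copy(&cpus_stop_mask, cpu_online_mask);
	cpumask_clear_cpu(smp_processor_id(), &cpus_stop_mask);

	/* stop_this_cpu(), after the optional wbinvd() */
	cpumask_clear_cpu(smp_processor_id(), &cpus_stop_mask);

	/* stop_other_cpus(), waiting for the others */
	while (!cpumask_empty(&cpus_stop_mask) && !timed_out())
		cpu_relax();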
Fixes: 08f253ec3767 ("x86/cpu: Clear SME feature flag when not in use")
Reported-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/3817d810-e0f1-8ef8-0bbd-663b919ca49b@cybernetics.com
Link: https://lore.kernel.org/r/87h6r770bv.ffs@tglx
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
ipsec-2023-06-20
|
|
Provide helpers to set and clear sb->s_readonly_remount including
appropriate memory barriers. Also use this opportunity to document what
the barriers pair with and why they are needed.
Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Message-Id: <20230620112832.5158-1-jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This microSD card never clears the Flush Cache bit after a cache flush has
been started in sd_flush_cache(). This leads e.g. to failure to mount a
file system. Add a quirk which disables the SD cache for this specific
card from the specific manufacturing date of 11/2019, since on newer cards
dated 05/2023 the cache flush works correctly.
Fixes: 08ebf903af57 ("mmc: core: Fixup support for writeback-cache for eMMC and SD")
Signed-off-by: Marek Vasut <marex@denx.de>
Link: https://lore.kernel.org/r/20230620102713.7701-1-marex@denx.de
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
The Kconfig currently defaults the governor to schedutil on x86_64
only when intel-pstate and SMP have been selected.
If the kernel is built only with amd-pstate, the default governor
should also be schedutil.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Tested-by: Perry Yuan <Perry.Yuan@amd.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
It seems that the Kingston EMMC04G-M627, despite advertising TRIM support,
does not work when the core tries to use REQ_OP_WRITE_ZEROES.
We are seeing I/O errors in OpenWrt under 6.1 on the Zyxel NBG7815 that we
did not previously have, and tracked it down to REQ_OP_WRITE_ZEROES.
Trying to use fstrim also throws errors like:
[93010.835112] I/O error, dev loop0, sector 16902 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 2
Disabling TRIM makes the error go away, so let's add a quirk for this eMMC
to disable TRIM.
Signed-off-by: Robert Marko <robimarko@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230619193621.437358-1-robimarko@gmail.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
On STM32MP25, the delay block is inside the SoC and configured through
the SYSCFG registers. The algorithm is also different from the one in the
STM32MP1 chips.
Signed-off-by: Yann Gautier <yann.gautier@foss.st.com>
Link: https://lore.kernel.org/r/20230619115120.64474-7-yann.gautier@foss.st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Create an sdmmc_tuning_ops struct to ease support for another
delay block peripheral.
Signed-off-by: Yann Gautier <yann.gautier@foss.st.com>
Link: https://lore.kernel.org/r/20230619115120.64474-6-yann.gautier@foss.st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
In stm32 sdmmc variant revision v3.0, a block gap hardware flow control
should be used with bus speed modes SDR104 and HS200.
It is enabled by writing a non-zero value to the newly added register
MMCI_STM32_FIFOTHRR.
The threshold will be 2^(N-1) bytes, so we can use the ffs() function to
compute the value N to be written to the register. The threshold used
should be the data block size, but must not be bigger than the FIFO size.
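For example (sketch; the field names are illustrative): for a 512-byte
block, ffs(512) = 10 and 2^(10-1) = 512 bytes.

	unsigned int thr = min(host->data->blksz, host->variant->fifosize);

	writel_relaxed(ffs(thr), host->base + MMCI_STM32_FIFOTHRR);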
Signed-off-by: Yann Gautier <yann.gautier@foss.st.com>
Link: https://lore.kernel.org/r/20230619115120.64474-5-yann.gautier@foss.st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
This is an update of the SDMMC revision v2.2, with just an increased
FIFO size, from 64B to 1kB.
Signed-off-by: Yann Gautier <yann.gautier@foss.st.com>
Link: https://lore.kernel.org/r/20230619115120.64474-4-yann.gautier@foss.st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
The alignment for the IDMA size depends on the peripheral version, so it
should be configurable. Add stm32_idmabsize_align in the variant
structure.
Also remove the now unused (and wrong) MMCI_STM32_IDMABNDT_* macros.
Signed-off-by: Yann Gautier <yann.gautier@foss.st.com>
Link: https://lore.kernel.org/r/20230619115120.64474-3-yann.gautier@foss.st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
For STM32MP25, we'll need to distinguish how the delay block is managed.
This is done through a new compatible dedicated to this SoC, as the
delay block registers are located in the SYSCFG peripheral.
Signed-off-by: Yann Gautier <yann.gautier@foss.st.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230619115120.64474-2-yann.gautier@foss.st.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux
Merge devfreq updates for v6.5 from Chanwoo Choi:
"1. Reorder fieldls in 'struct devfreq_dev_status' in order to shrink
the size of 'struct devfreqw_dev_status' without any behavior
changes.
2. Add exynos-ppmu.c driver as a soft module dependency in order to
prevent the freeze issue between exynos-bus.c devfreq driver and
exynos-ppmu.c devfreq event driver.
3. Fix variable deferencing before NULL check on mtk-cci-devfreq.c"
* tag 'devfreq-next-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux:
PM / devfreq: mtk-cci: Fix variable deferencing before NULL check
PM / devfreq: exynos: add Exynos PPMU as a soft module dependency
PM / devfreq: Reorder fields in 'struct devfreq_dev_status'
|