linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2023-06-20	net: fec: allow to build without PAGE_POOL_STATS	Lucas Stach
	Commit 6970ef27ff7f ("net: fec: add xdp and page pool statistics") selected CONFIG_PAGE_POOL_STATS from the FEC driver symbol, making it impossible to build without the page pool statistics when this driver is enabled. The help text of those statistics mentions increased overhead. Allow the user to choose between usefulness of the statistics and the added overhead. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230616191832.2944130-1-l.stach@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-20	crypto: af_alg/hash: Fix recvmsg() after sendmsg(MSG_MORE)	David Howells
	If an AF_ALG socket bound to a hashing algorithm is sent a zero-length message with MSG_MORE set and then recvmsg() is called without first sending another message without MSG_MORE set to end the operation, an oops will occur because the crypto context and result doesn't now get set up in advance because hash_sendmsg() now defers that as long as possible in the hope that it can use crypto_ahash_digest() - and then because the message is zero-length, it the data wrangling loop is skipped. Fix this by handling zero-length sends at the top of the hash_sendmsg() function. If we're not continuing the previous sendmsg(), then just ignore the send (hash_recvmsg() will invent something when called); if we are continuing, then we finalise the request at this point if MSG_MORE is not set to get any error here, otherwise the send is of no effect and can be ignored. Whilst we're at it, remove the code to create a kvmalloc'd scatterlist if we get more than ALG_MAX_PAGES - this shouldn't happen. Fixes: c662b043cdca ("crypto: af_alg/hash: Support MSG_SPLICE_PAGES") Reported-by: syzbot+13a08c0bf4d212766c3c@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/000000000000b928f705fdeb873a@google.com/ Reported-by: syzbot+14234ccf6d0ef629ec1a@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/000000000000c047db05fdeb8790@google.com/ Reported-by: syzbot+4e2e47f32607d0f72d43@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/000000000000bcca3205fdeb87fb@google.com/ Reported-by: syzbot+472626bb5e7c59fb768f@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/000000000000b55d8805fdeb8385@google.com/ Signed-off-by: David Howells <dhowells@redhat.com> Reported-and-tested-by: syzbot+6efc50cc1f8d718d6cb7@syzkaller.appspotmail.com cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/427646.1686913832@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-20	clk: Fix memory leak in devm_clk_notifier_register()	Fei Shao
	devm_clk_notifier_register() allocates a devres resource for clk notifier but didn't register that to the device, so the notifier didn't get unregistered on device detach and the allocated resource was leaked. Fix the issue by registering the resource through devres_add(). This issue was found with kmemleak on a Chromebook. Fixes: 6d30d50d037d ("clk: add devm variant of clk_notifier_register") Signed-off-by: Fei Shao <fshao@chromium.org> Link: https://lore.kernel.org/r/20230619112253.v2.1.I13f060c10549ef181603e921291bdea95f83033c@changeid Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2023-06-20	net: phy: mediatek: fix compile-test dependencies	Arnd Bergmann
	The new phy driver attempts to select a driver from another subsystem, but that fails when the NVMEM subsystem is disabled: WARNING: unmet direct dependencies detected for NVMEM_MTK_EFUSE Depends on [n]: NVMEM [=n] && (ARCH_MEDIATEK [=n] \|\| COMPILE_TEST [=y]) && HAS_IOMEM [=y] Selected by [y]: - MEDIATEK_GE_SOC_PHY [=y] && NETDEVICES [=y] && PHYLIB [=y] && (ARM64 && ARCH_MEDIATEK [=n] \|\| COMPILE_TEST [=y]) I could not see an actual compile time dependency, so presumably this is only needed for for working correctly but not technically a dependency on that particular nvmem driver implementation, so it would likely be safe to remove the select for compile testing. To keep the spirit of the original 'select', just replace this with a 'depends on' that ensures that the driver will work but does not get in the way of build testing. Fixes: 98c485eaf509b ("net: phy: add driver for MediaTek SoC built-in GE PHYs") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Reviewed-by: Daniel Golle <daniel@makrotopia.org> Link: https://lore.kernel.org/r/20230616093009.3511692-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-20	clk: mvebu: Iterate over possible CPUs instead of DT CPU nodes	Rob Herring
	Rework iterating over DT CPU nodes to iterate over possible CPUs instead. There's no need to walk the DT CPU nodes again. Possible CPUs is equal to the number of CPUs defined in the DT. Using the "reg" value for an array index is fragile as it assumes "reg" is 0-N which often is not the case. Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230327-mvebu-clk-fixes-v2-3-8333729ee45d@kernel.org Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2023-06-20	clk: mvebu: Use of_get_cpu_hwid() to read CPU ID	Rob Herring
	Use of_get_cpu_hwid() rather than the open coded reading of the CPU nodes "reg" property. The existing code is in fact wrong as the "reg" address cells size is 2 cells for arm64. The existing code happens to work because the DTS files are wrong as well. Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230327-mvebu-clk-fixes-v2-2-8333729ee45d@kernel.org Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2023-06-20	MAINTAINERS: Add Marvell mvebu clock drivers	Rob Herring
	drivers/clk/mvebu/ is missing a maintainers entry. Add it to the existing entry for the Marvell mvebu platforms. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230327-mvebu-clk-fixes-v2-1-8333729ee45d@kernel.org Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2023-06-20	block: add capacity validation in bdev_add_partition()	Min Li
	In the function bdev_add_partition(),there is no check that the start and end sectors exceed the size of the disk before calling add_partition. When we call the block's ioctl interface directly to add a partition, and the capacity of the disk is set to 0 by driver,the command will continue to execute. Signed-off-by: Min Li <min15.li@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230619091214.31615-1-min15.li@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20	Merge tag '6.4-rc6-smb3-server-fixes' of git://git.samba.org/ksmbd	Linus Torvalds
	Pull smb server fixes from Steve French: "Four smb3 server fixes, all also for stable: - fix potential oops in parsing compounded requests - fix various paths (mkdir, create etc) where mnt_want_write was not checked first - fix slab out of bounds in check_message and write" * tag '6.4-rc6-smb3-server-fixes' of git://git.samba.org/ksmbd: ksmbd: validate session id and tree id in the compound request ksmbd: fix out-of-bound read in smb2_write ksmbd: add mnt_want_write to ksmbd vfs functions ksmbd: validate command payload size
2023-06-20	block: fine-granular CAP_SYS_ADMIN for Persistent Reservation	Jingbo Xu
	Allow of unprivileged Persistent Reservation operations on devices if the write permission check on the device node has passed. brw-rw---- 1 root disk 259, 0 Jun 13 07:09 /dev/nvme0n1 In the example above, the "disk" group of nvme0n1 is also allowed to make reservations on the device even without CAP_SYS_ADMIN. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230613084008.93795-3-jefflexu@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20	block: disallow Persistent Reservation on partitions	Jingbo Xu
	Refuse Persistent Reservation operations on partitions as reservation on partitions doesn't make sense. Besides, introduce blkdev_pr_allowed() helper, where more policies could be placed here later. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230613084008.93795-2-jefflexu@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20	s390/cpum_cf: open access to hwctr device for CAP_PERFMON privileged process	Thomas Richter
	The device /dev/hwctr was introduced to access complete CPU Measurement facility counter sets via an ioctl system call. The access the to device is limited to privileged processes running as root or superuser. The capability CAP_SYS_ADMIN is required. The device permissions are read/write for the device owner root. There is no need for this restriction. Make the device access permission read/write for all and reduce the capabilities to CAP_PERFMON. Any user space program with the CAP_PERFMON capability assigned to it can now read and display the CPU Measurement facility counter sets. For more details on perf tool usage and security, see linux documentation in Documentation/admin-guide/perf-security.rst. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-20	s390/module: fix rela calculation for R_390_GOTENT	Sumanth Korikkar
	During module load, module layout allocation occurs by initially allowing the architecture to frob the sections. This is performed via module_frob_arch_sections(). However, the size of each module memory types like text,data,rodata etc are updated correctly only after layout_sections(). After calculation of required module memory sizes for each types, move_module() is responsible for allocating the module memory for each type from modules vaddr range. Considering the sequence above, module_frob_arch_sections() updates the module mod_arch_specific got_offset before module memory text type size is fully updated in layout_sections(). Hence mod_arch_specific got_offset points to currently zero. As per s390 ABI, R_390_GOTENT : (G + O + A - P) >> 1 where G=me->mem[MOD_TEXT].base+me->arch.got_offset O=info->got_offset A=rela->r_addend P=loc fix R_390_GOTENT calculation in apply_rela(). Note: currently this doesn't break anything because me->arch.got_offset is zero. However, reordering of functions in the future could break it. Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-20	s390/boot: fix physmem_info virtual vs physical address confusion	Alexander Gordeev
	Fix virtual vs physical address confusion (which currently are the same). Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-20	s390/kasan: avoid short by one page shadow memory	Alexander Gordeev
	Kernel Address Sanitizer uses 3 bits per byte to encode memory. That is the number of bits the start and end address of a memory range is shifted right when the corresponding shadow memory is created for that memory range. The used memory mapping routine expects page-aligned addresses, while the above described 3-bit shift might turn the shadow memory range start and end boundaries into non-page-aligned in case the size of the original memory range is less than (PAGE_SIZE << 3). As result, the resulting shadow memory range could be short on one page. Align on page boundary the start and end addresses when mapping a shadow memory range and avoid the described issue in the future. Note, that does not fix a real problem, since currently no virtual regions of size less than (PAGE_SIZE << 3) exist. Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-20	s390/kasan: fix insecure W+X mapping warning	Alexander Gordeev
	Since commit 3b5c3f000c2e ("s390/kasan: move shadow mapping to decompressor") the decompressor establishes mappings for the shadow memory and sets initial protection attributes to RWX. The decompressed kernel resets protection to RW+NX later on. In case a shadow memory range is not aligned on page boundary (e.g. as result of mem= kernel command line parameter use), the "Checked W+X mappings: FAILED, 1 W+X pages found" warning hits. Reported-by: Vasily Gorbik <gor@linux.ibm.com> Fixes: 557b19709da9 ("s390/kasan: move shadow mapping to decompressor") Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-20	s390/crash: use the correct type for memory allocation	Christophe JAILLET
	get_elfcorehdr_size() returns a size_t, so there is no real point to store it in a u32. Turn 'alloc_size' into a size_t. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/0756118c9058338f3040edb91971d0bfd100027b.1686688212.git.christophe.jaillet@wanadoo.fr Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-20	clk: qcom: mmcc-msm8974: fix MDSS_GDSC power flags	Dmitry Baryshkov
	Using PWRSTS_RET on msm8974's MDSS_GDSC causes display to stop working. The gdsc doesn't fully come out of retention mode. Change it's pwrsts flags to PWRSTS_OFF_ON. Fixes: d399723950c4 ("clk: qcom: gdsc: Fix the handling of PWRSTS_RET support") Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Rajendra Nayak <quic_rjendra@quicinc.com> Tested-by: Luca Weiss <luca@z3ntu.xyz> Link: https://lore.kernel.org/r/20230507175335.2321503-2-dmitry.baryshkov@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2023-06-20	jfs: jfs_dmap: Validate db_l2nbperpage while mounting	Siddh Raman Pant
	In jfs_dmap.c at line 381, BLKTODMAP is used to get a logical block number inside dbFree(). db_l2nbperpage, which is the log2 number of blocks per page, is passed as an argument to BLKTODMAP which uses it for shifting. Syzbot reported a shift out-of-bounds crash because db_l2nbperpage is too big. This happens because the large value is set without any validation in dbMount() at line 181. Thus, make sure that db_l2nbperpage is correct while mounting. Max number of blocks per page = Page size / Min block size => log2(Max num_block per page) = log2(Page size / Min block size) = log2(Page size) - log2(Min block size) => Max db_l2nbperpage = L2PSIZE - L2MINBLOCKSIZE Reported-and-tested-by: syzbot+d2cd27dcf8e04b232eb2@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?id=2a70a453331db32ed491f5cbb07e81bf2d225715 Cc: stable@vger.kernel.org Suggested-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Siddh Raman Pant <code@siddh.me> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2023-06-20	btrfs: fix u32 overflows when left shifting stripe_nr	Qu Wenruo
	[BUG] David reported an ASSERT() get triggered during fio load on 8 devices with data/raid6 and metadata/raid1c3: fio --rw=randrw --randrepeat=1 --size=3000m \ --bsrange=512b-64k --bs_unaligned \ --ioengine=libaio --fsync=1024 \ --name=job0 --name=job1 \ The ASSERT() is from rbio_add_bio() of raid56.c: ASSERT(orig_logical >= full_stripe_start && orig_logical + orig_len <= full_stripe_start + rbio->nr_data * BTRFS_STRIPE_LEN); Which is checking if the target rbio is crossing the full stripe boundary. [100.789] assertion failed: orig_logical >= full_stripe_start && orig_logical + orig_len <= full_stripe_start + rbio->nr_data * BTRFS_STRIPE_LEN, in fs/btrfs/raid56.c:1622 [100.795] ------------[ cut here ]------------ [100.796] kernel BUG at fs/btrfs/raid56.c:1622! [100.797] invalid opcode: 0000 [#1] PREEMPT SMP KASAN [100.798] CPU: 1 PID: 100 Comm: kworker/u8:4 Not tainted 6.4.0-rc6-default+ #124 [100.799] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014 [100.802] Workqueue: writeback wb_workfn (flush-btrfs-1) [100.803] RIP: 0010:rbio_add_bio+0x204/0x210 [btrfs] [100.806] RSP: 0018:ffff888104a8f300 EFLAGS: 00010246 [100.808] RAX: 00000000000000a1 RBX: ffff8881075907e0 RCX: ffffed1020951e01 [100.809] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000001 [100.811] RBP: 0000000141d20000 R08: 0000000000000001 R09: ffff888104a8f04f [100.813] R10: ffffed1020951e09 R11: 0000000000000003 R12: ffff88810e87f400 [100.815] R13: 0000000041d20000 R14: 0000000144529000 R15: ffff888101524000 [100.817] FS: 0000000000000000(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000 [100.821] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [100.822] CR2: 000055d54e44c270 CR3: 000000010a9a1006 CR4: 00000000003706a0 [100.824] Call Trace: [100.825] <TASK> [100.825] ? die+0x32/0x80 [100.826] ? do_trap+0x12d/0x160 [100.827] ? rbio_add_bio+0x204/0x210 [btrfs] [100.827] ? rbio_add_bio+0x204/0x210 [btrfs] [100.829] ? do_error_trap+0x90/0x130 [100.830] ? rbio_add_bio+0x204/0x210 [btrfs] [100.831] ? handle_invalid_op+0x2c/0x30 [100.833] ? rbio_add_bio+0x204/0x210 [btrfs] [100.835] ? exc_invalid_op+0x29/0x40 [100.836] ? asm_exc_invalid_op+0x16/0x20 [100.837] ? rbio_add_bio+0x204/0x210 [btrfs] [100.837] raid56_parity_write+0x64/0x270 [btrfs] [100.838] btrfs_submit_chunk+0x26e/0x800 [btrfs] [100.840] ? btrfs_bio_init+0x80/0x80 [btrfs] [100.841] ? release_pages+0x503/0x6d0 [100.842] ? folio_unlock+0x2f/0x60 [100.844] ? __folio_put+0x60/0x60 [100.845] ? btrfs_do_readpage+0xae0/0xae0 [btrfs] [100.847] btrfs_submit_bio+0x21/0x60 [btrfs] [100.847] submit_one_bio+0x6a/0xb0 [btrfs] [100.849] extent_write_cache_pages+0x395/0x680 [btrfs] [100.850] ? __extent_writepage+0x520/0x520 [btrfs] [100.851] ? mark_usage+0x190/0x190 [100.852] extent_writepages+0xdb/0x130 [btrfs] [100.853] ? extent_write_locked_range+0x480/0x480 [btrfs] [100.854] ? mark_usage+0x190/0x190 [100.854] ? attach_extent_buffer_page+0x220/0x220 [btrfs] [100.855] ? reacquire_held_locks+0x178/0x280 [100.856] ? writeback_sb_inodes+0x245/0x7f0 [100.857] do_writepages+0x102/0x2e0 [100.858] ? page_writeback_cpu_online+0x10/0x10 [100.859] ? __lock_release.isra.0+0x14a/0x4d0 [100.860] ? reacquire_held_locks+0x280/0x280 [100.861] ? __lock_acquired+0x1e9/0x3d0 [100.862] ? do_raw_spin_lock+0x1b0/0x1b0 [100.863] __writeback_single_inode+0x94/0x450 [100.864] writeback_sb_inodes+0x372/0x7f0 [100.864] ? lock_sync+0xd0/0xd0 [100.865] ? do_raw_spin_unlock+0x93/0xf0 [100.866] ? sync_inode_metadata+0xc0/0xc0 [100.867] ? rwsem_optimistic_spin+0x340/0x340 [100.868] __writeback_inodes_wb+0x70/0x130 [100.869] wb_writeback+0x2d1/0x530 [100.869] ? __writeback_inodes_wb+0x130/0x130 [100.870] ? lockdep_hardirqs_on_prepare.part.0+0xf1/0x1c0 [100.870] wb_do_writeback+0x3eb/0x480 [100.871] ? wb_writeback+0x530/0x530 [100.871] ? mark_lock_irq+0xcd0/0xcd0 [100.872] wb_workfn+0xe0/0x3f0< [CAUSE] Commit a97699d1d610 ("btrfs: replace map_lookup->stripe_len by BTRFS_STRIPE_LEN") changes how we calculate the map length, to reduce u64 division. Function btrfs_max_io_len() is to get the length to the stripe boundary. It calculates the full stripe start offset (inside the chunk) by the following code: full_stripe_start = rounddown(stripe_nr, nr_data_stripes(map)) << BTRFS_STRIPE_LEN_SHIFT; The calculation itself is fine, but the value returned by rounddown() is dependent on both @stripe_nr (which is u32) and nr_data_stripes() (which returned int). Thus the result is also u32, then we do the left shift, which can overflow u32. If such overflow happens, @full_stripe_start will be a value way smaller than @offset, causing later "full_stripe_len - (offset - *full_stripe_start)" to underflow, thus make later length calculation to have no stripe boundary limit, resulting a write bio to exceed stripe boundary. There are some other locations like this, with a u32 @stripe_nr got left shift, which can lead to a similar overflow. [FIX] Fix all @stripe_nr with left shift with a type cast to u64 before the left shift. Those involved @stripe_nr or similar variables are recording the stripe number inside the chunk, which is small enough to be contained by u32, but their offset inside the chunk can not fit into u32. Thus for those specific left shifts, a type cast to u64 is necessary so this patch does not touch them and the code will be cleaned up in the future to keep the fix minimal. Reported-by: David Sterba <dsterba@suse.com> Fixes: a97699d1d610 ("btrfs: replace map_lookup->stripe_len by BTRFS_STRIPE_LEN") Tested-by: David Sterba <dsterba@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-20	riscv: replace deprecated scall with ecall	Fangrui Song
	scall is a deprecated alias for ecall. ecall is used in several places, so there is no assembler compatibility concern. Signed-off-by: Fangrui Song <maskray@google.com> Link: https://lore.kernel.org/r/20230423223210.126948-1-maskray@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-20	PCI: Add failed link recovery for device reset events	Maciej W. Rozycki
	Request failed link recovery with any upstream PCIe bridge where a device has not come back after reset within PCI_RESET_WAIT time. Reset the polling interval if recovery succeeded, otherwise continue as usual. [bhelgaas: inline pcie_parent_link_retrain()] Link: https://lore.kernel.org/r/alpine.DEB.2.21.2306111631050.64925@angie.orcam.me.uk Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-06-20	PCI: Work around PCIe link training failures	Maciej W. Rozycki
	Attempt to handle cases such as with a downstream port of the ASMedia ASM2824 PCIe switch where link training never completes and the link continues switching between speeds indefinitely with the data link layer never reaching the active state. It has been observed with a downstream port of the ASMedia ASM2824 Gen 3 switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2 switch, using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device, P/N 41433, wired to a SiFive HiFive Unmatched board. In this setup the switches should negotiate a link speed of 5.0GT/s, falling back to 2.5GT/s if necessary. Instead the link continues oscillating between the two speeds, at the rate of 34-35 times per second, with link training reported repeatedly active ~84% of the time. Limiting the target link speed to 2.5GT/s with the upstream ASM2824 device makes the two switches communicate correctly. Removing the speed restriction afterwards makes the two devices switch to 5.0GT/s then. Make use of these observations and detect the inability to train the link by checking for the Data Link Layer Link Active status bit being off while the Link Bandwidth Management Status indicating that hardware has changed the link speed or width in an attempt to correct unreliable link operation. Restrict the speed to 2.5GT/s then with the Target Link Speed field, request a retrain and wait 200ms for the data link to go up. If this is successful, lift the restriction, letting the devices negotiate a higher speed. Also check for a 2.5GT/s speed restriction the firmware may have already arranged and lift it too with ports of devices known to continue working afterwards (currently only ASM2824), that already report their data link being up. [bhelgaas: reorder and squash stubs from https://lore.kernel.org/r/alpine.DEB.2.21.2306111619570.64925@angie.orcam.me.uk to avoid adding stubs that do nothing] Link: https://lore.kernel.org/r/alpine.DEB.2.21.2203022037020.56670@angie.orcam.me.uk/ Link: https://source.denx.de/u-boot/u-boot/-/commit/a398a51ccc68 Link: https://lore.kernel.org/r/alpine.DEB.2.21.2305310038540.59226@angie.orcam.me.uk Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-06-20	PCI: Use pcie_wait_for_link_status() in pcie_wait_for_link_delay()	Maciej W. Rozycki
	Remove a DLLLA status bit polling loop from pcie_wait_for_link_delay() and call almost identical code in pcie_wait_for_link_status() instead. This reduces the lower bound on the polling interval from 10ms to 1ms, possibly increasing the CPU load on the system in favour to reducing the wait time. Link: https://lore.kernel.org/r/alpine.DEB.2.21.2306111611170.64925@angie.orcam.me.uk Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-06-20	PCI: Add support for polling DLLLA to pcie_retrain_link()	Maciej W. Rozycki
	Let the caller of pcie_retrain_link() specify whether they want to use the LT bit or the DLLLA bit of the Link Status Register to determine if link training has completed. It is up to the caller to verify whether the use of the DLLLA bit, the implementation of which is optional, is valid for the device requested. Link: https://lore.kernel.org/r/alpine.DEB.2.21.2306110310540.64925@angie.orcam.me.uk Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-06-20	PCI: Export pcie_retrain_link() for use outside ASPM	Maciej W. Rozycki
	Export pcie_retrain_link() for link retrain needs outside ASPM. Struct pcie_link_state is local to ASPM and only used by pcie_retrain_link() to get at the associated PCI device, so change the operand and adjust the lone call site accordingly. Document the interface. No functional change at this point. Link: https://lore.kernel.org/r/alpine.DEB.2.21.2306110229010.64925@angie.orcam.me.uk Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-06-20	PCI: Export PCIe link retrain timeout	Maciej W. Rozycki
	Convert LINK_RETRAIN_TIMEOUT from jiffies to milliseconds, accordingly rename to PCIE_LINK_RETRAIN_TIMEOUT_MS, and make available via "pci.h" for the PCI core to use. Use in pcie_wait_for_link_delay(). Link: https://lore.kernel.org/r/alpine.DEB.2.21.2305310030280.59226@angie.orcam.me.uk Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-06-20	PCI: Execute quirk_enable_clear_retrain_link() earlier	Maciej W. Rozycki
	Make quirk_enable_clear_retrain_link() an early quirk so that any later fixups can rely on dev->clear_retrain_link to have been already initialised. [bhelgaas: reorder to just before it becomes possible to call pcie_retrain_link() earlier] Link: https://lore.kernel.org/r/alpine.DEB.2.21.2305310049000.59226@angie.orcam.me.uk Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-06-20	PCI/ASPM: Factor out waiting for link training to complete	Maciej W. Rozycki
	Move code polling for the Link Training bit to clear into a function of its own. [bhelgaas: reorder to clean up before exposing to PCI core] Link: https://lore.kernel.org/r/alpine.DEB.2.21.2306111605060.64925@angie.orcam.me.uk Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-06-20	PCI/ASPM: Avoid unnecessary pcie_link_state use	Maciej W. Rozycki
	[bhelgaas: extract from expose patch, reorder to clean up before exposing] Link: https://lore.kernel.org/r/alpine.DEB.2.21.2306110229010.64925@angie.orcam.me.uk Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-06-20	RDMA/cma: Remove NULL check before dev_{put, hold}	Yang Li
	The call netdev_{put, hold} of dev_{put, hold} will check NULL, so there is no need to check before using dev_{put, hold}, remove it to silence the warning: ./drivers/infiniband/core/cma.c:4812:2-9: WARNING: NULL check before dev_{put, hold} functions is not needed. Link: https://lore.kernel.org/r/20230614014328.14007-1-yang.lee@linux.alibaba.com Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5521 Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-06-20	RDMA/rxe: Simplify cq->notify code	Bob Pearson
	The flags parameter to the request notify verb is a bitmask. But, rxe driver treats cq->notify as an int. If someone ever set both the IB_CQ_SOLICITED and the IB_CQ_NEXT_COMP bits rxe_cq_post could fail to generate a completion event. This patch treats the notify flags as a bit mask consistently and can handle the above case correctly. Link: https://lore.kernel.org/r/20230612162244.20038-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-06-20	io_uring: add helpers to decode the fixed file file_ptr	Christoph Hellwig
	Remove all the open coded magic on slot->file_ptr by introducing two helpers that return the file pointer and the flags instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620113235.920399-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20	io_uring: use io_file_from_index in io_msg_grab_file	Christoph Hellwig
	Use io_file_from_index instead of open coding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620113235.920399-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20	io_uring: use io_file_from_index in __io_sync_cancel	Christoph Hellwig
	Use io_file_from_index instead of open coding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620113235.920399-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20	io_uring: return REQ_F_ flags from io_file_get_flags	Christoph Hellwig
	Two of the three callers want them, so return the more usual format, and shift into the FFS_ form only for the fixed file table. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620113235.920399-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20	io_uring: remove io_req_ffs_set	Christoph Hellwig
	Just checking the flag directly makes it a lot more obvious what is going on here. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620113235.920399-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20	io_uring: remove a confusing comment above io_file_get_flags	Christoph Hellwig
	The SCM inflight mechanism has nothing to do with the fact that a file might be a regular file or not and if it supports non-blocking operations. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620113235.920399-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20	io_uring: remove the mode variable in io_file_get_flags	Christoph Hellwig
	The variable is only once now, so don't bother with it. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620113235.920399-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20	io_uring: remove __io_file_supports_nowait	Christoph Hellwig
	Now that this only checks O_NONBLOCK and FMODE_NOWAIT, the helper is complete overkilļ, and the comments are confusing bordering to wrong. Just inline the check into the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620113235.920399-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20	of: reserved_mem: Use stable allocation order	Stephan Gerhold
	sort() in Linux is based on heapsort which is not a stable sort algorithm - equal elements are being reordered. For reserved memory in the device tree this happens mainly for dynamic allocations: They do not have an address to sort with, so they are reordered somewhat randomly when adding/removing other unrelated reserved memory nodes. Functionally this is not a big problem, but it's confusing during development when all the addresses change after adding unrelated reserved memory nodes. Make the order stable by sorting dynamic allocations according to the node order in the device tree. Static allocations are not affected by this because they are still sorted by their (fixed) address. Signed-off-by: Stephan Gerhold <stephan@gerhold.net> Link: https://lore.kernel.org/r/20230510-dt-resv-bottom-up-v2-2-aeb2afc8ac25@gerhold.net Signed-off-by: Rob Herring <robh@kernel.org>
2023-06-20	of: reserved_mem: Try to keep range allocations contiguous	Stephan Gerhold
	Right now dynamic reserved memory regions are allocated either bottom-up or top-down, depending on the memblock setting of the architecture. This is fine when the address is arbitrary. However, when using "alloc-ranges" the regions are often placed somewhere in the middle of (free) RAM, even if the range starts or ends next to another (static) reservation. Try to detect this situation, and choose explicitly between bottom-up or top-down to allocate the memory close to the other reservations: 1. If the "alloc-range" starts at the end or inside an existing reservation, use bottom-up. 2. If the "alloc-range" ends at the start or inside an existing reservation, use top-down. 3. If both or none is the case, keep the current (architecture-specific) behavior. There are plenty of edge cases where only a more complex algorithm would help, but even this simple approach helps in many cases to keep the reserved memory (and therefore also the free memory) contiguous. Signed-off-by: Stephan Gerhold <stephan@gerhold.net> Link: https://lore.kernel.org/r/20230510-dt-resv-bottom-up-v2-1-aeb2afc8ac25@gerhold.net Signed-off-by: Rob Herring <robh@kernel.org>
2023-06-20	ovl: add Amir as co-maintainer	Miklos Szeredi
	Amir has implemented lots of features in overlayfs and is very active in maintenance. Make this official in the MAINTAINERS file. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-06-20	ovl: reserve ability to reconfigure mount options with new mount api	Christian Brauner
	Using the old mount api to remount an overlayfs superblock via mount(MS_REMOUNT) all mount options will be silently ignored. For example, if you create an overlayfs mount: mount -t overlay overlay -o lowerdir=/mnt/a:/mnt/b,upperdir=/mnt/upper,workdir=/mnt/work /mnt/merged and then issue a remount via: # force mount(8) to use mount(2) export LIBMOUNT_FORCE_MOUNT2=always mount -t overlay overlay -o remount,WOOTWOOT,lowerdir=/DOESNT-EXIST /mnt/merged with completely nonsensical mount options whatsoever it will succeed nonetheless. This prevents us from every changing any mount options we might introduce in the future that could reasonably be changed during a remount. We don't need to carry this issue into the new mount api port. Similar to FUSE we can use the fs_context::oldapi member to figure out that this is a request coming through the legacy mount api. If we detect it we continue silently ignoring all mount options. But for the new mount api we simply report that mount options cannot currently be changed. This will allow us to potentially alter mount properties for new or even old properties. It any case, silently ignoring everything is not something new apis should do. Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-06-20	riscv: uprobes: Restore thread.bad_cause	Tiezhu Yang
	thread.bad_cause is saved in arch_uprobe_pre_xol(), it should be restored in arch_uprobe_{post,abort}_xol() accordingly, otherwise the save operation is meaningless, this change is similar with x86 and powerpc. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Guo Ren <guoren@kernel.org> Fixes: 74784081aac8 ("riscv: Add uprobes supported") Link: https://lore.kernel.org/r/1682214146-3756-1-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-20	riscv: mm: try VMA lock-based page fault handling first	Jisheng Zhang
	Attempt VMA lock-based page fault handling first, and fall back to the existing mmap_lock-based handling if that fails. A simple running the ebizzy benchmark on Lichee Pi 4A shows that PER_VMA_LOCK can improve the ebizzy benchmark by about 32.68%. In theory, the more CPUs, the bigger improvement, but I don't have any HW platform which has more than 4 CPUs. This is the riscv variant of "x86/mm: try VMA lock-based page fault handling first". Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Link: https://lore.kernel.org/r/20230523165942.2630-1-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-20	dt-bindings: arm: drop unneeded quotes and use absolute /schemas path	Krzysztof Kozlowski
	Cleanup bindings dropping unneeded quotes. Once all these are fixed, checking for this can be enabled in yamllint. Also absolute path starting with /schemas is preferred. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20230609140754.65158-1-krzysztof.kozlowski@linaro.org Signed-off-by: Rob Herring <robh@kernel.org>
2023-06-20	dt-bindings: firmware: arm,scmi: drop unneeded quotes and use absolute ↵	Krzysztof Kozlowski
	/schemas path Cleanup bindings dropping unneeded quotes. Once all these are fixed, checking for this can be enabled in yamllint. Also absolute path starting with /schemas is preferred. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20230609140749.65102-1-krzysztof.kozlowski@linaro.org Signed-off-by: Rob Herring <robh@kernel.org>
2023-06-20	dt-bindings: dvfs: drop unneeded quotes	Krzysztof Kozlowski
	Cleanup bindings dropping unneeded quotes. Once all these are fixed, checking for this can be enabled in yamllint. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20230609140742.65018-1-krzysztof.kozlowski@linaro.org Signed-off-by: Rob Herring <robh@kernel.org>
2023-06-20	dt-bindings: gpu: drop unneeded quotes	Krzysztof Kozlowski
	Cleanup bindings dropping unneeded quotes. Once all these are fixed, checking for this can be enabled in yamllint. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20230609140738.64958-1-krzysztof.kozlowski@linaro.org Signed-off-by: Rob Herring <robh@kernel.org>