summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-06-20block: fix signed int overflow in Amiga partition supportMichael Schmitz
The Amiga partition parser module uses signed int for partition sector address and count, which will overflow for disks larger than 1 TB. Use sector_t as type for sector address and size to allow using disks up to 2 TB without LBD support, and disks larger than 2 TB with LBD. This bug was reported originally in 2012, and the fix was created by the RDB author, Joanne Dow <jdow@earthlink.net>. A patch had been discussed and reviewed on linux-m68k at that time but never officially submitted. This patch differs from Joanne's patch only in its use of sector_t instead of unsigned int. No checking for overflows is done (see patch 3 of this series for that). Reported-by: Martin Steigerwald <Martin@lichtvoll.de> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=43511 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Message-ID: <201206192146.09327.Martin@lichtvoll.de> Cc: <stable@vger.kernel.org> # 5.2 Signed-off-by: Michael Schmitz <schmitzmic@gmail.com> Tested-by: Martin Steigerwald <Martin@lichtvoll.de> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620201725.7020-2-schmitzmic@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20drm: use mgr->dev in drm_dbg_kms in drm_dp_add_payload_part2Jeff Layton
I've been experiencing some intermittent crashes down in the display driver code. The symptoms are ususally a line like this in dmesg: amdgpu 0000:30:00.0: [drm] Failed to create MST payload for port 000000006d3a3885: -5 ...followed by an Oops due to a NULL pointer dereference. Switch to using mgr->dev instead of state->dev since "state" can be NULL in some cases. Link: https://bugzilla.redhat.com/show_bug.cgi?id=2184855 Suggested-by: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230419112447.18471-1-jlayton@kernel.org
2023-06-20wifi: iwlwifi: pcie: Handle SO-F device for PCI id 0x7AF0Mukesh Sisodiya
Add support for AX1690i and AX1690s devices with PCIE id 0x7AF0. Cc: stable@vger.kernel.org # 6.1+ Signed-off-by: Mukesh Sisodiya <mukesh.sisodiya@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20230619150233.461290-2-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-20be2net: Extend xmit workaround to BE3 chipRoss Lagerwall
We have seen a bug where the NIC incorrectly changes the length in the IP header of a padded packet to include the padding bytes. The driver already has a workaround for this so do the workaround for this NIC too. This resolves the issue. The NIC in question identifies itself as follows: [ 8.828494] be2net 0000:02:00.0: FW version is 10.7.110.31 [ 8.834759] be2net 0000:02:00.0: Emulex OneConnect(be3): PF FLEX10 port 1 02:00.0 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 01) Fixes: ca34fe38f06d ("be2net: fix wrong usage of adapter->generation") Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Link: https://lore.kernel.org/r/20230616164549.2863037-1-ross.lagerwall@citrix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-20block: add capacity validation in bdev_add_partition()Min Li
In the function bdev_add_partition(),there is no check that the start and end sectors exceed the size of the disk before calling add_partition. When we call the block's ioctl interface directly to add a partition, and the capacity of the disk is set to 0 by driver,the command will continue to execute. Signed-off-by: Min Li <min15.li@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230619091214.31615-1-min15.li@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20Merge tag '6.4-rc6-smb3-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fixes from Steve French: "Four smb3 server fixes, all also for stable: - fix potential oops in parsing compounded requests - fix various paths (mkdir, create etc) where mnt_want_write was not checked first - fix slab out of bounds in check_message and write" * tag '6.4-rc6-smb3-server-fixes' of git://git.samba.org/ksmbd: ksmbd: validate session id and tree id in the compound request ksmbd: fix out-of-bound read in smb2_write ksmbd: add mnt_want_write to ksmbd vfs functions ksmbd: validate command payload size
2023-06-20block: fine-granular CAP_SYS_ADMIN for Persistent ReservationJingbo Xu
Allow of unprivileged Persistent Reservation operations on devices if the write permission check on the device node has passed. brw-rw---- 1 root disk 259, 0 Jun 13 07:09 /dev/nvme0n1 In the example above, the "disk" group of nvme0n1 is also allowed to make reservations on the device even without CAP_SYS_ADMIN. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230613084008.93795-3-jefflexu@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20block: disallow Persistent Reservation on partitionsJingbo Xu
Refuse Persistent Reservation operations on partitions as reservation on partitions doesn't make sense. Besides, introduce blkdev_pr_allowed() helper, where more policies could be placed here later. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230613084008.93795-2-jefflexu@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20s390/boot: fix physmem_info virtual vs physical address confusionAlexander Gordeev
Fix virtual vs physical address confusion (which currently are the same). Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-20s390/kasan: avoid short by one page shadow memoryAlexander Gordeev
Kernel Address Sanitizer uses 3 bits per byte to encode memory. That is the number of bits the start and end address of a memory range is shifted right when the corresponding shadow memory is created for that memory range. The used memory mapping routine expects page-aligned addresses, while the above described 3-bit shift might turn the shadow memory range start and end boundaries into non-page-aligned in case the size of the original memory range is less than (PAGE_SIZE << 3). As result, the resulting shadow memory range could be short on one page. Align on page boundary the start and end addresses when mapping a shadow memory range and avoid the described issue in the future. Note, that does not fix a real problem, since currently no virtual regions of size less than (PAGE_SIZE << 3) exist. Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-20s390/kasan: fix insecure W+X mapping warningAlexander Gordeev
Since commit 3b5c3f000c2e ("s390/kasan: move shadow mapping to decompressor") the decompressor establishes mappings for the shadow memory and sets initial protection attributes to RWX. The decompressed kernel resets protection to RW+NX later on. In case a shadow memory range is not aligned on page boundary (e.g. as result of mem= kernel command line parameter use), the "Checked W+X mappings: FAILED, 1 W+X pages found" warning hits. Reported-by: Vasily Gorbik <gor@linux.ibm.com> Fixes: 557b19709da9 ("s390/kasan: move shadow mapping to decompressor") Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-20s390/crash: use the correct type for memory allocationChristophe JAILLET
get_elfcorehdr_size() returns a size_t, so there is no real point to store it in a u32. Turn 'alloc_size' into a size_t. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/0756118c9058338f3040edb91971d0bfd100027b.1686688212.git.christophe.jaillet@wanadoo.fr Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-20btrfs: fix u32 overflows when left shifting stripe_nrQu Wenruo
[BUG] David reported an ASSERT() get triggered during fio load on 8 devices with data/raid6 and metadata/raid1c3: fio --rw=randrw --randrepeat=1 --size=3000m \ --bsrange=512b-64k --bs_unaligned \ --ioengine=libaio --fsync=1024 \ --name=job0 --name=job1 \ The ASSERT() is from rbio_add_bio() of raid56.c: ASSERT(orig_logical >= full_stripe_start && orig_logical + orig_len <= full_stripe_start + rbio->nr_data * BTRFS_STRIPE_LEN); Which is checking if the target rbio is crossing the full stripe boundary. [100.789] assertion failed: orig_logical >= full_stripe_start && orig_logical + orig_len <= full_stripe_start + rbio->nr_data * BTRFS_STRIPE_LEN, in fs/btrfs/raid56.c:1622 [100.795] ------------[ cut here ]------------ [100.796] kernel BUG at fs/btrfs/raid56.c:1622! [100.797] invalid opcode: 0000 [#1] PREEMPT SMP KASAN [100.798] CPU: 1 PID: 100 Comm: kworker/u8:4 Not tainted 6.4.0-rc6-default+ #124 [100.799] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014 [100.802] Workqueue: writeback wb_workfn (flush-btrfs-1) [100.803] RIP: 0010:rbio_add_bio+0x204/0x210 [btrfs] [100.806] RSP: 0018:ffff888104a8f300 EFLAGS: 00010246 [100.808] RAX: 00000000000000a1 RBX: ffff8881075907e0 RCX: ffffed1020951e01 [100.809] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000001 [100.811] RBP: 0000000141d20000 R08: 0000000000000001 R09: ffff888104a8f04f [100.813] R10: ffffed1020951e09 R11: 0000000000000003 R12: ffff88810e87f400 [100.815] R13: 0000000041d20000 R14: 0000000144529000 R15: ffff888101524000 [100.817] FS: 0000000000000000(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000 [100.821] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [100.822] CR2: 000055d54e44c270 CR3: 000000010a9a1006 CR4: 00000000003706a0 [100.824] Call Trace: [100.825] <TASK> [100.825] ? die+0x32/0x80 [100.826] ? do_trap+0x12d/0x160 [100.827] ? rbio_add_bio+0x204/0x210 [btrfs] [100.827] ? rbio_add_bio+0x204/0x210 [btrfs] [100.829] ? do_error_trap+0x90/0x130 [100.830] ? rbio_add_bio+0x204/0x210 [btrfs] [100.831] ? handle_invalid_op+0x2c/0x30 [100.833] ? rbio_add_bio+0x204/0x210 [btrfs] [100.835] ? exc_invalid_op+0x29/0x40 [100.836] ? asm_exc_invalid_op+0x16/0x20 [100.837] ? rbio_add_bio+0x204/0x210 [btrfs] [100.837] raid56_parity_write+0x64/0x270 [btrfs] [100.838] btrfs_submit_chunk+0x26e/0x800 [btrfs] [100.840] ? btrfs_bio_init+0x80/0x80 [btrfs] [100.841] ? release_pages+0x503/0x6d0 [100.842] ? folio_unlock+0x2f/0x60 [100.844] ? __folio_put+0x60/0x60 [100.845] ? btrfs_do_readpage+0xae0/0xae0 [btrfs] [100.847] btrfs_submit_bio+0x21/0x60 [btrfs] [100.847] submit_one_bio+0x6a/0xb0 [btrfs] [100.849] extent_write_cache_pages+0x395/0x680 [btrfs] [100.850] ? __extent_writepage+0x520/0x520 [btrfs] [100.851] ? mark_usage+0x190/0x190 [100.852] extent_writepages+0xdb/0x130 [btrfs] [100.853] ? extent_write_locked_range+0x480/0x480 [btrfs] [100.854] ? mark_usage+0x190/0x190 [100.854] ? attach_extent_buffer_page+0x220/0x220 [btrfs] [100.855] ? reacquire_held_locks+0x178/0x280 [100.856] ? writeback_sb_inodes+0x245/0x7f0 [100.857] do_writepages+0x102/0x2e0 [100.858] ? page_writeback_cpu_online+0x10/0x10 [100.859] ? __lock_release.isra.0+0x14a/0x4d0 [100.860] ? reacquire_held_locks+0x280/0x280 [100.861] ? __lock_acquired+0x1e9/0x3d0 [100.862] ? do_raw_spin_lock+0x1b0/0x1b0 [100.863] __writeback_single_inode+0x94/0x450 [100.864] writeback_sb_inodes+0x372/0x7f0 [100.864] ? lock_sync+0xd0/0xd0 [100.865] ? do_raw_spin_unlock+0x93/0xf0 [100.866] ? sync_inode_metadata+0xc0/0xc0 [100.867] ? rwsem_optimistic_spin+0x340/0x340 [100.868] __writeback_inodes_wb+0x70/0x130 [100.869] wb_writeback+0x2d1/0x530 [100.869] ? __writeback_inodes_wb+0x130/0x130 [100.870] ? lockdep_hardirqs_on_prepare.part.0+0xf1/0x1c0 [100.870] wb_do_writeback+0x3eb/0x480 [100.871] ? wb_writeback+0x530/0x530 [100.871] ? mark_lock_irq+0xcd0/0xcd0 [100.872] wb_workfn+0xe0/0x3f0< [CAUSE] Commit a97699d1d610 ("btrfs: replace map_lookup->stripe_len by BTRFS_STRIPE_LEN") changes how we calculate the map length, to reduce u64 division. Function btrfs_max_io_len() is to get the length to the stripe boundary. It calculates the full stripe start offset (inside the chunk) by the following code: *full_stripe_start = rounddown(*stripe_nr, nr_data_stripes(map)) << BTRFS_STRIPE_LEN_SHIFT; The calculation itself is fine, but the value returned by rounddown() is dependent on both @stripe_nr (which is u32) and nr_data_stripes() (which returned int). Thus the result is also u32, then we do the left shift, which can overflow u32. If such overflow happens, @full_stripe_start will be a value way smaller than @offset, causing later "full_stripe_len - (offset - *full_stripe_start)" to underflow, thus make later length calculation to have no stripe boundary limit, resulting a write bio to exceed stripe boundary. There are some other locations like this, with a u32 @stripe_nr got left shift, which can lead to a similar overflow. [FIX] Fix all @stripe_nr with left shift with a type cast to u64 before the left shift. Those involved @stripe_nr or similar variables are recording the stripe number inside the chunk, which is small enough to be contained by u32, but their offset inside the chunk can not fit into u32. Thus for those specific left shifts, a type cast to u64 is necessary so this patch does not touch them and the code will be cleaned up in the future to keep the fix minimal. Reported-by: David Sterba <dsterba@suse.com> Fixes: a97699d1d610 ("btrfs: replace map_lookup->stripe_len by BTRFS_STRIPE_LEN") Tested-by: David Sterba <dsterba@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-20io_uring: add helpers to decode the fixed file file_ptrChristoph Hellwig
Remove all the open coded magic on slot->file_ptr by introducing two helpers that return the file pointer and the flags instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620113235.920399-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20io_uring: use io_file_from_index in io_msg_grab_fileChristoph Hellwig
Use io_file_from_index instead of open coding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620113235.920399-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20io_uring: use io_file_from_index in __io_sync_cancelChristoph Hellwig
Use io_file_from_index instead of open coding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620113235.920399-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20io_uring: return REQ_F_ flags from io_file_get_flagsChristoph Hellwig
Two of the three callers want them, so return the more usual format, and shift into the FFS_ form only for the fixed file table. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620113235.920399-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20io_uring: remove io_req_ffs_setChristoph Hellwig
Just checking the flag directly makes it a lot more obvious what is going on here. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620113235.920399-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20io_uring: remove a confusing comment above io_file_get_flagsChristoph Hellwig
The SCM inflight mechanism has nothing to do with the fact that a file might be a regular file or not and if it supports non-blocking operations. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620113235.920399-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20io_uring: remove the mode variable in io_file_get_flagsChristoph Hellwig
The variable is only once now, so don't bother with it. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620113235.920399-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20io_uring: remove __io_file_supports_nowaitChristoph Hellwig
Now that this only checks O_NONBLOCK and FMODE_NOWAIT, the helper is complete overkilļ, and the comments are confusing bordering to wrong. Just inline the check into the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620113235.920399-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20accel/qaic: Call DRM helper function to destroy prime GEMPranjal Ramajor Asha Kanojiya
smatch warning: drivers/accel/qaic/qaic_data.c:620 qaic_free_object() error: dereferencing freed memory 'obj->import_attach' obj->import_attach is detached and freed using dma_buf_detach(). But used after free to decrease the dmabuf ref count using dma_buf_put(). drm_prime_gem_destroy() handles this issue and performs the proper clean up instead of open coding it in the driver. Fixes: ff13be830333 ("accel/qaic: Add datapath") Reported-by: Sukrut Bellary <sukrut.bellary@linux.com> Closes: https://lore.kernel.org/all/20230610021200.377452-1-sukrut.bellary@linux.com/ Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230614161528.11710-1-quic_jhugo@quicinc.com
2023-06-20reiserfs: fix blkdev_put() warning from release_journal_dev()Yu Kuai
In journal_init_dev(), if super bdev is used as 'j_dev_bd', then blkdev_get_by_dev() is called with NULL holder, otherwise, holder will be journal. However, later in release_journal_dev(), blkdev_put() is called with journal unconditionally, cause following warning: WARNING: CPU: 1 PID: 5034 at block/bdev.c:617 bd_end_claim block/bdev.c:617 [inline] WARNING: CPU: 1 PID: 5034 at block/bdev.c:617 blkdev_put+0x562/0x8a0 block/bdev.c:901 RIP: 0010:blkdev_put+0x562/0x8a0 block/bdev.c:901 Call Trace: <TASK> release_journal_dev fs/reiserfs/journal.c:2592 [inline] free_journal_ram+0x421/0x5c0 fs/reiserfs/journal.c:1896 do_journal_release fs/reiserfs/journal.c:1960 [inline] journal_release+0x276/0x630 fs/reiserfs/journal.c:1971 reiserfs_put_super+0xe4/0x5c0 fs/reiserfs/super.c:616 generic_shutdown_super+0x158/0x480 fs/super.c:499 kill_block_super+0x64/0xb0 fs/super.c:1422 deactivate_locked_super+0x98/0x160 fs/super.c:330 deactivate_super+0xb1/0xd0 fs/super.c:361 cleanup_mnt+0x2ae/0x3d0 fs/namespace.c:1247 task_work_run+0x16f/0x270 kernel/task_work.c:179 exit_task_work include/linux/task_work.h:38 [inline] do_exit+0xadc/0x2a30 kernel/exit.c:874 do_group_exit+0xd4/0x2a0 kernel/exit.c:1024 __do_sys_exit_group kernel/exit.c:1035 [inline] __se_sys_exit_group kernel/exit.c:1033 [inline] __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1033 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Fix this problem by passing in NULL holder in this case. Reported-by: syzbot+04625c80899f4555de39@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=04625c80899f4555de39 Fixes: 2736e8eeb0cc ("block: use the holder as indication for exclusive opens") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620111322.1014775-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20block: fix wrong mode for blkdev_get_by_dev() from disk_scan_partitions()Yu Kuai
After commit 2736e8eeb0cc ("block: use the holder as indication for exclusive opens"), blkdev_get_by_dev() will warn if holder is NULL and mode contains 'FMODE_EXCL'. holder from blkdev_get_by_dev() from disk_scan_partitions() is always NULL, hence it should not use 'FMODE_EXCL', which is broben by the commit. For consequence, WARN_ON_ONCE() will be triggered from blkdev_get_by_dev() if user scan partitions with device opened exclusively. Fix this problem by removing 'FMODE_EXCL' from disk_scan_partitions(), as it used to be. Reported-by: syzbot+00cd27751f78817f167b@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=00cd27751f78817f167b Fixes: 2736e8eeb0cc ("block: use the holder as indication for exclusive opens") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230618140402.7556-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20block: document the holder argument to blkdev_get_by_pathChristoph Hellwig
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620043536.707249-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20block: increment diskseq on all media change eventsDemi Marie Obenour
Currently, associating a loop device with a different file descriptor does not increment its diskseq. This allows the following race condition: 1. Program X opens a loop device 2. Program X gets the diskseq of the loop device. 3. Program X associates a file with the loop device. 4. Program X passes the loop device major, minor, and diskseq to something. 5. Program X exits. 6. Program Y detaches the file from the loop device. 7. Program Y attaches a different file to the loop device. 8. The opener finally gets around to opening the loop device and checks that the diskseq is what it expects it to be. Even though the diskseq is the expected value, the result is that the opener is accessing the wrong file. From discussions with Christoph Hellwig, it appears that disk_force_media_change() was supposed to call inc_diskseq(), but in fact it does not. Adding a Fixes: tag to indicate this. Christoph's Reported-by is because he stated that disk_force_media_change() calls inc_diskseq(), which is what led me to discover that it should but does not. Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> Fixes: e6138dc12de9 ("block: add a helper to raise a media changed event") Cc: stable@vger.kernel.org # 5.15+ Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230607170837.1559-1-demi@invisiblethingslab.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20swim: fix a missing FMODE_ -> BLK_OPEN_ conversion in floppy_openChristoph Hellwig
Fix a missing conversion to the new BLK_OPEN constant in swim. Fixes: 05bdb9965305 ("block: replace fmode_t with a block-specific type for block open flags") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620043051.707196-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20x86/smp: Put CPUs into INIT on shutdown if possibleThomas Gleixner
Parking CPUs in a HLT loop is not completely safe vs. kexec() as HLT can resume execution due to NMI, SMI and MCE, which has the same issue as the MWAIT loop. Kicking the secondary CPUs into INIT makes this safe against NMI and SMI. A broadcast MCE will take the machine down, but a broadcast MCE which makes HLT resume and execute overwritten text, pagetables or data will end up in a disaster too. So chose the lesser of two evils and kick the secondary CPUs into INIT unless the system has installed special wakeup mechanisms which are not using INIT. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230615193330.608657211@linutronix.de
2023-06-20x86/smp: Split sending INIT IPI out into a helper functionThomas Gleixner
Putting CPUs into INIT is a safer place during kexec() to park CPUs. Split the INIT assert/deassert sequence out so it can be reused. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Link: https://lore.kernel.org/r/20230615193330.551157083@linutronix.de
2023-06-20x86/smp: Cure kexec() vs. mwait_play_dead() breakageThomas Gleixner
TLDR: It's a mess. When kexec() is executed on a system with offline CPUs, which are parked in mwait_play_dead() it can end up in a triple fault during the bootup of the kexec kernel or cause hard to diagnose data corruption. The reason is that kexec() eventually overwrites the previous kernel's text, page tables, data and stack. If it writes to the cache line which is monitored by a previously offlined CPU, MWAIT resumes execution and ends up executing the wrong text, dereferencing overwritten page tables or corrupting the kexec kernels data. Cure this by bringing the offlined CPUs out of MWAIT into HLT. Write to the monitored cache line of each offline CPU, which makes MWAIT resume execution. The written control word tells the offlined CPUs to issue HLT, which does not have the MWAIT problem. That does not help, if a stray NMI, MCE or SMI hits the offlined CPUs as those make it come out of HLT. A follow up change will put them into INIT, which protects at least against NMI and SMI. Fixes: ea53069231f9 ("x86, hotplug: Use mwait to offline a processor, fix the legacy case") Reported-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230615193330.492257119@linutronix.de
2023-06-20x86/smp: Use dedicated cache-line for mwait_play_dead()Thomas Gleixner
Monitoring idletask::thread_info::flags in mwait_play_dead() has been an obvious choice as all what is needed is a cache line which is not written by other CPUs. But there is a use case where a "dead" CPU needs to be brought out of MWAIT: kexec(). This is required as kexec() can overwrite text, pagetables, stacks and the monitored cacheline of the original kernel. The latter causes MWAIT to resume execution which obviously causes havoc on the kexec kernel which results usually in triple faults. Use a dedicated per CPU storage to prepare for that. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230615193330.434553750@linutronix.de
2023-06-20x86/smp: Remove pointless wmb()s from native_stop_other_cpus()Thomas Gleixner
The wmb()s before sending the IPIs are not synchronizing anything. If at all then the apic IPI functions have to provide or act as appropriate barriers. Remove these cargo cult barriers which have no explanation of what they are synchronizing. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230615193330.378358382@linutronix.de
2023-06-20x86/smp: Dont access non-existing CPUID leafTony Battersby
stop_this_cpu() tests CPUID leaf 0x8000001f::EAX unconditionally. Intel CPUs return the content of the highest supported leaf when a non-existing leaf is read, while AMD CPUs return all zeros for unsupported leafs. So the result of the test on Intel CPUs is lottery. While harmless it's incorrect and causes the conditional wbinvd() to be issued where not required. Check whether the leaf is supported before reading it. [ tglx: Adjusted changelog ] Fixes: 08f253ec3767 ("x86/cpu: Clear SME feature flag when not in use") Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/3817d810-e0f1-8ef8-0bbd-663b919ca49b@cybernetics.com Link: https://lore.kernel.org/r/20230615193330.322186388@linutronix.de
2023-06-20x86/smp: Make stop_other_cpus() more robustThomas Gleixner
Tony reported intermittent lockups on poweroff. His analysis identified the wbinvd() in stop_this_cpu() as the culprit. This was added to ensure that on SME enabled machines a kexec() does not leave any stale data in the caches when switching from encrypted to non-encrypted mode or vice versa. That wbinvd() is conditional on the SME feature bit which is read directly from CPUID. But that readout does not check whether the CPUID leaf is available or not. If it's not available the CPU will return the value of the highest supported leaf instead. Depending on the content the "SME" bit might be set or not. That's incorrect but harmless. Making the CPUID readout conditional makes the observed hangs go away, but it does not fix the underlying problem: CPU0 CPU1 stop_other_cpus() send_IPIs(REBOOT); stop_this_cpu() while (num_online_cpus() > 1); set_online(false); proceed... -> hang wbinvd() WBINVD is an expensive operation and if multiple CPUs issue it at the same time the resulting delays are even larger. But CPU0 already observed num_online_cpus() going down to 1 and proceeds which causes the system to hang. This issue exists independent of WBINVD, but the delays caused by WBINVD make it more prominent. Make this more robust by adding a cpumask which is initialized to the online CPU mask before sending the IPIs and CPUs clear their bit in stop_this_cpu() after the WBINVD completed. Check for that cpumask to become empty in stop_other_cpus() instead of watching num_online_cpus(). The cpumask cannot plug all holes either, but it's better than a raw counter and allows to restrict the NMI fallback IPI to be sent only the CPUs which have not reported within the timeout window. Fixes: 08f253ec3767 ("x86/cpu: Clear SME feature flag when not in use") Reported-by: Tony Battersby <tonyb@cybernetics.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/3817d810-e0f1-8ef8-0bbd-663b919ca49b@cybernetics.com Link: https://lore.kernel.org/r/87h6r770bv.ffs@tglx
2023-06-20Merge tag 'ipsec-2023-06-20' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec ipsec-2023-06-20
2023-06-20fs: Provide helpers for manipulating sb->s_readonly_remountJan Kara
Provide helpers to set and clear sb->s_readonly_remount including appropriate memory barriers. Also use this opportunity to document what the barriers pair with and why they are needed. Suggested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Dave Chinner <dchinner@redhat.com> Message-Id: <20230620112832.5158-1-jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-06-20Merge branch 'dsa-mt7530-fixes'David S. Miller
Arınç ÜNAL says: ==================== net: dsa: mt7530: fix multiple CPU ports, BPDU and LLDP handling This patch series fixes all non-theoretical issues regarding multiple CPU ports and the handling of LLDP frames and BPDUs. I am adding me as a maintainer, I've got some code improvements on the way. I will keep an eye on this driver and the patches submitted for it in the future. Arınç v6: - Change a small portion of the comment in the diff on "net: dsa: mt7530: set all CPU ports in MT7531_CPU_PMAP" with Russell's suggestion. - Change the patch log of "net: dsa: mt7530: fix trapping frames on non-MT7621 SoC MT7530 switch" with Vladimir's suggestion. - Group the code for trapping frames into a common function and call that. - Add Vladimir and Russell's reviewed-by tags to where they're given. v5: - Change the comment in the diff on the first patch with Russell's words. - Change the patch log of the first patch to state that the patch is just preparatory work for change "net: dsa: introduce preferred_default_local_cpu_port and use on MT7530" and not a fix to an existing problem on the code base. - Remove the "net: dsa: mt7530: fix trapping frames with multiple CPU ports on MT7530" patch. It fixes a theoretical issue, therefore it is net-next material. - Remove unnecessary information from the patch logs. Remove the enum renaming change. - Strengthen the point of the "net: dsa: introduce preferred_default_local_cpu_port and use on MT7530" patch. v4: Make the patch logs and my comments in the code easier to understand. v3: Fix the from header on the patches. Write a cover letter. v2: Add patches to fix the handling of LLDP frames and BPDUs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-20MAINTAINERS: add me as maintainer of MEDIATEK SWITCH DRIVERArınç ÜNAL
Add me as a maintainer of the MediaTek MT7530 DSA subdriver. List maintainers in alphabetical order by first name. Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-20net: dsa: introduce preferred_default_local_cpu_port and use on MT7530Vladimir Oltean
Since the introduction of the OF bindings, DSA has always had a policy that in case multiple CPU ports are present in the device tree, the numerically smallest one is always chosen. The MT7530 switch family, except the switch on the MT7988 SoC, has 2 CPU ports, 5 and 6, where port 6 is preferable on the MT7531BE switch because it has higher bandwidth. The MT7530 driver developers had 3 options: - to modify DSA when the MT7531 switch support was introduced, such as to prefer the better port - to declare both CPU ports in device trees as CPU ports, and live with the sub-optimal performance resulting from not preferring the better port - to declare just port 6 in the device tree as a CPU port Of course they chose the path of least resistance (3rd option), kicking the can down the road. The hardware description in the device tree is supposed to be stable - developers are not supposed to adopt the strategy of piecemeal hardware description, where the device tree is updated in lockstep with the features that the kernel currently supports. Now, as a result of the fact that they did that, any attempts to modify the device tree and describe both CPU ports as CPU ports would make DSA change its default selection from port 6 to 5, effectively resulting in a performance degradation visible to users with the MT7531BE switch as can be seen below. Without preferring port 6: [ ID][Role] Interval Transfer Bitrate Retr [ 5][TX-C] 0.00-20.00 sec 374 MBytes 157 Mbits/sec 734 sender [ 5][TX-C] 0.00-20.00 sec 373 MBytes 156 Mbits/sec receiver [ 7][RX-C] 0.00-20.00 sec 1.81 GBytes 778 Mbits/sec 0 sender [ 7][RX-C] 0.00-20.00 sec 1.81 GBytes 777 Mbits/sec receiver With preferring port 6: [ ID][Role] Interval Transfer Bitrate Retr [ 5][TX-C] 0.00-20.00 sec 1.99 GBytes 856 Mbits/sec 273 sender [ 5][TX-C] 0.00-20.00 sec 1.99 GBytes 855 Mbits/sec receiver [ 7][RX-C] 0.00-20.00 sec 1.72 GBytes 737 Mbits/sec 15 sender [ 7][RX-C] 0.00-20.00 sec 1.71 GBytes 736 Mbits/sec receiver Using one port for WAN and the other ports for LAN is a very popular use case which is what this test emulates. As such, this change proposes that we retroactively modify stable kernels (which don't support the modification of the CPU port assignments, so as to let user space fix the problem and restore the throughput) to keep the mt7530 driver preferring port 6 even with device trees where the hardware is more fully described. Fixes: c288575f7810 ("net: dsa: mt7530: Add the support of MT7531 switch") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-20net: dsa: mt7530: fix handling of LLDP framesArınç ÜNAL
LLDP frames are link-local frames, therefore they must be trapped to the CPU port. Currently, the MT753X switches treat LLDP frames as regular multicast frames, therefore flooding them to user ports. To fix this, set LLDP frames to be trapped to the CPU port(s). Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-20net: dsa: mt7530: fix handling of BPDUs on MT7530 switchArınç ÜNAL
BPDUs are link-local frames, therefore they must be trapped to the CPU port. Currently, the MT7530 switch treats BPDUs as regular multicast frames, therefore flooding them to user ports. To fix this, set BPDUs to be trapped to the CPU port. Group this on mt7530_setup() and mt7531_setup_common() into mt753x_trap_frames() and call that. Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-20net: dsa: mt7530: fix trapping frames on non-MT7621 SoC MT7530 switchArınç ÜNAL
All MT7530 switch IP variants share the MT7530_MFC register, but the current driver only writes it for the switch variant that is integrated in the MT7621 SoC. Modify the code to include all MT7530 derivatives. Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-20net: dsa: mt7530: set all CPU ports in MT7531_CPU_PMAPArınç ÜNAL
MT7531_CPU_PMAP represents the destination port mask for trapped-to-CPU frames (further restricted by PCR_MATRIX). Currently the driver sets the first CPU port as the single port in this bit mask, which works fine regardless of whether the device tree defines port 5, 6 or 5+6 as CPU ports. This is because the logic coincides with DSA's logic of picking the first CPU port as the CPU port that all user ports are affine to, by default. An upcoming change would like to influence DSA's selection of the default CPU port to no longer be the first one, and in that case, this logic needs adaptation. Since there is no observed leakage or duplication of frames if all CPU ports are defined in this bit mask, simply include them all. Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk> Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-20net: dpaa2-mac: add 25gbase-r supportJosua Mayer
Layerscape MACs support 25Gbps network speed with dpmac "CAUI" mode. Add the mappings between DPMAC_ETH_IF_* and HY_INTERFACE_MODE_*, as well as the 25000 mac capability. Tested on SolidRun LX2162a Clearfog, serdes 1 protocol 18. Signed-off-by: Josua Mayer <josua@solid-run.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-20Merge tag 'ieee802154-for-net-2023-06-19' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan Stefan Schmidt says: ==================== An update from ieee802154 for your *net* tree: Two small fixes and MAINTAINERS update this time. Azeem Shaikh ensured consistent use of strscpy through the tree and fixed the usage in our trace.h. Chen Aotian fixed a potential memory leak in the hwsim simulator for ieee802154. Miquel Raynal updated the MAINATINERS file with the new team git tree locations and patchwork URLs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-19Merge tag 'hyperv-fixes-signed-20230619' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - Fix races in Hyper-V PCI controller (Dexuan Cui) - Fix handling of hyperv_pcpu_input_arg (Michael Kelley) - Fix vmbus_wait_for_unload to scan present CPUs (Michael Kelley) - Call hv_synic_free in the failure path of hv_synic_alloc (Dexuan Cui) - Add noop for real mode handlers for virtual trust level code (Saurabh Sengar) * tag 'hyperv-fixes-signed-20230619' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: PCI: hv: Add a per-bus mutex state_lock Revert "PCI: hv: Fix a timing issue which causes kdump to fail occasionally" PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic PCI: hv: Fix a race condition bug in hv_pci_query_relations() arm64/hyperv: Use CPUHP_AP_HYPERV_ONLINE state to fix CPU online sequencing x86/hyperv: Fix hyperv_pcpu_input_arg handling when CPUs go online/offline Drivers: hv: vmbus: Fix vmbus_wait_for_unload() to scan present CPUs Drivers: hv: vmbus: Call hv_synic_free() if hv_synic_alloc() fails x86/hyperv/vtl: Add noop for realmode pointers
2023-06-19selftests/mm: fix cross compilation with LLVMMark Brown
Currently the MM selftests attempt to work out the target architecture by using CROSS_COMPILE or otherwise querying the host machine, storing the target architecture in a variable called MACHINE rather than the usual ARCH though as far as I can tell (including for x86_64) the value is the same as we would use for architecture. When cross compiling with LLVM we don't need a CROSS_COMPILE as LLVM can support many target architectures in a single build so this logic does not work, CROSS_COMPILE is not set and we end up selecting tests for the host rather than target architecture. Fix this by using the more standard ARCH to describe the architecture, taking it from the environment if specified. Link: https://lkml.kernel.org/r/20230614-kselftest-mm-llvm-v1-1-180523f277d3@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Tom Rix <trix@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19mailmap: add entries for Ben DooksBen Dooks
I am going to be losing my sifive.com address soon and I also realised my old Simtec address (from >10 years ago) is also not been updates so update .mailmap for both. Link: https://lkml.kernel.org/r/20230615081820.79485-1-ben.dooks@codethink.co.uk Signed-off-by: Ben Dooks <ben.dooks@sifive.com> Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19nilfs2: prevent general protection fault in nilfs_clear_dirty_page()Ryusuke Konishi
In a syzbot stress test that deliberately causes file system errors on nilfs2 with a corrupted disk image, it has been reported that nilfs_clear_dirty_page() called from nilfs_clear_dirty_pages() can cause a general protection fault. In nilfs_clear_dirty_pages(), when looking up dirty pages from the page cache and calling nilfs_clear_dirty_page() for each dirty page/folio retrieved, the back reference from the argument page to "mapping" may have been changed to NULL (and possibly others). It is necessary to check this after locking the page/folio. So, fix this issue by not calling nilfs_clear_dirty_page() on a page/folio after locking it in nilfs_clear_dirty_pages() if the back reference "mapping" from the page/folio is different from the "mapping" that held the page/folio just before. Link: https://lkml.kernel.org/r/20230612021456.3682-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+53369d11851d8f26735c@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/000000000000da4f6b05eb9bf593@google.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-19Revert "mm: vmscan: make global slab shrink lockless"Qi Zheng
This reverts commit f95bdb700bc6bb74e1199b1f5f90c613e152cfa7. Kernel test robot reports -88.8% regression in stress-ng.ramfs.ops_per_sec test case [1], which is caused by commit f95bdb700bc6 ("mm: vmscan: make global slab shrink lockless"). The root cause is that SRCU has to be careful to not frequently check for SRCU read-side critical section exits. Therefore, even if no one is currently in the SRCU read-side critical section, synchronize_srcu() cannot return quickly. That's why unregister_shrinker() has become slower. After discussion, we will try to use the refcount+RCU method [2] proposed by Dave Chinner to continue to re-implement the lockless slab shrink. So revert the shrinker_srcu related changes first. [1]. https://lore.kernel.org/lkml/202305230837.db2c233f-yujie.liu@intel.com/ [2]. https://lore.kernel.org/lkml/ZIJhou1d55d4H1s0@dread.disaster.area/ Link: https://lkml.kernel.org/r/20230609081518.3039120-8-qi.zheng@linux.dev Reported-by: kernel test robot <yujie.liu@intel.com> Closes: https://lore.kernel.org/oe-lkp/202305230837.db2c233f-yujie.liu@intel.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Kirill Tkhai <tkhai@ya.ru> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>