summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-06-20s390/boot: fix physmem_info virtual vs physical address confusionAlexander Gordeev
Fix virtual vs physical address confusion (which currently are the same). Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-20s390/kasan: avoid short by one page shadow memoryAlexander Gordeev
Kernel Address Sanitizer uses 3 bits per byte to encode memory. That is the number of bits the start and end address of a memory range is shifted right when the corresponding shadow memory is created for that memory range. The used memory mapping routine expects page-aligned addresses, while the above described 3-bit shift might turn the shadow memory range start and end boundaries into non-page-aligned in case the size of the original memory range is less than (PAGE_SIZE << 3). As result, the resulting shadow memory range could be short on one page. Align on page boundary the start and end addresses when mapping a shadow memory range and avoid the described issue in the future. Note, that does not fix a real problem, since currently no virtual regions of size less than (PAGE_SIZE << 3) exist. Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-20s390/kasan: fix insecure W+X mapping warningAlexander Gordeev
Since commit 3b5c3f000c2e ("s390/kasan: move shadow mapping to decompressor") the decompressor establishes mappings for the shadow memory and sets initial protection attributes to RWX. The decompressed kernel resets protection to RW+NX later on. In case a shadow memory range is not aligned on page boundary (e.g. as result of mem= kernel command line parameter use), the "Checked W+X mappings: FAILED, 1 W+X pages found" warning hits. Reported-by: Vasily Gorbik <gor@linux.ibm.com> Fixes: 557b19709da9 ("s390/kasan: move shadow mapping to decompressor") Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-20s390/crash: use the correct type for memory allocationChristophe JAILLET
get_elfcorehdr_size() returns a size_t, so there is no real point to store it in a u32. Turn 'alloc_size' into a size_t. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/0756118c9058338f3040edb91971d0bfd100027b.1686688212.git.christophe.jaillet@wanadoo.fr Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-20jfs: jfs_dmap: Validate db_l2nbperpage while mountingSiddh Raman Pant
In jfs_dmap.c at line 381, BLKTODMAP is used to get a logical block number inside dbFree(). db_l2nbperpage, which is the log2 number of blocks per page, is passed as an argument to BLKTODMAP which uses it for shifting. Syzbot reported a shift out-of-bounds crash because db_l2nbperpage is too big. This happens because the large value is set without any validation in dbMount() at line 181. Thus, make sure that db_l2nbperpage is correct while mounting. Max number of blocks per page = Page size / Min block size => log2(Max num_block per page) = log2(Page size / Min block size) = log2(Page size) - log2(Min block size) => Max db_l2nbperpage = L2PSIZE - L2MINBLOCKSIZE Reported-and-tested-by: syzbot+d2cd27dcf8e04b232eb2@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?id=2a70a453331db32ed491f5cbb07e81bf2d225715 Cc: stable@vger.kernel.org Suggested-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Siddh Raman Pant <code@siddh.me> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2023-06-20btrfs: fix u32 overflows when left shifting stripe_nrQu Wenruo
[BUG] David reported an ASSERT() get triggered during fio load on 8 devices with data/raid6 and metadata/raid1c3: fio --rw=randrw --randrepeat=1 --size=3000m \ --bsrange=512b-64k --bs_unaligned \ --ioengine=libaio --fsync=1024 \ --name=job0 --name=job1 \ The ASSERT() is from rbio_add_bio() of raid56.c: ASSERT(orig_logical >= full_stripe_start && orig_logical + orig_len <= full_stripe_start + rbio->nr_data * BTRFS_STRIPE_LEN); Which is checking if the target rbio is crossing the full stripe boundary. [100.789] assertion failed: orig_logical >= full_stripe_start && orig_logical + orig_len <= full_stripe_start + rbio->nr_data * BTRFS_STRIPE_LEN, in fs/btrfs/raid56.c:1622 [100.795] ------------[ cut here ]------------ [100.796] kernel BUG at fs/btrfs/raid56.c:1622! [100.797] invalid opcode: 0000 [#1] PREEMPT SMP KASAN [100.798] CPU: 1 PID: 100 Comm: kworker/u8:4 Not tainted 6.4.0-rc6-default+ #124 [100.799] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014 [100.802] Workqueue: writeback wb_workfn (flush-btrfs-1) [100.803] RIP: 0010:rbio_add_bio+0x204/0x210 [btrfs] [100.806] RSP: 0018:ffff888104a8f300 EFLAGS: 00010246 [100.808] RAX: 00000000000000a1 RBX: ffff8881075907e0 RCX: ffffed1020951e01 [100.809] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000001 [100.811] RBP: 0000000141d20000 R08: 0000000000000001 R09: ffff888104a8f04f [100.813] R10: ffffed1020951e09 R11: 0000000000000003 R12: ffff88810e87f400 [100.815] R13: 0000000041d20000 R14: 0000000144529000 R15: ffff888101524000 [100.817] FS: 0000000000000000(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000 [100.821] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [100.822] CR2: 000055d54e44c270 CR3: 000000010a9a1006 CR4: 00000000003706a0 [100.824] Call Trace: [100.825] <TASK> [100.825] ? die+0x32/0x80 [100.826] ? do_trap+0x12d/0x160 [100.827] ? rbio_add_bio+0x204/0x210 [btrfs] [100.827] ? rbio_add_bio+0x204/0x210 [btrfs] [100.829] ? do_error_trap+0x90/0x130 [100.830] ? rbio_add_bio+0x204/0x210 [btrfs] [100.831] ? handle_invalid_op+0x2c/0x30 [100.833] ? rbio_add_bio+0x204/0x210 [btrfs] [100.835] ? exc_invalid_op+0x29/0x40 [100.836] ? asm_exc_invalid_op+0x16/0x20 [100.837] ? rbio_add_bio+0x204/0x210 [btrfs] [100.837] raid56_parity_write+0x64/0x270 [btrfs] [100.838] btrfs_submit_chunk+0x26e/0x800 [btrfs] [100.840] ? btrfs_bio_init+0x80/0x80 [btrfs] [100.841] ? release_pages+0x503/0x6d0 [100.842] ? folio_unlock+0x2f/0x60 [100.844] ? __folio_put+0x60/0x60 [100.845] ? btrfs_do_readpage+0xae0/0xae0 [btrfs] [100.847] btrfs_submit_bio+0x21/0x60 [btrfs] [100.847] submit_one_bio+0x6a/0xb0 [btrfs] [100.849] extent_write_cache_pages+0x395/0x680 [btrfs] [100.850] ? __extent_writepage+0x520/0x520 [btrfs] [100.851] ? mark_usage+0x190/0x190 [100.852] extent_writepages+0xdb/0x130 [btrfs] [100.853] ? extent_write_locked_range+0x480/0x480 [btrfs] [100.854] ? mark_usage+0x190/0x190 [100.854] ? attach_extent_buffer_page+0x220/0x220 [btrfs] [100.855] ? reacquire_held_locks+0x178/0x280 [100.856] ? writeback_sb_inodes+0x245/0x7f0 [100.857] do_writepages+0x102/0x2e0 [100.858] ? page_writeback_cpu_online+0x10/0x10 [100.859] ? __lock_release.isra.0+0x14a/0x4d0 [100.860] ? reacquire_held_locks+0x280/0x280 [100.861] ? __lock_acquired+0x1e9/0x3d0 [100.862] ? do_raw_spin_lock+0x1b0/0x1b0 [100.863] __writeback_single_inode+0x94/0x450 [100.864] writeback_sb_inodes+0x372/0x7f0 [100.864] ? lock_sync+0xd0/0xd0 [100.865] ? do_raw_spin_unlock+0x93/0xf0 [100.866] ? sync_inode_metadata+0xc0/0xc0 [100.867] ? rwsem_optimistic_spin+0x340/0x340 [100.868] __writeback_inodes_wb+0x70/0x130 [100.869] wb_writeback+0x2d1/0x530 [100.869] ? __writeback_inodes_wb+0x130/0x130 [100.870] ? lockdep_hardirqs_on_prepare.part.0+0xf1/0x1c0 [100.870] wb_do_writeback+0x3eb/0x480 [100.871] ? wb_writeback+0x530/0x530 [100.871] ? mark_lock_irq+0xcd0/0xcd0 [100.872] wb_workfn+0xe0/0x3f0< [CAUSE] Commit a97699d1d610 ("btrfs: replace map_lookup->stripe_len by BTRFS_STRIPE_LEN") changes how we calculate the map length, to reduce u64 division. Function btrfs_max_io_len() is to get the length to the stripe boundary. It calculates the full stripe start offset (inside the chunk) by the following code: *full_stripe_start = rounddown(*stripe_nr, nr_data_stripes(map)) << BTRFS_STRIPE_LEN_SHIFT; The calculation itself is fine, but the value returned by rounddown() is dependent on both @stripe_nr (which is u32) and nr_data_stripes() (which returned int). Thus the result is also u32, then we do the left shift, which can overflow u32. If such overflow happens, @full_stripe_start will be a value way smaller than @offset, causing later "full_stripe_len - (offset - *full_stripe_start)" to underflow, thus make later length calculation to have no stripe boundary limit, resulting a write bio to exceed stripe boundary. There are some other locations like this, with a u32 @stripe_nr got left shift, which can lead to a similar overflow. [FIX] Fix all @stripe_nr with left shift with a type cast to u64 before the left shift. Those involved @stripe_nr or similar variables are recording the stripe number inside the chunk, which is small enough to be contained by u32, but their offset inside the chunk can not fit into u32. Thus for those specific left shifts, a type cast to u64 is necessary so this patch does not touch them and the code will be cleaned up in the future to keep the fix minimal. Reported-by: David Sterba <dsterba@suse.com> Fixes: a97699d1d610 ("btrfs: replace map_lookup->stripe_len by BTRFS_STRIPE_LEN") Tested-by: David Sterba <dsterba@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-20riscv: replace deprecated scall with ecallFangrui Song
scall is a deprecated alias for ecall. ecall is used in several places, so there is no assembler compatibility concern. Signed-off-by: Fangrui Song <maskray@google.com> Link: https://lore.kernel.org/r/20230423223210.126948-1-maskray@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-20PCI: Add failed link recovery for device reset eventsMaciej W. Rozycki
Request failed link recovery with any upstream PCIe bridge where a device has not come back after reset within PCI_RESET_WAIT time. Reset the polling interval if recovery succeeded, otherwise continue as usual. [bhelgaas: inline pcie_parent_link_retrain()] Link: https://lore.kernel.org/r/alpine.DEB.2.21.2306111631050.64925@angie.orcam.me.uk Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-06-20PCI: Work around PCIe link training failuresMaciej W. Rozycki
Attempt to handle cases such as with a downstream port of the ASMedia ASM2824 PCIe switch where link training never completes and the link continues switching between speeds indefinitely with the data link layer never reaching the active state. It has been observed with a downstream port of the ASMedia ASM2824 Gen 3 switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2 switch, using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device, P/N 41433, wired to a SiFive HiFive Unmatched board. In this setup the switches should negotiate a link speed of 5.0GT/s, falling back to 2.5GT/s if necessary. Instead the link continues oscillating between the two speeds, at the rate of 34-35 times per second, with link training reported repeatedly active ~84% of the time. Limiting the target link speed to 2.5GT/s with the upstream ASM2824 device makes the two switches communicate correctly. Removing the speed restriction afterwards makes the two devices switch to 5.0GT/s then. Make use of these observations and detect the inability to train the link by checking for the Data Link Layer Link Active status bit being off while the Link Bandwidth Management Status indicating that hardware has changed the link speed or width in an attempt to correct unreliable link operation. Restrict the speed to 2.5GT/s then with the Target Link Speed field, request a retrain and wait 200ms for the data link to go up. If this is successful, lift the restriction, letting the devices negotiate a higher speed. Also check for a 2.5GT/s speed restriction the firmware may have already arranged and lift it too with ports of devices known to continue working afterwards (currently only ASM2824), that already report their data link being up. [bhelgaas: reorder and squash stubs from https://lore.kernel.org/r/alpine.DEB.2.21.2306111619570.64925@angie.orcam.me.uk to avoid adding stubs that do nothing] Link: https://lore.kernel.org/r/alpine.DEB.2.21.2203022037020.56670@angie.orcam.me.uk/ Link: https://source.denx.de/u-boot/u-boot/-/commit/a398a51ccc68 Link: https://lore.kernel.org/r/alpine.DEB.2.21.2305310038540.59226@angie.orcam.me.uk Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-06-20PCI: Use pcie_wait_for_link_status() in pcie_wait_for_link_delay()Maciej W. Rozycki
Remove a DLLLA status bit polling loop from pcie_wait_for_link_delay() and call almost identical code in pcie_wait_for_link_status() instead. This reduces the lower bound on the polling interval from 10ms to 1ms, possibly increasing the CPU load on the system in favour to reducing the wait time. Link: https://lore.kernel.org/r/alpine.DEB.2.21.2306111611170.64925@angie.orcam.me.uk Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-06-20PCI: Add support for polling DLLLA to pcie_retrain_link()Maciej W. Rozycki
Let the caller of pcie_retrain_link() specify whether they want to use the LT bit or the DLLLA bit of the Link Status Register to determine if link training has completed. It is up to the caller to verify whether the use of the DLLLA bit, the implementation of which is optional, is valid for the device requested. Link: https://lore.kernel.org/r/alpine.DEB.2.21.2306110310540.64925@angie.orcam.me.uk Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-06-20PCI: Export pcie_retrain_link() for use outside ASPMMaciej W. Rozycki
Export pcie_retrain_link() for link retrain needs outside ASPM. Struct pcie_link_state is local to ASPM and only used by pcie_retrain_link() to get at the associated PCI device, so change the operand and adjust the lone call site accordingly. Document the interface. No functional change at this point. Link: https://lore.kernel.org/r/alpine.DEB.2.21.2306110229010.64925@angie.orcam.me.uk Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-06-20PCI: Export PCIe link retrain timeoutMaciej W. Rozycki
Convert LINK_RETRAIN_TIMEOUT from jiffies to milliseconds, accordingly rename to PCIE_LINK_RETRAIN_TIMEOUT_MS, and make available via "pci.h" for the PCI core to use. Use in pcie_wait_for_link_delay(). Link: https://lore.kernel.org/r/alpine.DEB.2.21.2305310030280.59226@angie.orcam.me.uk Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-06-20PCI: Execute quirk_enable_clear_retrain_link() earlierMaciej W. Rozycki
Make quirk_enable_clear_retrain_link() an early quirk so that any later fixups can rely on dev->clear_retrain_link to have been already initialised. [bhelgaas: reorder to just before it becomes possible to call pcie_retrain_link() earlier] Link: https://lore.kernel.org/r/alpine.DEB.2.21.2305310049000.59226@angie.orcam.me.uk Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-06-20PCI/ASPM: Factor out waiting for link training to completeMaciej W. Rozycki
Move code polling for the Link Training bit to clear into a function of its own. [bhelgaas: reorder to clean up before exposing to PCI core] Link: https://lore.kernel.org/r/alpine.DEB.2.21.2306111605060.64925@angie.orcam.me.uk Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-06-20PCI/ASPM: Avoid unnecessary pcie_link_state useMaciej W. Rozycki
[bhelgaas: extract from expose patch, reorder to clean up before exposing] Link: https://lore.kernel.org/r/alpine.DEB.2.21.2306110229010.64925@angie.orcam.me.uk Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-06-20RDMA/cma: Remove NULL check before dev_{put, hold}Yang Li
The call netdev_{put, hold} of dev_{put, hold} will check NULL, so there is no need to check before using dev_{put, hold}, remove it to silence the warning: ./drivers/infiniband/core/cma.c:4812:2-9: WARNING: NULL check before dev_{put, hold} functions is not needed. Link: https://lore.kernel.org/r/20230614014328.14007-1-yang.lee@linux.alibaba.com Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5521 Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-06-20RDMA/rxe: Simplify cq->notify codeBob Pearson
The flags parameter to the request notify verb is a bitmask. But, rxe driver treats cq->notify as an int. If someone ever set both the IB_CQ_SOLICITED and the IB_CQ_NEXT_COMP bits rxe_cq_post could fail to generate a completion event. This patch treats the notify flags as a bit mask consistently and can handle the above case correctly. Link: https://lore.kernel.org/r/20230612162244.20038-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-06-20io_uring: add helpers to decode the fixed file file_ptrChristoph Hellwig
Remove all the open coded magic on slot->file_ptr by introducing two helpers that return the file pointer and the flags instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620113235.920399-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20io_uring: use io_file_from_index in io_msg_grab_fileChristoph Hellwig
Use io_file_from_index instead of open coding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620113235.920399-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20io_uring: use io_file_from_index in __io_sync_cancelChristoph Hellwig
Use io_file_from_index instead of open coding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620113235.920399-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20io_uring: return REQ_F_ flags from io_file_get_flagsChristoph Hellwig
Two of the three callers want them, so return the more usual format, and shift into the FFS_ form only for the fixed file table. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620113235.920399-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20io_uring: remove io_req_ffs_setChristoph Hellwig
Just checking the flag directly makes it a lot more obvious what is going on here. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620113235.920399-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20io_uring: remove a confusing comment above io_file_get_flagsChristoph Hellwig
The SCM inflight mechanism has nothing to do with the fact that a file might be a regular file or not and if it supports non-blocking operations. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620113235.920399-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20io_uring: remove the mode variable in io_file_get_flagsChristoph Hellwig
The variable is only once now, so don't bother with it. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620113235.920399-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20io_uring: remove __io_file_supports_nowaitChristoph Hellwig
Now that this only checks O_NONBLOCK and FMODE_NOWAIT, the helper is complete overkilļ, and the comments are confusing bordering to wrong. Just inline the check into the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620113235.920399-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20of: reserved_mem: Use stable allocation orderStephan Gerhold
sort() in Linux is based on heapsort which is not a stable sort algorithm - equal elements are being reordered. For reserved memory in the device tree this happens mainly for dynamic allocations: They do not have an address to sort with, so they are reordered somewhat randomly when adding/removing other unrelated reserved memory nodes. Functionally this is not a big problem, but it's confusing during development when all the addresses change after adding unrelated reserved memory nodes. Make the order stable by sorting dynamic allocations according to the node order in the device tree. Static allocations are not affected by this because they are still sorted by their (fixed) address. Signed-off-by: Stephan Gerhold <stephan@gerhold.net> Link: https://lore.kernel.org/r/20230510-dt-resv-bottom-up-v2-2-aeb2afc8ac25@gerhold.net Signed-off-by: Rob Herring <robh@kernel.org>
2023-06-20of: reserved_mem: Try to keep range allocations contiguousStephan Gerhold
Right now dynamic reserved memory regions are allocated either bottom-up or top-down, depending on the memblock setting of the architecture. This is fine when the address is arbitrary. However, when using "alloc-ranges" the regions are often placed somewhere in the middle of (free) RAM, even if the range starts or ends next to another (static) reservation. Try to detect this situation, and choose explicitly between bottom-up or top-down to allocate the memory close to the other reservations: 1. If the "alloc-range" starts at the end or inside an existing reservation, use bottom-up. 2. If the "alloc-range" ends at the start or inside an existing reservation, use top-down. 3. If both or none is the case, keep the current (architecture-specific) behavior. There are plenty of edge cases where only a more complex algorithm would help, but even this simple approach helps in many cases to keep the reserved memory (and therefore also the free memory) contiguous. Signed-off-by: Stephan Gerhold <stephan@gerhold.net> Link: https://lore.kernel.org/r/20230510-dt-resv-bottom-up-v2-1-aeb2afc8ac25@gerhold.net Signed-off-by: Rob Herring <robh@kernel.org>
2023-06-20ovl: add Amir as co-maintainerMiklos Szeredi
Amir has implemented lots of features in overlayfs and is very active in maintenance. Make this official in the MAINTAINERS file. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-06-20ovl: reserve ability to reconfigure mount options with new mount apiChristian Brauner
Using the old mount api to remount an overlayfs superblock via mount(MS_REMOUNT) all mount options will be silently ignored. For example, if you create an overlayfs mount: mount -t overlay overlay -o lowerdir=/mnt/a:/mnt/b,upperdir=/mnt/upper,workdir=/mnt/work /mnt/merged and then issue a remount via: # force mount(8) to use mount(2) export LIBMOUNT_FORCE_MOUNT2=always mount -t overlay overlay -o remount,WOOTWOOT,lowerdir=/DOESNT-EXIST /mnt/merged with completely nonsensical mount options whatsoever it will succeed nonetheless. This prevents us from every changing any mount options we might introduce in the future that could reasonably be changed during a remount. We don't need to carry this issue into the new mount api port. Similar to FUSE we can use the fs_context::oldapi member to figure out that this is a request coming through the legacy mount api. If we detect it we continue silently ignoring all mount options. But for the new mount api we simply report that mount options cannot currently be changed. This will allow us to potentially alter mount properties for new or even old properties. It any case, silently ignoring everything is not something new apis should do. Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2023-06-20riscv: uprobes: Restore thread.bad_causeTiezhu Yang
thread.bad_cause is saved in arch_uprobe_pre_xol(), it should be restored in arch_uprobe_{post,abort}_xol() accordingly, otherwise the save operation is meaningless, this change is similar with x86 and powerpc. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Guo Ren <guoren@kernel.org> Fixes: 74784081aac8 ("riscv: Add uprobes supported") Link: https://lore.kernel.org/r/1682214146-3756-1-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-20riscv: mm: try VMA lock-based page fault handling firstJisheng Zhang
Attempt VMA lock-based page fault handling first, and fall back to the existing mmap_lock-based handling if that fails. A simple running the ebizzy benchmark on Lichee Pi 4A shows that PER_VMA_LOCK can improve the ebizzy benchmark by about 32.68%. In theory, the more CPUs, the bigger improvement, but I don't have any HW platform which has more than 4 CPUs. This is the riscv variant of "x86/mm: try VMA lock-based page fault handling first". Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Link: https://lore.kernel.org/r/20230523165942.2630-1-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-20dt-bindings: arm: drop unneeded quotes and use absolute /schemas pathKrzysztof Kozlowski
Cleanup bindings dropping unneeded quotes. Once all these are fixed, checking for this can be enabled in yamllint. Also absolute path starting with /schemas is preferred. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20230609140754.65158-1-krzysztof.kozlowski@linaro.org Signed-off-by: Rob Herring <robh@kernel.org>
2023-06-20dt-bindings: firmware: arm,scmi: drop unneeded quotes and use absolute ↵Krzysztof Kozlowski
/schemas path Cleanup bindings dropping unneeded quotes. Once all these are fixed, checking for this can be enabled in yamllint. Also absolute path starting with /schemas is preferred. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20230609140749.65102-1-krzysztof.kozlowski@linaro.org Signed-off-by: Rob Herring <robh@kernel.org>
2023-06-20dt-bindings: dvfs: drop unneeded quotesKrzysztof Kozlowski
Cleanup bindings dropping unneeded quotes. Once all these are fixed, checking for this can be enabled in yamllint. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20230609140742.65018-1-krzysztof.kozlowski@linaro.org Signed-off-by: Rob Herring <robh@kernel.org>
2023-06-20dt-bindings: gpu: drop unneeded quotesKrzysztof Kozlowski
Cleanup bindings dropping unneeded quotes. Once all these are fixed, checking for this can be enabled in yamllint. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20230609140738.64958-1-krzysztof.kozlowski@linaro.org Signed-off-by: Rob Herring <robh@kernel.org>
2023-06-20dt-bindings: i3c: silvaco,i3c-master: drop unneeded quotesKrzysztof Kozlowski
Cleanup bindings dropping unneeded quotes. Once all these are fixed, checking for this can be enabled in yamllint. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20230609140735.64855-1-krzysztof.kozlowski@linaro.org Signed-off-by: Rob Herring <robh@kernel.org>
2023-06-20RDMA/rxe: Fixes mr access supported listBob Pearson
A recent patch incorrectly did not include IB_ACCESS_RELAXED_ORDERING in the list of supported access flags for the rxe driver. The driver actually does nothing related to relaxed ordering but it causes no problems to include it as supported but with no effect. This change caused ib_send_bw and friends to not run correctly. The correct approach is for the driver to allow any of the optional access flags and otherwise ignore them. This patch adds IB_ACCESS_OPTIONAL to the list of rxe supported flags. Fixes: 02ed253770fb ("RDMA/rxe: Introduce rxe access supported flags") Link: https://lore.kernel.org/r/20230613171654.19334-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-06-20dt-bindings: rockchip: grf: drop unneeded quotesKrzysztof Kozlowski
Cleanup bindings dropping unneeded quotes. Once all these are fixed, checking for this can be enabled in yamllint. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20230609140702.64589-1-krzysztof.kozlowski@linaro.org Signed-off-by: Rob Herring <robh@kernel.org>
2023-06-20dt-bindings: spmi: mtk,spmi-mtk-pmif: drop unneeded quotesKrzysztof Kozlowski
Cleanup bindings dropping unneeded quotes. Once all these are fixed, checking for this can be enabled in yamllint. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20230609140655.64529-1-krzysztof.kozlowski@linaro.org Signed-off-by: Rob Herring <robh@kernel.org>
2023-06-20fs: dlm: remove filter local comms on closeAlexander Aring
The current way how lowcomms is configured is due configfs entries. Each comms configfs entry will create a lowcomms connection. Even the local connection itself will be stored as a lowcomms connection, although most functionality for a local lowcomms connection struct is not necessary. Now in some scenarios we will see that dlm_controld reports a -EEXIST when configure a node via configfs: ... /sys/kernel/config/dlm/cluster/comms/1/addr: write failed: 17 -1 Doing a: cat /sys/kernel/config/dlm/cluster/comms/1/addr_list reported nothing. This was being seen on cluster with nodeid 1 and it's local configuration. To be sure the configfs entries are in sync with lowcomms connection structures we always call dlm_midcomms_close() to be sure the lowcomms connection gets removed when the configfs entry gets dropped. Before commit 07ee38674a0b ("fs: dlm: filter ourself midcomms calls") it was just doing this by accident and the filter by doing: if (nodeid == dlm_our_nodeid()) return 0; inside dlm_midcomms_close() was never been hit because drop_comm() sets local_comm to NULL and cause that dlm_our_nodeid() returns always the invalid nodeid 0. Fixes: 07ee38674a0b ("fs: dlm: filter ourself midcomms calls") Cc: stable@vger.kernel.org Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2023-06-20ASoC: tas2781: Fix spelling mistake "calibraiton" -> "calibration"Colin Ian King
There is a spelling mistake in a dev_err message. Fix it. Also fix grammar and add space between last word and (%d)". Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20230620095620.2522058-1-colin.i.king@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-06-20ASoC: qcom: audioreach: add compress offloadMark Brown
Merge series from Srinivas Kandagatla <srinivas.kandagatla@linaro.org>: This patchset adds compressed offload support to Qualcomm audioreach drivers. Currently it supports AAC, MP3 and FALC along with gapless. Tested this on SM8450 and sc7280.
2023-06-20ASoC: Use maple tree register cache for Everest SemiMark Brown
Merge series from Mark Brown <broonie@kernel.org>: Several of the Everest Semi CODECs only support single register read and write operations and therefore do not benefit from using the rbtree cache over the maple tree cache, convert them to the more modern maple tree cache.
2023-06-20ASoC: Convert Realtek I2C drivers to use maple treeMark Brown
Merge series from Mark Brown <broonie@kernel.org>: Many of the Realtek I2C/SPI devices only support single register read and write operations so don't benefit from using the rbtree cache instead of the more modern maple tree cache, convert them to maple tree.
2023-06-20accel/qaic: Call DRM helper function to destroy prime GEMPranjal Ramajor Asha Kanojiya
smatch warning: drivers/accel/qaic/qaic_data.c:620 qaic_free_object() error: dereferencing freed memory 'obj->import_attach' obj->import_attach is detached and freed using dma_buf_detach(). But used after free to decrease the dmabuf ref count using dma_buf_put(). drm_prime_gem_destroy() handles this issue and performs the proper clean up instead of open coding it in the driver. Fixes: ff13be830333 ("accel/qaic: Add datapath") Reported-by: Sukrut Bellary <sukrut.bellary@linux.com> Closes: https://lore.kernel.org/all/20230610021200.377452-1-sukrut.bellary@linux.com/ Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230614161528.11710-1-quic_jhugo@quicinc.com
2023-06-20reiserfs: fix blkdev_put() warning from release_journal_dev()Yu Kuai
In journal_init_dev(), if super bdev is used as 'j_dev_bd', then blkdev_get_by_dev() is called with NULL holder, otherwise, holder will be journal. However, later in release_journal_dev(), blkdev_put() is called with journal unconditionally, cause following warning: WARNING: CPU: 1 PID: 5034 at block/bdev.c:617 bd_end_claim block/bdev.c:617 [inline] WARNING: CPU: 1 PID: 5034 at block/bdev.c:617 blkdev_put+0x562/0x8a0 block/bdev.c:901 RIP: 0010:blkdev_put+0x562/0x8a0 block/bdev.c:901 Call Trace: <TASK> release_journal_dev fs/reiserfs/journal.c:2592 [inline] free_journal_ram+0x421/0x5c0 fs/reiserfs/journal.c:1896 do_journal_release fs/reiserfs/journal.c:1960 [inline] journal_release+0x276/0x630 fs/reiserfs/journal.c:1971 reiserfs_put_super+0xe4/0x5c0 fs/reiserfs/super.c:616 generic_shutdown_super+0x158/0x480 fs/super.c:499 kill_block_super+0x64/0xb0 fs/super.c:1422 deactivate_locked_super+0x98/0x160 fs/super.c:330 deactivate_super+0xb1/0xd0 fs/super.c:361 cleanup_mnt+0x2ae/0x3d0 fs/namespace.c:1247 task_work_run+0x16f/0x270 kernel/task_work.c:179 exit_task_work include/linux/task_work.h:38 [inline] do_exit+0xadc/0x2a30 kernel/exit.c:874 do_group_exit+0xd4/0x2a0 kernel/exit.c:1024 __do_sys_exit_group kernel/exit.c:1035 [inline] __se_sys_exit_group kernel/exit.c:1033 [inline] __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1033 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Fix this problem by passing in NULL holder in this case. Reported-by: syzbot+04625c80899f4555de39@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=04625c80899f4555de39 Fixes: 2736e8eeb0cc ("block: use the holder as indication for exclusive opens") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620111322.1014775-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20block: fix wrong mode for blkdev_get_by_dev() from disk_scan_partitions()Yu Kuai
After commit 2736e8eeb0cc ("block: use the holder as indication for exclusive opens"), blkdev_get_by_dev() will warn if holder is NULL and mode contains 'FMODE_EXCL'. holder from blkdev_get_by_dev() from disk_scan_partitions() is always NULL, hence it should not use 'FMODE_EXCL', which is broben by the commit. For consequence, WARN_ON_ONCE() will be triggered from blkdev_get_by_dev() if user scan partitions with device opened exclusively. Fix this problem by removing 'FMODE_EXCL' from disk_scan_partitions(), as it used to be. Reported-by: syzbot+00cd27751f78817f167b@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=00cd27751f78817f167b Fixes: 2736e8eeb0cc ("block: use the holder as indication for exclusive opens") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230618140402.7556-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20block: document the holder argument to blkdev_get_by_pathChristoph Hellwig
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620043536.707249-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-20block: increment diskseq on all media change eventsDemi Marie Obenour
Currently, associating a loop device with a different file descriptor does not increment its diskseq. This allows the following race condition: 1. Program X opens a loop device 2. Program X gets the diskseq of the loop device. 3. Program X associates a file with the loop device. 4. Program X passes the loop device major, minor, and diskseq to something. 5. Program X exits. 6. Program Y detaches the file from the loop device. 7. Program Y attaches a different file to the loop device. 8. The opener finally gets around to opening the loop device and checks that the diskseq is what it expects it to be. Even though the diskseq is the expected value, the result is that the opener is accessing the wrong file. From discussions with Christoph Hellwig, it appears that disk_force_media_change() was supposed to call inc_diskseq(), but in fact it does not. Adding a Fixes: tag to indicate this. Christoph's Reported-by is because he stated that disk_force_media_change() calls inc_diskseq(), which is what led me to discover that it should but does not. Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> Fixes: e6138dc12de9 ("block: add a helper to raise a media changed event") Cc: stable@vger.kernel.org # 5.15+ Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230607170837.1559-1-demi@invisiblethingslab.com Signed-off-by: Jens Axboe <axboe@kernel.dk>