summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2024-07-19Merge tag 'for-6.11-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "A fix for build breakage on 32bit platforms" * tag 'for-6.11-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: change BTRFS_MOUNT_* flags to 64bit type
2024-07-19Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: "Several new features here: - Virtio find vqs API has been reworked (required to fix the scalability issue we have with adminq, which I hope to merge later in the cycle) - vDPA driver for Marvell OCTEON - virtio fs performance improvement - mlx5 migration speedups Fixes, cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (56 commits) virtio: rename virtio_find_vqs_info() to virtio_find_vqs() virtio: remove unused virtio_find_vqs() and virtio_find_vqs_ctx() helpers virtio: convert the rest virtio_find_vqs() users to virtio_find_vqs_info() virtio_balloon: convert to use virtio_find_vqs_info() virtiofs: convert to use virtio_find_vqs_info() scsi: virtio_scsi: convert to use virtio_find_vqs_info() virtio_net: convert to use virtio_find_vqs_info() virtio_crypto: convert to use virtio_find_vqs_info() virtio_console: convert to use virtio_find_vqs_info() virtio_blk: convert to use virtio_find_vqs_info() virtio: rename find_vqs_info() op to find_vqs() virtio: remove the original find_vqs() op virtio: call virtio_find_vqs_info() from virtio_find_single_vq() directly virtio: convert find_vqs() op implementations to find_vqs_info() virtio_pci: convert vp_*find_vqs() ops to find_vqs_info() virtio: introduce virtio_queue_info struct and find_vqs_info() config op virtio: make virtio_find_single_vq() call virtio_find_vqs() virtio: make virtio_find_vqs() call virtio_find_vqs_ctx() caif_virtio: use virtio_find_single_vq() for single virtqueue finding vdpa/mlx5: Don't enable non-active VQs in .set_vq_ready() ...
2024-07-19mm: add MAP_DROPPABLE for designating always lazily freeable mappingsJason A. Donenfeld
The vDSO getrandom() implementation works with a buffer allocated with a new system call that has certain requirements: - It shouldn't be written to core dumps. * Easy: VM_DONTDUMP. - It should be zeroed on fork. * Easy: VM_WIPEONFORK. - It shouldn't be written to swap. * Uh-oh: mlock is rlimited. * Uh-oh: mlock isn't inherited by forks. - It shouldn't reserve actual memory, but it also shouldn't crash when page faulting in memory if none is available * Uh-oh: VM_NORESERVE means segfaults. It turns out that the vDSO getrandom() function has three really nice characteristics that we can exploit to solve this problem: 1) Due to being wiped during fork(), the vDSO code is already robust to having the contents of the pages it reads zeroed out midway through the function's execution. 2) In the absolute worst case of whatever contingency we're coding for, we have the option to fallback to the getrandom() syscall, and everything is fine. 3) The buffers the function uses are only ever useful for a maximum of 60 seconds -- a sort of cache, rather than a long term allocation. These characteristics mean that we can introduce VM_DROPPABLE, which has the following semantics: a) It never is written out to swap. b) Under memory pressure, mm can just drop the pages (so that they're zero when read back again). c) It is inherited by fork. d) It doesn't count against the mlock budget, since nothing is locked. e) If there's not enough memory to service a page fault, it's not fatal, and no signal is sent. This way, allocations used by vDSO getrandom() can use: VM_DROPPABLE | VM_DONTDUMP | VM_WIPEONFORK | VM_NORESERVE And there will be no problem with OOMing, crashing on overcommitment, using memory when not in use, not wiping on fork(), coredumps, or writing out to swap. In order to let vDSO getrandom() use this, expose these via mmap(2) as MAP_DROPPABLE. Note that this involves removing the MADV_FREE special case from sort_folio(), which according to Yu Zhao is unnecessary and will simply result in an extra call to shrink_folio_list() in the worst case. The chunk removed reenables the swapbacked flag, which we don't want for VM_DROPPABLE, and we can't conditionalize it here because there isn't a vma reference available. Finally, the provided self test ensures that this is working as desired. Cc: linux-mm@kvack.org Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-07-19cifs: Add a tracepoint to track credits involved in R/W requestsDavid Howells
Add a tracepoint to track the credit changes and server in_flight value involved in the lifetime of a R/W request, logging it against the request/subreq debugging ID. This requires the debugging IDs to be recorded in the cifs_credits struct. The tracepoint can be enabled with: echo 1 >/sys/kernel/debug/tracing/events/cifs/smb3_rw_credits/enable Also add a three-state flag to struct cifs_credits to note if we're interested in determining when the in_flight contribution ends and, if so, to track whether we've decremented the contribution yet. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2024-07-19cifs: Fix setting of zero_point after DIO writeDavid Howells
At the moment, at the end of a DIO write, cifs calls netfs_resize_file() to adjust the size of the file if it needs it. This will reduce the zero_point (the point above which we assume a read will just return zeros) if it's more than the new i_size, but won't increase it. With DIO writes, however, we definitely want to increase it as we have clobbered the local pagecache and then written some data that's not available locally. Fix cifs to make the zero_point above the end of a DIO or unbuffered write. This fixes corruption seen occasionally with the generic/708 xfs-test. In that case, the read-back of some of the written data is being short-circuited and replaced with zeroes. Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib") Cc: stable@vger.kernel.org Reported-by: Steve French <sfrench@samba.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2024-07-19cifs: Fix missing error code setDavid Howells
In cifs_strict_readv(), the default rc (-EACCES) is accidentally cleared by a successful return from netfs_start_io_direct(), such that if cifs_find_lock_conflict() fails, we don't return an error. Fix this by resetting the default error code. Fixes: 14b1cd25346b ("cifs: Fix locking in cifs_strict_readv()") Cc: stable@vger.kernel.org Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2024-07-19cifs: Fix server re-repick on subrequest retryDavid Howells
When a subrequest is marked for needing retry, netfs will call cifs_prepare_write() which will make cifs repick the server for the op before renegotiating credits; it then calls cifs_issue_write() which invokes smb2_async_writev() - which re-repicks the server. If a different server is then selected, this causes the increment of server->in_flight to happen against one record and the decrement to happen against another, leading to misaccounting. Fix this by just removing the repick code in smb2_async_writev(). As this is only called from netfslib-driven code, cifs_prepare_write() should always have been called first, and so server should never be NULL and the preparatory step is repeated in the event that we do a retry. The problem manifests as a warning looking something like: WARNING: CPU: 4 PID: 72896 at fs/smb/client/smb2ops.c:97 smb2_add_credits+0x3f0/0x9e0 [cifs] ... RIP: 0010:smb2_add_credits+0x3f0/0x9e0 [cifs] ... smb2_writev_callback+0x334/0x560 [cifs] cifs_demultiplex_thread+0x77a/0x11b0 [cifs] kthread+0x187/0x1d0 ret_from_fork+0x34/0x60 ret_from_fork_asm+0x1a/0x30 Which may be triggered by a number of different xfstests running against an Azure server in multichannel mode. generic/249 seems the most repeatable, but generic/215, generic/249 and generic/308 may also show it. Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib") Cc: stable@vger.kernel.org Reported-by: Steve French <smfrench@gmail.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Acked-by: Tom Talpey <tom@talpey.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: Aurelien Aptel <aaptel@suse.com> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2024-07-19cifs: fix noisy message on copy_file_rangeSteve French
There are common cases where copy_file_range can noisily log "source and target of copy not on same server" e.g. the mv command across mounts to two different server's shares. Change this to informational rather than logging as an error. A followon patch will add dynamic trace points e.g. for cifs_file_copychunk_range Cc: stable@vger.kernel.org Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-07-19btrfs: change BTRFS_MOUNT_* flags to 64bit typeQu Wenruo
Currently the BTRFS_MOUNT_* flags are already beyond 32 bits, this is going to cause compilation errors for some 32 bit systems, as their unsigned long is only 32 bits long, thus flag BTRFS_MOUNT_IGNORESUPERFLAGS overflows and can lead to errors. Fix the problem by: - Migrate all existing BTRFS_MOUNT_* flags to unsigned long long - Migrate all mount option related variables to unsigned long long * btrfs_fs_info::mount_opt * btrfs_fs_context::mount_opt * mount_opt parameter of btrfs_check_options() * old_opts parameter of btrfs_remount_begin() * old_opts parameter of btrfs_remount_cleanup() * mount_opt parameter of btrfs_check_mountopts_zoned() * mount_opt and opt parameters of check_ro_option() Fixes: 32e6216512b4 ("btrfs: introduce new "rescue=ignoresuperflags" mount option") Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-07-18bcachefs: kill btree_trans_too_many_iters() in bch2_bucket_alloc_freelist()Kent Overstreet
When we're called via trans commit -> btree split -> allocator We may have already arbitrarily many btree_paths, for the transaction commit we're trying to do; when this happens, the btree_trans_too_many_iters() call causes us to livelock. Since the allocator calls btree_iter_dontneed to release paths as it iterates, this shouldn't cause any problems. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-18Merge tag 'bcachefs-2024-07-18.2' of https://evilpiepirate.org/git/bcachefsLinus Torvalds
Pull bcachefs updates from Kent Overstreet: - Metadata version 1.8: Stripe sectors accounting, BCH_DATA_unstriped This splits out the accounting of dirty sectors and stripe sectors in alloc keys; this lets us see stripe buckets that still have unstriped data in them. This is needed for ensuring that erasure coding is working correctly, as well as completing stripe creation after a crash. - Metadata version 1.9: Disk accounting rewrite The previous disk accounting scheme relied heavily on percpu counters that were also sharded by outstanding journal buffer; it was fast but not extensible or scalable, and meant that all accounting counters were recorded in every journal entry. The new disk accounting scheme stores accounting as normal btree keys; updates are deltas until they are flushed by the btree write buffer. This means we have no practical limit on the number of counters, and a new tagged union format that's easy to extend. We now have counters for compression type/ratio, per-snapshot-id usage, per-btree-id usage, and pending rebalance work. - Self healing on read IO/checksum error Data is now automatically rewritten if we get a read error and then a successful retry - Mount API conversion (thanks to Thomas Bertschinger) - Better lockdep coverage Previously, btree node locks were tracked individually by lockdep, like any other lock. But we may take _many_ btree node locks simultaneously, we easily blow through the limit of 48 locks that lockdep can track, leading to lockdep turning itself off. Tracking each btree node lock individually isn't really necessary since we have our own cycle detector for deadlock avoidance and centralized tracking of btree node locks, so we now have a single lockdep_map in btree_trans for "any btree nodes are locked". - Some more small incremental work towards online check_allocations - Lots more debugging improvements - Fixes, including: - undefined behaviour fixes, originally noted as breaking userspace LTO builds - fix a spurious warning in fsck_err, reported by Marcin - fix an integer overflow on trans->nr_updates, also reported by Marcin; this broke during deletion of highly fragmented indirect extents * tag 'bcachefs-2024-07-18.2' of https://evilpiepirate.org/git/bcachefs: (120 commits) lockdep: Add comments for lockdep_set_no{validate,track}_class() bcachefs: Fix integer overflow on trans->nr_updates bcachefs: silence silly kdoc warning bcachefs: Fix fsck warning about btree_trans not passed to fsck error bcachefs: Add an error message for insufficient rw journal devs bcachefs: varint: Avoid left-shift of a negative value bcachefs: darray: Don't pass NULL to memcpy() bcachefs: Kill bch2_assert_btree_nodes_not_locked() bcachefs: Rename BCH_WRITE_DONE -> BCH_WRITE_SUBMITTED bcachefs: __bch2_read(): call trans_begin() on every loop iter bcachefs: show none if label is not set bcachefs: drop packed, aligned from bkey_inode_buf bcachefs: btree node scan: fall back to comparing by journal seq bcachefs: Add lockdep support for btree node locks lockdep: lockdep_set_notrack_class() bcachefs: Improve copygc_wait_to_text() bcachefs: Convert clock code to u64s bcachefs: Improve startup message bcachefs: Self healing on read IO error bcachefs: Make read_only a mount option again, but hidden ...
2024-07-18Merge tag 'nfs-for-6.11-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client updates from Anna Schumaker: "New Features: - Add support for large folios - Implement rpcrdma generic device removal notification - Add client support for attribute delegations - Use a LAYOUTRETURN during reboot recovery to report layoutstats and errors - Improve throughput for random buffered writes - Add NVMe support to pnfs/blocklayout Bugfixes: - Fix rpcrdma_reqs_reset() - Avoid soft lockups when using UDP - Fix an nfs/blocklayout premature PR key unregestration - Another fix for EXCHGID4_FLAG_USE_PNFS_DS for DS server - Do not extend writes to the entire folio - Pass explicit offset and count values to tracepoints - Fix a race to wake up sleeping SUNRPC sync tasks - Fix gss_status tracepoint output Cleanups: - Add missing MODULE_DESCRIPTION() macros - Add blocklayout / SCSI layout tracepoints - Remove asm-generic headers from xprtrdma verbs.c - Remove unused 'struct mnt_fhstatus' - Other delegation related cleanups - Other folio related cleanups - Other pNFS related cleanups - Other xprtrdma cleanups" * tag 'nfs-for-6.11-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (63 commits) SUNRPC: Fixup gss_status tracepoint error output SUNRPC: Fix a race to wake a sync task nfs: split nfs_read_folio nfs: pass explicit offset/count to trace events nfs: do not extend writes to the entire folio nfs/blocklayout: add support for NVMe nfs: remove nfs_page_length nfs: remove the unused max_deviceinfo_size field from struct pnfs_layoutdriver_type nfs: don't reuse partially completed requests in nfs_lock_and_join_requests nfs: move nfs_wait_on_request to write.c nfs: fold nfs_page_group_lock_subrequests into nfs_lock_and_join_requests nfs: fold nfs_folio_find_and_lock_request into nfs_lock_and_join_requests nfs: simplify nfs_folio_find_and_lock_request nfs: remove nfs_folio_private_request nfs: remove dead code for the old swap over NFS implementation NFSv4.1 another fix for EXCHGID4_FLAG_USE_PNFS_DS for DS server nfs: Block on write congestion nfs: Properly initialize server->writeback nfs: Drop pointless check from nfs_commit_release_pages() nfs/blocklayout: SCSI layout trace points for reservation key reg/unreg ...
2024-07-18Merge tag 'ext4_for_linus-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Many cleanups and bug fixes in ext4, especially for the fast commit feature. Also some performance improvements; in particular, improving IOPS and throughput on fast devices running Async Direct I/O by up to 20% by optimizing jbd2_transaction_committed()" * tag 'ext4_for_linus-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits) ext4: make sure the first directory block is not a hole ext4: check dot and dotdot of dx_root before making dir indexed ext4: sanity check for NULL pointer after ext4_force_shutdown jbd2: increase maximum transaction size jbd2: drop pointless shrinker batch initialization jbd2: avoid infinite transaction commit loop jbd2: precompute number of transaction descriptor blocks jbd2: make jbd2_journal_get_max_txn_bufs() internal jbd2: avoid mount failed when commit block is partial submitted ext4: avoid writing unitialized memory to disk in EA inodes ext4: don't track ranges in fast_commit if inode has inlined data ext4: fix possible tid_t sequence overflows ext4: use ext4_update_inode_fsync_trans() helper in inode creation ext4: add missing MODULE_DESCRIPTION() jbd2: add missing MODULE_DESCRIPTION() ext4: use memtostr_pad() for s_volume_name jbd2: speed up jbd2_transaction_committed() ext4: make ext4_da_map_blocks() buffer_head unaware ext4: make ext4_insert_delayed_block() insert multi-blocks ext4: factor out a helper to check the cluster allocation state ...
2024-07-18Merge tag 'vfs-6.11-rc1.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix a missing rcu_read_unlock() in nsfs by switching to a cleanup guard - Add missing module descriptor for adfs * tag 'vfs-6.11-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: nsfs: use cleanup guard fs/adfs: add MODULE_DESCRIPTION
2024-07-18bcachefs: mean_and_variance: Avoid too-large shift amountsTavian Barnes
Shifting a value by the width of its type or more is undefined. Signed-off-by: Tavian Barnes <tavianator@tavianator.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-18bcachefs: Fix integer overflow on trans->nr_updatesKent Overstreet
We can't have more updates than paths, so btree_path_idx_t is the correct type to use. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-18bcachefs: silence silly kdoc warningKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-18bcachefs: Fix fsck warning about btree_trans not passed to fsck errorKent Overstreet
If a btree_trans is in use it's supposed to be passed to fsck_err so that it can be unlocked if we're waiting on userspace input; but the btree IO paths do call fsck errors where a btree_trans exists on the stack but it's not passed through. But it's ok, because it's unlocked while doing IO. Fixes: a850bde6498b ("bcachefs: fsck_err() may now take a btree_trans") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-18bcachefs: Add an error message for insufficient rw journal devsKent Overstreet
This causes us to go read-only - need an error message saying why. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-18bcachefs: varint: Avoid left-shift of a negative valueTavian Barnes
Shifting a negative value left is undefined. Signed-off-by: Tavian Barnes <tavianator@tavianator.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-18Merge tag 'memblock-v6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull memblock updates from Mike Rapoport: - 'reserve_mem' command line parameter to allow creation of named memory reservation at boot time. The driving use-case is to improve the ability of pstore to retain ramoops data across reboots. - cleanups and small improvements in memblock and mm_init - new tests cases in memblock test suite * tag 'memblock-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: memblock tests: fix implicit declaration of function 'numa_valid_node' memblock: Move late alloc warning down to phys alloc pstore/ramoops: Add ramoops.mem_name= command line option mm/memblock: Add "reserve_mem" to reserved named memory at boot up mm/mm_init.c: don't initialize page->lru again mm/mm_init.c: not always search next deferred_init_pfn from very beginning mm/mm_init.c: use deferred_init_mem_pfn_range_in_zone() to decide loop condition mm/mm_init.c: get the highest zone directly mm/mm_init.c: move nr_initialised reset down a bit mm/memblock: fix a typo in description of for_each_mem_region() mm/mm_init.c: use memblock_region_memory_base_pfn() to get startpfn mm/memblock: use PAGE_ALIGN_DOWN to get pgend in free_memmap mm/memblock: return true directly on finding overlap region memblock tests: add memblock_overlaps_region_checks mm/memblock: fix comment for memblock_isolate_range() memblock tests: add memblock_reserve_many_may_conflict_check() memblock tests: add memblock_reserve_all_locations_check() mm/memblock: remove empty dummy entry
2024-07-18nsfs: use cleanup guardChristian Brauner
Ensure that rcu read lock is given up before returning. Link: https://lore.kernel.org/r/20240716-elixier-fliesen-1ab342151a61@brauner Fixes: ca567df74a28 ("nsfs: add pid translation ioctls") Reported-by: syzbot+a3e82ae343b26b4d2335@syzkaller.appspotmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-18fs/adfs: add MODULE_DESCRIPTIONJeff Johnson
Fix the 'make W=1' issue: WARNING: modpost: missing MODULE_DESCRIPTION() in fs/adfs/adfs.o Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20240523-md-adfs-v1-1-364268e38370@quicinc.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-17hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address ↵Donet Tom
than mmap_min_addr generic_hugetlb_get_unmapped_area() was returning an address less than mmap_min_addr if the mmap argument addr, after alignment, was less than mmap_min_addr, causing mmap to fail. This is because current generic_hugetlb_get_unmapped_area() code does not take into account mmap_min_addr. This patch ensures that generic_hugetlb_get_unmapped_area() always returns an address that is greater than mmap_min_addr. Additionally, similar to generic_get_unmapped_area(), vm_end_gap() checks are included to maintain stack gap. How to reproduce ================ #include <stdio.h> #include <stdlib.h> #include <sys/mman.h> #include <unistd.h> #define HUGEPAGE_SIZE (16 * 1024 * 1024) int main() { void *addr = mmap((void *)-1, HUGEPAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0); if (addr == MAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); } snprintf((char *)addr, HUGEPAGE_SIZE, "Hello, Huge Pages!"); printf("%s\n", (char *)addr); if (munmap(addr, HUGEPAGE_SIZE) == -1) { perror("munmap"); exit(EXIT_FAILURE); } return 0; } Result without fix ================== # cat /proc/meminfo |grep -i HugePages_Free HugePages_Free: 20 # ./test mmap: Permission denied # Result with fix =============== # cat /proc/meminfo |grep -i HugePages_Free HugePages_Free: 20 # ./test Hello, Huge Pages! # Link: https://lkml.kernel.org/r/20240710051912.4681-1-donettom@linux.ibm.com Signed-off-by: Donet Tom <donettom@linux.ibm.com> Reported-by Pavithra Prakash <pavrampu@linux.vnet.ibm.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Tony Battersby <tonyb@cybernetics.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-17Merge tag 'zonefs-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs Pull zonefs update from Damien Le Moal: "A single change to enable support for large folios (from Johannes)" * tag 'zonefs-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: enable support for large folios
2024-07-17Merge tag 'fs_for_v6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull udf, ext2, isofs fixes and cleanups from Jan Kara: - A few UDF cleanups and fixes for handling corrupted filesystems - ext2 fix for handling of corrupted filesystem - isofs module description - jbd2 module description * tag 'fs_for_v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: Verify bitmap and itable block numbers before using them udf: prevent integer overflow in udf_bitmap_free_blocks() udf: Avoid excessive partition lengths udf: Drop load_block_bitmap() wrapper udf: Avoid using corrupted block bitmap buffer udf: Fix bogus checksum computation in udf_rename() udf: Fix lock ordering in udf_evict_inode() udf: Drop pointless IS_IMMUTABLE and IS_APPEND check isofs: add missing MODULE_DESCRIPTION() jbd2: add missing MODULE_DESCRIPTION()
2024-07-17Merge tag 'fsnotify_for_v6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify fix from Jan Kara: "Fix possible softlockups on directories with many dentries in fsnotify code" * tag 'fsnotify_for_v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: clear PARENT_WATCHED flags lazily
2024-07-17Merge tag 'xfs-6.11-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs updates from Chandan Babu: "Major changes in this release are limited to enabling FITRIM on realtime devices and Byte-based grant head log reservation tracking. The remaining changes are limited to fixes and cleanups included in this pull request. Core: - Enable FITRIM on the realtime device - Introduce byte-based grant head log reservation tracking instead of physical log location tracking. This allows grant head to track a full 64 bit bytes space and hence overcome the limit of 4GB indexing that has been present until now Fixes: - xfs_flush_unmap_range() and xfs_prepare_shift() should consider RT extents in the flush unmap range - Implement bounds check when traversing log operations during log replay - Prevent out of bounds access when traversing a directory data block - Prevent incorrect ENOSPC when concurrently performing file creation and file writes - Fix rtalloc rotoring when delalloc is in use Cleanups: - Clean up I/O path inode locking helpers and the page fault handler - xfs: hoist inode operations to libxfs in anticipation of the metadata inode directory feature, which maintains a directory tree of metadata inodes. This will be necessary for further enhancements to the realtime feature, subvolume support - Clean up some warts in the extent freeing log intent code - Clean up the refcount and rmap intent code before adding support for realtime devices - Provide the correct email address for sysfs ABI documentation" * tag 'xfs-6.11-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (80 commits) xfs: fix rtalloc rotoring when delalloc is in use xfs: get rid of xfs_ag_resv_rmapbt_alloc xfs: skip flushing log items during push xfs: grant heads track byte counts, not LSNs xfs: pass the full grant head to accounting functions xfs: track log space pinned by the AIL xfs: collapse xlog_state_set_callback in caller xfs: l_last_sync_lsn is really AIL state xfs: ensure log tail is always up to date xfs: background AIL push should target physical space xfs: AIL doesn't need manual pushing xfs: move and rename xfs_trans_committed_bulk xfs: fix the contact address for the sysfs ABI documentation xfs: Avoid races with cnt_btree lastrec updates xfs: move xfs_refcount_update_defer_add to xfs_refcount_item.c xfs: simplify usage of the rcur local variable in xfs_refcount_finish_one xfs: don't bother calling xfs_refcount_finish_one_cleanup in xfs_refcount_finish_one xfs: reuse xfs_refcount_update_cancel_item xfs: add a ci_entry helper xfs: remove xfs_trans_set_refcount_flags ...
2024-07-17Merge tag 'exfat-for-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - Fix deadlock issue reported by syzbot - Handle idmapped mounts * tag 'exfat-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: fix potential deadlock on __exfat_get_dentry_set exfat: handle idmapped mounts
2024-07-17Merge tag 'for-6.11-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "The highlights are new logic behind background block group reclaim, automatic removal of qgroup after removing a subvolume and new 'rescue=' mount options. The rest is optimizations, cleanups and refactoring. User visible features: - dynamic block group reclaim: - tunable framework to avoid situations where eager data allocations prevent creating new metadata chunks due to lack of unallocated space - reuse sysfs knob bg_reclaim_threshold (otherwise used only in zoned mode) for a fixed value threshold - new on/off sysfs knob "dynamic_reclaim" calculating the value based on heuristics, aiming to keep spare working space for relocating chunks but not to needlessly relocate partially utilized block groups or reclaim newly allocated ones - stats are exported in sysfs per block group type, files "reclaim_*" - this may increase IO load at unexpected times but the corner case of no allocatable block groups is known to be worse - automatically remove qgroup of deleted subvolumes: - adjust qgroup removal conditions, make sure all related subvolume data are already removed, or return EBUSY, also take into account setting of sysfs drop_subtree_threshold - also works in squota mode - mount option updates: new modes of 'rescue=' that allow to mount images (read-only) that could have been partially converted by user space tools - ignoremetacsums - invalid metadata checksums are ignored - ignoresuperflags - super block flags that track conversion in progress (like UUID or checksums) Core: - size of struct btrfs_inode is now below 1024 (on a release config), improved memory packing and other secondary effects - switch tracking of open inodes from rb-tree to xarray, minor performance improvement - reduce number of empty transaction commits when there are no dirty data/metadata - memory allocation optimizations (reduced numbers, reordering out of critical sections) - extent map structure optimizations and refactoring, more sanity checks - more subpage in zoned mode preparations or fixes - general snapshot code cleanups, improvements and documentation - tree-checker updates: more file extent ram_bytes fixes, continued - raid-stripe-tree update (not backward compatible): - remove extent encoding field from the structure, can be inferred from other information - requires btrfs-progs 6.9.1 or newer - cleanups and refactoring - error message updates - error handling improvements - return type and parameter cleanups and improvements" * tag 'for-6.11-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (152 commits) btrfs: fix extent map use-after-free when adding pages to compressed bio btrfs: fix bitmap leak when loading free space cache on duplicate entry btrfs: remove the BUG_ON() inside extent_range_clear_dirty_for_io() btrfs: move extent_range_clear_dirty_for_io() into inode.c btrfs: enhance compression error messages btrfs: fix data race when accessing the last_trans field of a root btrfs: rename the extra_gfp parameter of btrfs_alloc_page_array() btrfs: remove the extra_gfp parameter from btrfs_alloc_folio_array() btrfs: introduce new "rescue=ignoresuperflags" mount option btrfs: introduce new "rescue=ignoremetacsums" mount option btrfs: output the unrecognized super block flags as hex btrfs: remove unused Opt enums btrfs: tree-checker: add extra ram_bytes and disk_num_bytes check btrfs: fix the ram_bytes assignment for truncated ordered extents btrfs: make validate_extent_map() catch ram_bytes mismatch btrfs: ignore incorrect btrfs_file_extent_item::ram_bytes btrfs: cleanup the bytenr usage inside btrfs_extent_item_to_extent_map() btrfs: fix typo in error message in btrfs_validate_super() btrfs: move the direct IO code into its own file btrfs: pass a btrfs_inode to btrfs_set_prop() ...
2024-07-17Merge tag 'gfs2-v6.10-rc1-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: "Fixes and cleanups: - Revise the glock reference counting model and LRU list handling to be more sensible - Several quota related fixes: clean up the quota code, add some missing locking, and work around the on-disk corruption that the reverted patch "gfs2: ignore negated quota changes" causes - Clean up the glock demote logic in glock_work_func()" * tag 'gfs2-v6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (29 commits) gfs2: Clean up glock demote logic gfs2: Revert "check for no eligible quota changes" gfs2: Be more careful with the quota sync generation gfs2: Get rid of some unnecessary quota locking gfs2: Add some missing quota locking gfs2: Fold qd_fish into gfs2_quota_sync gfs2: quota need_sync cleanup gfs2: Fix and clean up function do_qc gfs2: Revert "Add quota_change type" gfs2: Revert "ignore negated quota changes" gfs2: qd_check_sync cleanups gfs2: Revert "introduce qd_bh_get_or_undo" gfs2: Check quota consistency on mount gfs2: Minor gfs2_quota_init error path cleanup gfs2: Get rid of demote_ok checks Revert "GFS2: Don't add all glocks to the lru" gfs2: Revise glock reference counting model gfs2: Switch to a per-filesystem glock workqueue gfs2: Report when glocks cannot be freed for a long time gfs2: gfs2_glock_get cleanup ...
2024-07-17Merge tag 'dlm-6.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: - New flag DLM_LSFL_SOFTIRQ_SAFE can be set by code using dlm to indicate callbacks can be run from softirq - Change md-cluster to set DLM_LSFL_SOFTIRQ_SAFE - Clean up for previous changes, e.g. unused code and parameters - Remove custom pre-allocation of rsb structs which is unnecessary with kmem caches - Change idr to xarray for lkb structs in use - Change idr to xarray for rsb structs being recovered - Change outdated naming related to internal rsb states - Fix some incorrect add/remove of rsb on scan list - Use rcu to free rsb structs * tag 'dlm-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: add rcu_barrier before destroy kmem cache dlm: remove DLM_LSFL_SOFTIRQ from exflags fs: dlm: remove unused struct 'dlm_processed_nodes' md-cluster: use DLM_LSFL_SOFTIRQ for dlm_new_lockspace() dlm: implement LSFL_SOFTIRQ_SAFE dlm: introduce DLM_LSFL_SOFTIRQ_SAFE dlm: use LSFL_FS to check for kernel lockspace dlm: use rcu to avoid an extra rsb struct lookup dlm: fix add_scan and del_scan usage dlm: change list and timer names dlm: move recover idr to xarray datastructure dlm: move lkb idr to xarray datastructure dlm: drop own rsb pre allocation mechanism dlm: remove ls_local_handle from struct dlm_ls dlm: remove unused parameter in dlm_midcomms_addr dlm: don't kref_init rsbs created for toss list dlm: remove scand leftovers
2024-07-17Merge tag 'erofs-for-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "Updates for folio conversions for compressed inodes: While large folio support for compressed data could work now, it remains disabled since the stress test could hang due to page migration in a few hours after enabling it. I need more time to investigate further before enabling this feature. Additionally, clean up stream decompressors and tracepoints for simplicity. Summary: - More folio conversions for compressed inodes - Stream decompressor (LZMA/DEFLATE/ZSTD) cleanups - Minor tracepoint cleanup" * tag 'erofs-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: silence uninitialized variable warning in z_erofs_scan_folio() erofs: avoid refcounting short-lived pages erofs: get rid of z_erofs_map_blocks_iter_* tracepoints erofs: tidy up stream decompressors erofs: refine z_erofs_{init,exit}_subsystem() erofs: move each decompressor to its own source file erofs: tidy up `struct z_erofs_bvec` erofs: teach z_erofs_scan_folios() to handle multi-page folios erofs: convert z_erofs_read_fragment() to folios erofs: convert z_erofs_pcluster_readmore() to folios
2024-07-17Merge tag 'nfsd-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds
Pull nfsd updates from Chuck Lever: "This is a light release containing optimizations, code clean-ups, and minor bug fixes. This development cycle focused on work outside of upstream kernel development: - Continuing to build upstream CI for NFSD based on kdevops - Continuing to focus on the quality of NFSD in LTS kernels - Participation in IETF nfsv4 WG discussions about NFSv4 ACLs, directory delegation, and NFSv4.2 COPY offload Notable features for v6.11 that do not come through the NFSD tree include NFS server-side support for the new pNFS NVMe layout type [RFC9561]. Functional testing for pNFS block layouts like this one has been introduced to our kdevops CI harness. Work on improving the resolution of file attribute time stamps in local filesystems is also ongoing tree-wide. As always I am grateful to NFSD contributors, reviewers, testers, and bug reporters who participated during this cycle" * tag 'nfsd-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: nfsd_file_lease_notifier_call gets a file_lease as an argument gss_krb5: Fix the error handling path for crypto_sync_skcipher_setkey MAINTAINERS: Add a bugzilla link for NFSD nfsd: new netlink ops to get/set server pool_mode sunrpc: refactor pool_mode setting code nfsd: allow passing in array of thread counts via netlink nfsd: make nfsd_svc take an array of thread counts sunrpc: fix up the special handling of sv_nrpools == 1 SUNRPC: Add a trace point in svc_xprt_deferred_close NFSD: Support write delegations in LAYOUTGET lockd: Use *-y instead of *-objs in Makefile NFSD: Fix nfsdcld warning svcrdma: Handle ADDR_CHANGE CM event properly svcrdma: Refactor the creation of listener CMA ID NFSD: remove unused structs 'nfsd3_voidargs' NFSD: harden svcxdr_dupstr() and svcxdr_tmpalloc() against integer overflows
2024-07-17Merge tag 'affs-6.11-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull affs updates from David Sterba: - conversions of one-element arrays to flexible arrays * tag 'affs-6.11-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: affs: struct slink_front: Replace 1-element array with flexible array affs: struct affs_data_head: Replace 1-element array with flexible array affs: struct affs_head: Replace 1-element array with flexible array
2024-07-17nfs: split nfs_read_folioChristoph Hellwig
nfs_read_folio is a bit hard to follow because it mixes highlevel logic with the actual data read. Split the latter into a helper and update the comments to be more accurate. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-07-17nfs: pass explicit offset/count to trace eventsChristoph Hellwig
nfs_folio_length is unsafe to use without having the folio locked and a check for a NULL ->f_mapping that protects against truncations and can lead to kernel crashes. E.g. when running xfstests generic/065 with all nfs trace points enabled. Follow the model of the XFS trace points and pass in an explіcit offset and length. This has the additional benefit that these values can be more accurate as some of the users touch partial folio ranges. Fixes: eb5654b3b89d ("NFS: Enable tracing of nfs_invalidate_folio() and nfs_launder_folio()") Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-07-17printk: Add a short description string to kmsg_dump()Jocelyn Falempe
kmsg_dump doesn't forward the panic reason string to the kmsg_dumper callback. This patch adds a new struct kmsg_dump_detail, that will hold the reason and description, and pass it to the dump() callback. To avoid updating all kmsg_dump() call, it adds a kmsg_dump_desc() function and a macro for backward compatibility. I've written this for drm_panic, but it can be useful for other kmsg_dumper. It allows to see the panic reason, like "sysrq triggered crash" or "VFS: Unable to mount root fs on xxxx" on the drm panic screen. v2: * Use a struct kmsg_dump_detail to hold the reason and description pointer, for more flexibility if we want to add other parameters. (Kees Cook) * Fix powerpc/nvram_64 build, as I didn't update the forward declaration of oops_to_nvram() Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com> Acked-by: Petr Mladek <pmladek@suse.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Kees Cook <kees@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240702122639.248110-1-jfalempe@redhat.com
2024-07-17virtio: rename virtio_find_vqs_info() to virtio_find_vqs()Jiri Pirko
Since the original virtio_find_vqs() is no longer present, rename virtio_find_vqs_info() back to virtio_find_vqs(). Signed-off-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20240708074814.1739223-20-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-17virtiofs: convert to use virtio_find_vqs_info()Jiri Pirko
Instead of passing separate names and callbacks arrays to virtio_find_vqs(), allocate one of virtual_queue_info structs and pass it to virtio_find_vqs_info(). Suggested-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20240708074814.1739223-16-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-16Merge tag 'net-next-6.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Not much excitement - a handful of large patchsets (devmem among them) did not make it in time. Core & protocols: - Use local_lock in addition to local_bh_disable() to protect per-CPU resources in networking, a step closer for local_bh_disable() not to act as a big lock on PREEMPT_RT - Use flex array for netdevice priv area, ensure its cache alignment - Add a sysctl knob to allow user to specify a default rto_min at socket init time. Bit of a big hammer but multiple companies were independently carrying such patch downstream so clearly it's useful - Support scheduling transmission of packets based on CLOCK_TAI - Un-pin TCP TIMEWAIT timer to avoid it firing on CPUs later cordoned off using cpusets - Support multiple L2TPv3 UDP tunnels using the same 5-tuple address - Allow configuration of multipath hash seed, to both allow synchronizing hashing of two routers, and preventing partial accidental sync - Improve TCP compliance with RFC 9293 for simultaneous connect() - Support sending NAT keepalives in IPsec ESP in UDP states. Userspace IKE daemon had to do this before, but the kernel can better keep track of it - Support sending supervision HSR frames with MAC addresses stored in ProxyNodeTable when RedBox (i.e. HSR-SAN) is enabled - Introduce IPPROTO_SMC for selecting SMC when socket is created - Allow UDP GSO transmit from devices with no checksum offload - openvswitch: add packet sampling via psample, separating the sampled traffic from "upcall" packets sent to user space for forwarding - nf_tables: shrink memory consumption for transaction objects Things we sprinkled into general kernel code: - Power Sequencing subsystem (used by Qualcomm Bluetooth driver for QCA6390) [ Already merged separately - Linus ] - Add IRQ information in sysfs for auxiliary bus - Introduce guard definition for local_lock - Add aligned flavor of __cacheline_group_{begin, end}() markings for grouping fields in structures BPF: - Notify user space (via epoll) when a struct_ops object is getting detached/unregistered - Add new kfuncs for a generic, open-coded bits iterator - Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head - Support resilient split BTF which cuts down on duplication and makes BTF as compact as possible WRT BTF from modules - Add support for dumping kfunc prototypes from BTF which enables both detecting as well as dumping compilable prototypes for kfuncs - riscv64 BPF JIT improvements in particular to add 12-argument support for BPF trampolines and to utilize bpf_prog_pack for the latter - Add the capability to offload the netfilter flowtable in XDP layer through kfuncs Driver API: - Allow users to configure IRQ tresholds between which automatic IRQ moderation can choose - Expand Power Sourcing (PoE) status with power, class and failure reason. Support setting power limits - Track additional RSS contexts in the core, make sure configuration changes don't break them - Support IPsec crypto offload for IPv6 ESP and IPv4 UDP-encapsulated ESP data paths - Support updating firmware on SFP modules Tests and tooling: - mptcp: use net/lib.sh to manage netns - TCP-AO and TCP-MD5: replace debug prints used by tests with tracepoints - openvswitch: make test self-contained (don't depend on OvS CLI tools) Drivers: - Ethernet high-speed NICs: - Broadcom (bnxt): - increase the max total outstanding PTP TX packets to 4 - add timestamping statistics support - implement netdev_queue_mgmt_ops - support new RSS context API - Intel (100G, ice, idpf): - implement FEC statistics and dumping signal quality indicators - support E825C products (with 56Gbps PHYs) - nVidia/Mellanox: - support HW-GRO - mlx4/mlx5: support per-queue statistics via netlink - obey the max number of EQs setting in sub-functions - AMD/Solarflare: - support new RSS context API - AMD/Pensando: - ionic: rework fix for doorbell miss to lower overhead and skip it on new HW - Wangxun: - txgbe: support Flow Director perfect filters - Ethernet NICs consumer, embedded and virtual: - Add driver for Tehuti Networks TN40xx chips - Add driver for Meta's internal NIC chips - Add driver for Ethernet MAC on Airoha EN7581 SoCs - Add driver for Renesas Ethernet-TSN devices - Google cloud vNIC: - flow steering support - Microsoft vNIC: - support page sizes other than 4KB on ARM64 - vmware vNIC: - support latency measurement (update to version 9) - VirtIO net: - support for Byte Queue Limits - support configuring thresholds for automatic IRQ moderation - support for AF_XDP Rx zero-copy - Synopsys (stmmac): - support for STM32MP13 SoC - let platforms select the right PCS implementation - TI: - icssg-prueth: add multicast filtering support - icssg-prueth: enable PTP timestamping and PPS - Renesas: - ravb: improve Rx performance 30-400% by using page pool, theaded NAPI and timer-based IRQ coalescing - ravb: add MII support for R-Car V4M - Cadence (macb): - macb: add ARP support to Wake-On-LAN - Cortina: - use phylib for RX and TX pause configuration - Ethernet switches: - nVidia/Mellanox: - support configuration of multipath hash seed - report more accurate max MTU - use page_pool to improve Rx performance - MediaTek: - mt7530: add support for bridge port isolation - Qualcomm: - qca8k: add support for bridge port isolation - Microchip: - lan9371/2: add 100BaseTX PHY support - NXP: - vsc73xx: implement VLAN operations - Ethernet PHYs: - aquantia: enable support for aqr115c - aquantia: add support for PHY LEDs - realtek: add support for rtl8224 2.5Gbps PHY - xpcs: add memory-mapped device support - add BroadR-Reach link mode and support in Broadcom's PHY driver - CAN: - add document for ISO 15765-2 protocol support - mcp251xfd: workaround for erratum DS80000789E, use timestamps to catch when device returns incorrect FIFO status - WiFi: - mac80211/cfg80211: - parse Transmit Power Envelope (TPE) data in mac80211 instead of in drivers - improvements for 6 GHz regulatory flexibility - multi-link improvements - support multiple radios per wiphy - remove DEAUTH_NEED_MGD_TX_PREP flag - Intel (iwlwifi): - bump FW API to 91 for BZ/SC devices - report 64-bit radiotap timestamp - enable P2P low latency by default - handle Transmit Power Envelope (TPE) advertised by AP - remove support for older FW for new devices - fast resume (keeping the device configured) - mvm: re-enable Multi-Link Operation (MLO) - aggregation (A-MSDU) optimizations - MediaTek (mt76): - mt7925 Multi-Link Operation (MLO) support - Qualcomm (ath10k): - LED support for various chipsets - Qualcomm (ath12k): - remove unsupported Tx monitor handling - support channel 2 in 6 GHz band - support Spatial Multiplexing Power Save (SMPS) in 6 GHz band - supprt multiple BSSID (MBSSID) and Enhanced Multi-BSSID Advertisements (EMA) - support dynamic VLAN - add panic handler for resetting the firmware state - DebugFS support for datapath statistics - WCN7850: support for Wake on WLAN - Microchip (wilc1000): - read MAC address during probe to make it visible to user space - suspend/resume improvements - TI (wl18xx): - support newer firmware versions - RealTek (rtw89): - preparation for RTL8852BE-VT support - Wake on WLAN support for WiFi 6 chips - 36-bit PCI DMA support - RealTek (rtlwifi): - RTL8192DU support - Broadcom (brcmfmac): - Management Frame Protection support (to enable WPA3) - Bluetooth: - qualcomm: use the power sequencer for QCA6390 - btusb: mediatek: add ISO data transmission functions - hci_bcm4377: add BCM4388 support - btintel: add support for BlazarU core - btintel: add support for Whale Peak2 - btnxpuart: add support for AW693 A1 chipset - btnxpuart: add support for IW615 chipset - btusb: add Realtek RTL8852BE support ID 0x13d3:0x3591" * tag 'net-next-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1589 commits) eth: fbnic: Fix spelling mistake "tiggerring" -> "triggering" tcp: Replace strncpy() with strscpy() wifi: ath12k: fix build vs old compiler tcp: Don't access uninit tcp_rsk(req)->ao_keyid in tcp_create_openreq_child(). eth: fbnic: Write the TCAM tables used for RSS control and Rx to host eth: fbnic: Add L2 address programming eth: fbnic: Add basic Rx handling eth: fbnic: Add basic Tx handling eth: fbnic: Add link detection eth: fbnic: Add initial messaging to notify FW of our presence eth: fbnic: Implement Rx queue alloc/start/stop/free eth: fbnic: Implement Tx queue alloc/start/stop/free eth: fbnic: Allocate a netdevice and napi vectors with queues eth: fbnic: Add FW communication mechanism eth: fbnic: Add message parsing for FW messages eth: fbnic: Add register init to set PCIe/Ethernet device config eth: fbnic: Allocate core device specific structures and devlink interface eth: fbnic: Add scaffolding for Meta's NIC driver PCI: Add Meta Platforms vendor ID net/sched: cls_flower: propagate tca[TCA_OPTIONS] to NL_REQ_ATTR_CHECK ...
2024-07-16Merge tag 'sysctl-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl Pull sysctl updates from Joel Granados: - Remove "->procname == NULL" check when iterating through sysctl table arrays Removing sentinels in ctl_table arrays reduces the build time size and runtime memory consumed by ~64 bytes per array. With all ctl_table sentinels gone, the additional check for ->procname == NULL that worked in tandem with the ARRAY_SIZE to calculate the size of the ctl_table arrays is no longer needed and has been removed. The sysctl register functions now returns an error if a sentinel is used. - Preparation patches for sysctl constification Constifying ctl_table structs prevents the modification of proc_handler function pointers as they would reside in .rodata. The ctl_table arguments in sysctl utility functions are const qualified in preparation for a future treewide proc_handler argument constification commit. - Misc fixes Increase robustness of set_ownership by providing sane default ownership values in case the callee doesn't set them. Bound check proc_dou8vec_minmax to avoid loading buggy modules and give sysctl testing module a name to avoid compiler complaints. * tag 'sysctl-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: sysctl: Warn on an empty procname element sysctl: Remove ctl_table sentinel code comments sysctl: Remove "child" sysctl code comments sysctl: Remove superfluous empty allocations from sysctl internals sysctl: Replace nr_entries with ctl_table_size in new_links sysctl: Remove check for sentinel element in ctl_table arrays mm profiling: Remove superfluous sentinel element from ctl_table locking: Remove superfluous sentinel element from kern_lockdep_table sysctl: Add module description to sysctl-testing sysctl: constify ctl_table arguments of utility function utsname: constify ctl_table arguments of utility function sysctl: move the extra1/2 boundary check of u8 to sysctl_check_table_array sysctl: always initialize i_uid/i_gid
2024-07-16Merge tag 'pstore-v6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore updates from Kees Cook: - Add missing MODULE_DESCRIPTION() macro (Jeff Johnson) - Replace deprecated strncpy() with strscpy() (Justin Stitt) * tag 'pstore-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: pstore: platform: add missing MODULE_DESCRIPTION() macro pstore/blk: replace deprecated strncpy with strscpy
2024-07-16Merge tag 'execve-v6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull execve updates from Kees Cook: - Use value of kernel.randomize_va_space once per exec (Alexey Dobriyan) - Honor PT_LOAD alignment for static PIE - Make bprm->argmin only visible under CONFIG_MMU - Add KUnit testing of bprm_stack_limits() * tag 'execve-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: exec: Avoid pathological argc, envc, and bprm->p values execve: Keep bprm->argmin behind CONFIG_MMU ELF: fix kernel.randomize_va_space double read exec: Add KUnit test for bprm_stack_limits() binfmt_elf: Honor PT_LOAD alignment for static PIE binfmt_elf: Calculate total_size earlier selftests/exec: Build both static and non-static load_address tests
2024-07-16Merge tag 'x86_sev_for_v6.11_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SEV updates from Borislav Petkov: - Add support for running the kernel in a SEV-SNP guest, over a Secure VM Service Module (SVSM). When running over a SVSM, different services can run at different protection levels, apart from the guest OS but still within the secure SNP environment. They can provide services to the guest, like a vTPM, for example. This series adds the required facilities to interface with such a SVSM module. - The usual fixlets, refactoring and cleanups [ And as always: "SEV" is AMD's "Secure Encrypted Virtualization". I can't be the only one who gets all the newer x86 TLA's confused, can I? - Linus ] * tag 'x86_sev_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation/ABI/configfs-tsm: Fix an unexpected indentation silly x86/sev: Do RMP memory coverage check after max_pfn has been set x86/sev: Move SEV compilation units virt: sev-guest: Mark driver struct with __refdata to prevent section mismatch x86/sev: Allow non-VMPL0 execution when an SVSM is present x86/sev: Extend the config-fs attestation support for an SVSM x86/sev: Take advantage of configfs visibility support in TSM fs/configfs: Add a callback to determine attribute visibility sev-guest: configfs-tsm: Allow the privlevel_floor attribute to be updated virt: sev-guest: Choose the VMPCK key based on executing VMPL x86/sev: Provide guest VMPL level to userspace x86/sev: Provide SVSM discovery support x86/sev: Use the SVSM to create a vCPU when not in VMPL0 x86/sev: Perform PVALIDATE using the SVSM when not at VMPL0 x86/sev: Use kernel provided SVSM Calling Areas x86/sev: Check for the presence of an SVSM in the SNP secrets page x86/irqflags: Provide native versions of the local_irq_save()/restore()
2024-07-15Merge tag 'for-6.11/block-20240710' of git://git.kernel.dk/linuxLinus Torvalds
Pull block updates from Jens Axboe: - NVMe updates via Keith: - Device initialization memory leak fixes (Keith) - More constants defined (Weiwen) - Target debugfs support (Hannes) - PCIe subsystem reset enhancements (Keith) - Queue-depth multipath policy (Redhat and PureStorage) - Implement get_unique_id (Christoph) - Authentication error fixes (Gaosheng) - MD updates via Song - sync_action fix and refactoring (Yu Kuai) - Various small fixes (Christoph Hellwig, Li Nan, and Ofir Gal, Yu Kuai, Benjamin Marzinski, Christophe JAILLET, Yang Li) - Fix loop detach/open race (Gulam) - Fix lower control limit for blk-throttle (Yu) - Add module descriptions to various drivers (Jeff) - Add support for atomic writes for block devices, and statx reporting for same. Includes SCSI and NVMe (John, Prasad, Alan) - Add IO priority information to block trace points (Dongliang) - Various zone improvements and tweaks (Damien) - mq-deadline tag reservation improvements (Bart) - Ignore direct reclaim swap writes in writeback throttling (Baokun) - Block integrity improvements and fixes (Anuj) - Add basic support for rust based block drivers. Has a dummy null_blk variant for now (Andreas) - Series converting driver settings to queue limits, and cleanups and fixes related to that (Christoph) - Cleanup for poking too deeply into the bvec internals, in preparation for DMA mapping API changes (Christoph) - Various minor tweaks and fixes (Jiapeng, John, Kanchan, Mikulas, Ming, Zhu, Damien, Christophe, Chaitanya) * tag 'for-6.11/block-20240710' of git://git.kernel.dk/linux: (206 commits) floppy: add missing MODULE_DESCRIPTION() macro loop: add missing MODULE_DESCRIPTION() macro ublk_drv: add missing MODULE_DESCRIPTION() macro xen/blkback: add missing MODULE_DESCRIPTION() macro block/rnbd: Constify struct kobj_type block: take offset into account in blk_bvec_map_sg again block: fix get_max_segment_size() warning loop: Don't bother validating blocksize virtio_blk: Don't bother validating blocksize null_blk: Don't bother validating blocksize block: Validate logical block size in blk_validate_limits() virtio_blk: Fix default logical block size fallback nvmet-auth: fix nvmet_auth hash error handling nvme: implement ->get_unique_id block: pass a phys_addr_t to get_max_segment_size block: add a bvec_phys helper blk-lib: check for kill signal in ioctl BLKZEROOUT block: limit the Write Zeroes to manually writing zeroes fallback block: refacto blkdev_issue_zeroout block: move read-only and supported checks into (__)blkdev_issue_zeroout ...
2024-07-15Merge tag 'vfs-6.11.iomap' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull iomap updates from Christian Brauner: "This contains some minor work for the iomap subsystem: - Add documentation on the design of iomap and how to port to it - Optimize iomap_read_folio() - Bring back the change to iomap_write_end() to no increase i_size. This is accompanied by a change to xfs to reserve blocks for truncating large realtime inodes to avoid exposing stale data when iomap_write_end() stops increasing i_size" * tag 'vfs-6.11.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iomap: don't increase i_size in iomap_write_end() xfs: reserve blocks for truncating large realtime inode Documentation: the design of iomap and how to port iomap: Optimize iomap_read_folio
2024-07-15Merge tag 'vfs-6.11.pidfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull pidfs updates from Christian Brauner: "This contains work to make it possible to derive namespace file descriptors from pidfd file descriptors. Right now it is already possible to use a pidfd with setns() to atomically change multiple namespaces at the same time. In other words, it is possible to switch to the namespace context of a process using a pidfd. There is no need to first open namespace file descriptors via procfs. The work included here is an extension of these abilities by allowing to open namespace file descriptors using a pidfd. This means it is now possible to interact with namespaces without ever touching procfs. To this end a new set of ioctls() on pidfds is introduced covering all supported namespace types" * tag 'vfs-6.11.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: pidfs: allow retrieval of namespace file descriptors nsfs: add open_namespace() nsproxy: add helper to go from arbitrary namespace to ns_common nsproxy: add a cleanup helper for nsproxy file: add take_fd() cleanup helper
2024-07-15Merge tag 'vfs-6.11.nsfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull namespace-fs updates from Christian Brauner: "This adds ioctls allowing to translate PIDs between PID namespaces. The motivating use-case comes from LXCFS which is a tiny fuse filesystem used to virtualize various aspects of procfs. LXCFS is run on the host. The files and directories it creates can be bind-mounted by e.g. a container at startup and mounted over the various procfs files the container wishes to have virtualized. When e.g. a read request for uptime is received, LXCFS will receive the pid of the reader. In order to virtualize the corresponding read, LXCFS needs to know the pid of the init process of the reader's pid namespace. In order to do this, LXCFS first needs to fork() two helper processes. The first helper process setns() to the readers pid namespace. The second helper process is needed to create a process that is a proper member of the pid namespace. The second helper process then creates a ucred message with ucred.pid set to 1 and sends it back to LXCFS. The kernel will translate the ucred.pid field to the corresponding pid number in LXCFS's pid namespace. This way LXCFS can learn the init pid number of the reader's pid namespace and can go on to virtualize. Since these two forks() are costly LXCFS maintains an init pid cache that caches a given pid for a fixed amount of time. The cache is pruned during new read requests. However, even with the cache the hit of the two forks() is singificant when a very large number of containers are running. So this adds a simple set of ioctls that let's a caller translate PIDs from and into a given PID namespace. This significantly improves performance with a very simple change. To protect against races pidfds can be used to check whether the process is still valid" * tag 'vfs-6.11.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: nsfs: add pid translation ioctls
2024-07-15Merge tag 'vfs-6.11.mount' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs mount query updates from Christian Brauner: "This contains work to extend the abilities of listmount() and statmount() and various fixes and cleanups. Features: - Allow iterating through mounts via listmount() from newest to oldest. This makes it possible for mount(8) to keep iterating the mount table in reverse order so it gets newest mounts first. - Relax permissions on listmount() and statmount(). It's not necessary to have capabilities in the initial namespace: it is sufficient to have capabilities in the owning namespace of the mount namespace we're located in to list unreachable mounts in that namespace. - Extend both listmount() and statmount() to list and stat mounts in foreign mount namespaces. Currently the only way to iterate over mount entries in mount namespaces that aren't in the caller's mount namespace is by crawling through /proc in order to find /proc/<pid>/mountinfo for the relevant mount namespace. This is both very clumsy and hugely inefficient. So extend struct mnt_id_req with a new member that allows to specify the mount namespace id of the mount namespace we want to look at. Luckily internally we already have most of the infrastructure for this so we just need to expose it to userspace. Give userspace a way to retrieve the id of a mount namespace via statmount() and through a new nsfs ioctl() on mount namespace file descriptor. This comes with appropriate selftests. - Expose mount options through statmount(). Currently if userspace wants to get mount options for a mount and with statmount(), they still have to open /proc/<pid>/mountinfo to parse mount options. Simply the information through statmount() directly. Afterwards it's possible to only rely on statmount() and listmount() to retrieve all and more information than /proc/<pid>/mountinfo provides. This comes with appropriate selftests. Fixes: - Avoid copying to userspace under the namespace semaphore in listmount. Cleanups: - Simplify the error handling in listmount by relying on our newly added cleanup infrastructure. - Refuse invalid mount ids early for both listmount and statmount" * tag 'vfs-6.11.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: reject invalid last mount id early fs: refuse mnt id requests with invalid ids early fs: find rootfs mount of the mount namespace fs: only copy to userspace on success in listmount() sefltests: extend the statmount test for mount options fs: use guard for namespace_sem in statmount() fs: export mount options via statmount() fs: rename show_mnt_opts -> show_vfsmnt_opts selftests: add a test for the foreign mnt ns extensions fs: add an ioctl to get the mnt ns id from nsfs fs: Allow statmount() in foreign mount namespace fs: Allow listmount() in foreign mount namespace fs: export the mount ns id via statmount fs: keep an index of current mount namespaces fs: relax permissions for statmount() listmount: allow listing in reverse order fs: relax permissions for listmount() fs: simplify error handling fs: don't copy to userspace under namespace semaphore path: add cleanup helper