linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2023-12-29	kasan: move kasan_mempool_poison_object	Andrey Konovalov
	Move kasan_mempool_poison_object after all slab-related KASAN hooks. This is a preparatory change for the following patches in this series. No functional changes. Link: https://lkml.kernel.org/r/23ea215409f43c13cdf9ecc454501a264c107d67.1703024586.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Cc: Alexander Lobakin <alobakin@pm.me> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Breno Leitao <leitao@debian.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Marco Elver <elver@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	kasan: rename kasan_slab_free_mempool to kasan_mempool_poison_object	Andrey Konovalov
	Patch series "kasan: save mempool stack traces". This series updates KASAN to save alloc and free stack traces for secondary-level allocators that cache and reuse allocations internally instead of giving them back to the underlying allocator (e.g. mempool). As a part of this change, introduce and document a set of KASAN hooks: bool kasan_mempool_poison_pages(struct page page, unsigned int order); void kasan_mempool_unpoison_pages(struct page page, unsigned int order); bool kasan_mempool_poison_object(void ptr); void kasan_mempool_unpoison_object(void ptr, size_t size); and use them in the mempool code. Besides mempool, skbuff and io_uring also cache allocations and already use KASAN hooks to poison those. Their code is updated to use the new mempool hooks. The new hooks save alloc and free stack traces (for normal kmalloc and slab objects; stack traces for large kmalloc objects and page_alloc are not supported by KASAN yet), improve the readability of the users' code, and also allow the users to prevent double-free and invalid-free bugs; see the patches for the details. This patch (of 21): Rename kasan_slab_free_mempool to kasan_mempool_poison_object. kasan_slab_free_mempool is a slightly confusing name: it is unclear whether this function poisons the object when it is freed into mempool or does something when the object is freed from mempool to the underlying allocator. The new name also aligns with other mempool-related KASAN hooks added in the following patches in this series. Link: https://lkml.kernel.org/r/cover.1703024586.git.andreyknvl@google.com Link: https://lkml.kernel.org/r/c5618685abb7cdbf9fb4897f565e7759f601da84.1703024586.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Cc: Alexander Lobakin <alobakin@pm.me> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Breno Leitao <leitao@debian.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Marco Elver <elver@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	fs: remove the bh_end_io argument from __block_write_full_folio	Matthew Wilcox (Oracle)
	All callers are passing end_buffer_async_write as this argument, so we can hardcode references to it within __block_write_full_folio(). That lets us make end_buffer_async_write() static. Link: https://lkml.kernel.org/r/20231215200245.748418-15-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	fs: convert block_write_full_page to block_write_full_folio	Matthew Wilcox (Oracle)
	Convert the function to be compatible with writepage_t so that it can be passed to write_cache_pages() by blkdev. This removes a call to compound_head(). We can also remove the function export as both callers are built-in. Link: https://lkml.kernel.org/r/20231215200245.748418-14-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	fs: remove clean_page_buffers()	Matthew Wilcox (Oracle)
	Patch series "Clean up the writeback paths". Most of these patches verge on the trivial, converting filesystems that just use block_write_full_page() to use mpage_writepages(). But as we saw with Christoph's earlier patchset, there can be some "interesting" gotchas, and I clearly haven't tested the majority of filesystems I've touched here. Patches 3 & 4 get rid of a lot of stack usage on architectures with larger page sizes; 1024 bytes on 64-bit systems with 64KiB pages. It starts to open the door to larger folio sizes on all architectures, but it's certainly not enough yet. Patch 14 is kind of trivial, but it's nice to get that simplification in. This patch (of 14): This function has been unused since the removal of bdev_write_page(). Link: https://lkml.kernel.org/r/20231215200245.748418-1-willy@infradead.org Link: https://lkml.kernel.org/r/20231215200245.748418-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	mm: remove page_swap_info()	Matthew Wilcox (Oracle)
	It's more efficient to get the swap_info_struct by calling swp_swap_info() directly. Link: https://lkml.kernel.org/r/20231213215842.671461-12-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	mm: convert swap_page_sector() to swap_folio_sector()	Matthew Wilcox (Oracle)
	All callers have a folio, so pass it in. Saves a couple of calls to compound_head(). Link: https://lkml.kernel.org/r/20231213215842.671461-10-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	mm: return the folio from __read_swap_cache_async()	Matthew Wilcox (Oracle)
	Patch series "More swap folio conversions". These all seem like fairly straightforward conversions to me. A lot of compound_head() calls get removed. And page_swap_info(), which is nice. This patch (of 13): Move the folio->page conversion into the callers that actually want that. Most of the callers are happier with the folio anyway. If the page_allocated boolean is set, the folio allocated is of order-0, so it is safe to pass the page directly to swap_readpage(). Link: https://lkml.kernel.org/r/20231213215842.671461-1-willy@infradead.org Link: https://lkml.kernel.org/r/20231213215842.671461-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	mm/zswap: change per-cpu mutex and buffer to per-acomp_ctx	Chengming Zhou
	First of all, we need to rename acomp_ctx->dstmem field to buffer, since we are now using for purposes other than compression. Then we change per-cpu mutex and buffer to per-acomp_ctx, since them belong to the acomp_ctx and are necessary parts when used in the compress/decompress contexts. So we can remove the old per-cpu mutex and dstmem. Link: https://lkml.kernel.org/r/20231213-zswap-dstmem-v5-5-9382162bbf05@bytedance.com Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Acked-by: Chris Li <chrisl@kernel.org> (Google) Reviewed-by: Nhat Pham <nphamcs@gmail.com> Cc: Barry Song <21cnbao@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	mm/ksm: add tracepoint for ksm advisor	Stefan Roesch
	This adds a new tracepoint for the ksm advisor. It reports the last scan time, the new setting of the pages_to_scan parameter and the average cpu percent usage of the ksmd background thread for the last scan. Link: https://lkml.kernel.org/r/20231218231054.1625219-4-shr@devkernel.io Signed-off-by: Stefan Roesch <shr@devkernel.io> Acked-by: David Hildenbrand <david@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	mm: remove page_add_new_anon_rmap and lru_cache_add_inactive_or_unevictable	Matthew Wilcox (Oracle)
	All callers have now been converted to folio_add_new_anon_rmap() and folio_add_lru_vma() so we can remove the wrapper. Link: https://lkml.kernel.org/r/20231211162214.2146080-10-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	mm: convert ksm_might_need_to_copy() to work on folios	Matthew Wilcox (Oracle)
	Patch series "Finish two folio conversions". Most callers of page_add_new_anon_rmap() and lru_cache_add_inactive_or_unevictable() have been converted to their folio equivalents, but there are still a few stragglers. There's a bit of preparatory work in ksm and unuse_pte(), but after that it's pretty mechanical. This patch (of 9): Accept a folio as an argument and return a folio result. Removes a call to compound_head() in do_swap_page(), and prevents folio & page from getting out of sync in unuse_pte(). Reviewed-by: David Hildenbrand <david@redhat.com> [willy@infradead.org: fix smatch warning] Link: https://lkml.kernel.org/r/ZXnPtblC6A1IkyAB@casper.infradead.org [david@redhat.com: only adjust the page if the folio changed] Link: https://lkml.kernel.org/r/6a8f2110-fa91-4c10-9eae-88315309a6e3@redhat.com Link: https://lkml.kernel.org/r/20231211162214.2146080-1-willy@infradead.org Link: https://lkml.kernel.org/r/20231211162214.2146080-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	userfaultfd: UFFDIO_MOVE uABI	Andrea Arcangeli
	Implement the uABI of UFFDIO_MOVE ioctl. UFFDIO_COPY performs ~20% better than UFFDIO_MOVE when the application needs pages to be allocated [1]. However, with UFFDIO_MOVE, if pages are available (in userspace) for recycling, as is usually the case in heap compaction algorithms, then we can avoid the page allocation and memcpy (done by UFFDIO_COPY). Also, since the pages are recycled in the userspace, we avoid the need to release (via madvise) the pages back to the kernel [2]. We see over 40% reduction (on a Google pixel 6 device) in the compacting thread's completion time by using UFFDIO_MOVE vs. UFFDIO_COPY. This was measured using a benchmark that emulates a heap compaction implementation using userfaultfd (to allow concurrent accesses by application threads). More details of the usecase are explained in [2]. Furthermore, UFFDIO_MOVE enables moving swapped-out pages without touching them within the same vma. Today, it can only be done by mremap, however it forces splitting the vma. [1] https://lore.kernel.org/all/1425575884-2574-1-git-send-email-aarcange@redhat.com/ [2] https://lore.kernel.org/linux-mm/CA+EESO4uO84SSnBhArH4HvLNhaUQ5nZKNKXqxRCyjniNVjp0Aw@mail.gmail.com/ Update for the ioctl_userfaultfd(2) manpage: UFFDIO_MOVE (Since Linux xxx) Move a continuous memory chunk into the userfault registered range and optionally wake up the blocked thread. The source and destination addresses and the number of bytes to move are specified by the src, dst, and len fields of the uffdio_move structure pointed to by argp: struct uffdio_move { __u64 dst; /* Destination of move / __u64 src; / Source of move / __u64 len; / Number of bytes to move / __u64 mode; / Flags controlling behavior of move / __s64 move; / Number of bytes moved, or negated error */ }; The following value may be bitwise ORed in mode to change the behavior of the UFFDIO_MOVE operation: UFFDIO_MOVE_MODE_DONTWAKE Do not wake up the thread that waits for page-fault resolution UFFDIO_MOVE_MODE_ALLOW_SRC_HOLES Allow holes in the source virtual range that is being moved. When not specified, the holes will result in ENOENT error. When specified, the holes will be accounted as successfully moved memory. This is mostly useful to move hugepage aligned virtual regions without knowing if there are transparent hugepages in the regions or not, but preventing the risk of having to split the hugepage during the operation. The move field is used by the kernel to return the number of bytes that was actually moved, or an error (a negated errno- style value). If the value returned in move doesn't match the value that was specified in len, the operation fails with the error EAGAIN. The move field is output-only; it is not read by the UFFDIO_MOVE operation. The operation may fail for various reasons. Usually, remapping of pages that are not exclusive to the given process fail; once KSM might deduplicate pages or fork() COW-shares pages during fork() with child processes, they are no longer exclusive. Further, the kernel might only perform lightweight checks for detecting whether the pages are exclusive, and return -EBUSY in case that check fails. To make the operation more likely to succeed, KSM should be disabled, fork() should be avoided or MADV_DONTFORK should be configured for the source VMA before fork(). This ioctl(2) operation returns 0 on success. In this case, the entire area was moved. On error, -1 is returned and errno is set to indicate the error. Possible errors include: EAGAIN The number of bytes moved (i.e., the value returned in the move field) does not equal the value that was specified in the len field. EINVAL Either dst or len was not a multiple of the system page size, or the range specified by src and len or dst and len was invalid. EINVAL An invalid bit was specified in the mode field. ENOENT The source virtual memory range has unmapped holes and UFFDIO_MOVE_MODE_ALLOW_SRC_HOLES is not set. EEXIST The destination virtual memory range is fully or partially mapped. EBUSY The pages in the source virtual memory range are either pinned or not exclusive to the process. The kernel might only perform lightweight checks for detecting whether the pages are exclusive. To make the operation more likely to succeed, KSM should be disabled, fork() should be avoided or MADV_DONTFORK should be configured for the source virtual memory area before fork(). ENOMEM Allocating memory needed for the operation failed. ESRCH The target process has exited at the time of a UFFDIO_MOVE operation. Link: https://lkml.kernel.org/r/20231206103702.3873743-3-surenb@google.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Nicolas Geoffray <ngeoffray@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: ZhangPeng <zhangpeng362@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29	Merge tag 'block-6.7-2023-12-29' of git://git.kernel.dk/linux	Linus Torvalds
	Pull block fixes from Jens Axboe: "Fix for a badly numbered flag, and a regression fix for the badblocks updates from this merge window" * tag 'block-6.7-2023-12-29' of git://git.kernel.dk/linux: block: renumber QUEUE_FLAG_HW_WC badblocks: avoid checking invalid range in badblocks_check()
2023-12-29	thermal/sysfs: Update governors when the 'weight' has changed	Lukasz Luba
	Support governors update when the thermal instance's weight has changed. This allows to adjust internal state for the governor. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> [ rjw: Add two empty code lines aroung the locking ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-29	thermal: core: Add governor callback for thermal zone change	Lukasz Luba
	Add a new callback to the struct thermal_governor. It can be used for updating governors when there is a change in the thermal zone internals, e.g. thermal cooling device is bind to the thermal zone. That makes possible to move some heavy operations like memory allocations related to the number of cooling instances out of the throttle() callback. Both callback code paths (throttle() and update_tz()) are protected with the same thermal zone lock, which guaranties the consistency. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-29	Merge branch 'topic/scarlett2' into for-next	Takashi Iwai
	Pull Scarlett2 USB audio mixer extensions. Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-12-29	ALSA: scarlett2: Add support for Solo, 2i2, and 4i4 Gen 4	Geoffrey D. Bennett
	Add new Focusrite Scarlett Gen 4 USB IDs, notification arrays, config sets, and device info data. Signed-off-by: Geoffrey D. Bennett <g@b4.vu> Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: https://lore.kernel.org/r/b33526d3b7a56bb2c86aa4eb2137a415bd23f1ce.1703612638.git.g@b4.vu
2023-12-29	ALSA: scarlett2: Add ioctl commands to erase flash segments	Geoffrey D. Bennett
	Add ioctls: - SCARLETT2_IOCTL_SELECT_FLASH_SEGMENT - SCARLETT2_IOCTL_ERASE_FLASH_SEGMENT - SCARLETT2_IOCTL_GET_ERASE_PROGRESS The settings or the firmware flash segment can be selected and then erased (asynchronous operation), and the erase progress can be monitored. If the erase progress is not monitored, then subsequent hwdep operations will block until the erase is complete. Once the erase is started, ALSA controls that communicate with the device will all return -EBUSY, and the device must be rebooted. Signed-off-by: Geoffrey D. Bennett <g@b4.vu> Link: https://lore.kernel.org/r/227409adb672f174bf3db211e9bda016fb4646ea.1703001053.git.g@b4.vu Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-12-29	ALSA: scarlett2: Add skeleton hwdep/ioctl interface	Geoffrey D. Bennett
	Add skeleton hwdep/ioctl interface, beginning with SCARLETT2_IOCTL_PVERSION and SCARLETT2_IOCTL_REBOOT. Signed-off-by: Geoffrey D. Bennett <g@b4.vu> Link: https://lore.kernel.org/r/24ffcd47a8a02ebad3c8b2438104af8f0169164e.1703001053.git.g@b4.vu Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-12-29	genetlink: Use internal flags for multicast groups	Ido Schimmel
	As explained in commit e03781879a0d ("drop_monitor: Require 'CAP_SYS_ADMIN' when joining "events" group"), the "flags" field in the multicast group structure reuses uAPI flags despite the field not being exposed to user space. This makes it impossible to extend its use without adding new uAPI flags, which is inappropriate for internal kernel checks. Solve this by adding internal flags (i.e., "GENL_MCAST_*") and convert the existing users to use them instead of the uAPI flags. Tested using the reproducers in commit 44ec98ea5ea9 ("psample: Require 'CAP_NET_ADMIN' when joining "packets" group") and commit e03781879a0d ("drop_monitor: Require 'CAP_SYS_ADMIN' when joining "events" group"). No functional changes intended. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-29	Merge tag 'nf-23-12-20' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablu Neira Syuso says: ==================== netfilter pull request 23-12-20 The following patchset contains Netfilter fixes for net: 1) Skip set commit for deleted/destroyed sets, this might trigger double deactivation of expired elements. 2) Fix packet mangling from egress, set transport offset from mac header for netdev/egress. Both fixes address bugs already present in several releases. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-29	iucv: make iucv_bus const	Greg Kroah-Hartman
	Now that the driver core can properly handle constant struct bus_type, move the iucv_bus variable to be a constant structure as well, placing it into read-only memory which can not be modified at runtime. Cc: Wenjia Zhang <wenjia@linux.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: linux-s390@vger.kernel.org Cc: netdev@vger.kernel.org Acked-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-29	crypto: skcipher - remove excess kerneldoc members	Vegard Nossum
	Commit 31865c4c4db2 ("crypto: skcipher - Add lskcipher") moved some fields from 'struct skcipher_alg' into SKCIPHER_ALG_COMMON but didn't remove the corresponding kerneldoc members, which results in these warnings when running 'make htmldocs': ./include/crypto/skcipher.h:182: warning: Excess struct member 'min_keysize' description in 'skcipher_alg' ./include/crypto/skcipher.h:182: warning: Excess struct member 'max_keysize' description in 'skcipher_alg' ./include/crypto/skcipher.h:182: warning: Excess struct member 'ivsize' description in 'skcipher_alg' ./include/crypto/skcipher.h:182: warning: Excess struct member 'chunksize' description in 'skcipher_alg' ./include/crypto/skcipher.h:182: warning: Excess struct member 'stat' description in 'skcipher_alg' ./include/crypto/skcipher.h:182: warning: Excess struct member 'base' description in 'skcipher_alg' SKCIPHER_ALG_COMMON already has the documentation for all these fields. Fixes: 31865c4c4db2 ("crypto: skcipher - Add lskcipher") Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-12-29	crypto: shash - remove excess kerneldoc members	Vegard Nossum
	Commit 42808e5dc602 ("crypto: hash - Count error stats differently") moved some fields from 'struct shash_alg' into HASH_ALG_COMMON but didn't remove the corresponding kerneldoc members, which results in these warnings when running 'make htmldocs': ./include/crypto/hash.h:248: warning: Excess struct member 'digestsize' description in 'shash_alg' ./include/crypto/hash.h:248: warning: Excess struct member 'statesize' description in 'shash_alg' ./include/crypto/hash.h:248: warning: Excess struct member 'stat' description in 'shash_alg' ./include/crypto/hash.h:248: warning: Excess struct member 'base' description in 'shash_alg' HASH_ALG_COMMON already has the documentation for all these fields. Fixes: 42808e5dc602 ("crypto: hash - Count error stats differently") Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-12-29	ethtool: reformat kerneldoc for struct ethtool_fec_stats	Jonathan Corbet
	The kerneldoc comment for struct ethtool_fec_stats attempts to describe the "total" and "lanes" fields of the ethtool_fec_stat substructure in a way leading to these warnings: ./include/linux/ethtool.h:424: warning: Excess struct member 'lane' description in 'ethtool_fec_stats' ./include/linux/ethtool.h:424: warning: Excess struct member 'total' description in 'ethtool_fec_stats' Reformat the comment to retain the information while eliminating the warnings. Signed-off-by: Jonathan Corbet <corbet@lwn.net> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-29	ethtool: reformat kerneldoc for struct ethtool_link_settings	Jonathan Corbet
	The kernel doc comments for struct ethtool_link_settings includes documentation for three fields that were never present there, leading to these docs-build warnings: ./include/uapi/linux/ethtool.h:2207: warning: Excess struct member 'supported' description in 'ethtool_link_settings' ./include/uapi/linux/ethtool.h:2207: warning: Excess struct member 'advertising' description in 'ethtool_link_settings' ./include/uapi/linux/ethtool.h:2207: warning: Excess struct member 'lp_advertising' description in 'ethtool_link_settings' Remove the entries to make the warnings go away. There was some information there on how data in >link_mode_masks is formatted; move that to the body of the comment to preserve it. Signed-off-by: Jonathan Corbet <corbet@lwn.net> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-29	net: sock: remove excess structure-member documentation	Jonathan Corbet
	Remove a couple of kerneldoc entries for struct members that do not exist, addressing these warnings: ./include/net/sock.h:548: warning: Excess struct member '__sk_flags_offset' description in 'sock' ./include/net/sock.h:548: warning: Excess struct member 'sk_padding' description in 'sock' Signed-off-by: Jonathan Corbet <corbet@lwn.net> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-28	Merge tag 'kbuild-fixes-v6.7-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Revive proper alignment for the ksymtab and kcrctab sections - Fix gen_compile_commands.py tool to resolve symbolic links - Fix symbolic links to installed debug VDSO files - Update MAINTAINERS * tag 'kbuild-fixes-v6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: linux/export: Ensure natural alignment of kcrctab array kbuild: fix build ID symlinks to installed debug VDSO files gen_compile_commands.py: fix path resolve with symlinks in it MAINTAINERS: Add scripts/clang-tools to Kbuild section linux/export: Fix alignment for 64-bit ksymtab entries
2023-12-29	linux/export: Ensure natural alignment of kcrctab array	Helge Deller
	The ___kcrctab section holds an array of 32-bit CRC values. Add a .balign 4 to tell the linker the correct memory alignment. Fixes: f3304ecd7f06 ("linux/export: use inline assembler to populate symbol CRCs") Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2023-12-28	thermal: core: Fix thermal zone suspend-resume synchronization	Rafael J. Wysocki
	There are 3 synchronization issues with thermal zone suspend-resume during system-wide transitions: 1. The resume code runs in a PM notifier which is invoked after user space has been thawed, so it can run concurrently with user space which can trigger a thermal zone device removal. If that happens, the thermal zone resume code may use a stale pointer to the next list element and crash, because it does not hold thermal_list_lock while walking thermal_tz_list. 2. The thermal zone resume code calls thermal_zone_device_init() outside the zone lock, so user space or an update triggered by the platform firmware may see an inconsistent state of a thermal zone leading to unexpected behavior. 3. Clearing the in_suspend global variable in thermal_pm_notify() allows __thermal_zone_device_update() to continue for all thermal zones and it may as well run before the thermal_tz_list walk (or at any point during the list walk for that matter) and attempt to operate on a thermal zone that has not been resumed yet. It may also race destructively with thermal_zone_device_init(). To address these issues, add thermal_list_lock locking to thermal_pm_notify(), especially arount the thermal_tz_list, make it call thermal_zone_device_init() back-to-back with __thermal_zone_device_update() under the zone lock and replace in_suspend with per-zone bool "suspend" indicators set and unset under the given zone's lock. Link: https://lore.kernel.org/linux-pm/20231218162348.69101-1-bo.ye@mediatek.com/ Reported-by: Bo Ye <bo.ye@mediatek.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-28	sysctl: remove struct ctl_path	Thomas Weißschuh
	All usages of this struct have been removed from the kernel tree. The struct is still referenced by scripts/check-sysctl-docs but that script is broken anyways as it only supports the register_sysctl_paths() API and not the currently used register_sysctl() one. Fixes: 0199849acd07 ("sysctl: remove register_sysctl_paths()") Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-12-28	sysctl: delete unused define SYSCTL_PERM_EMPTY_DIR	Thomas Weißschuh
	It seems it was never used. Fixes: 2f2665c13af4 ("sysctl: replace child with an enumeration") Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-12-28	fs: fix __sb_write_started() kerneldoc formatting	Vegard Nossum
	When running 'make htmldocs', I see the following warning: Documentation/filesystems/api-summary:14: ./include/linux/fs.h:1659: WARNING: Definition list ends without a blank line; unexpected unindent. The official guidance [1] seems to be to use lists, which will prevent both the "unexpected unindent" warning as well as ensure that each line is formatted on a separate line in the HTML output instead of being all considered a single paragraph. [1]: https://docs.kernel.org/doc-guide/kernel-doc.html#return-values Fixes: 8802e580ee64 ("fs: create __sb_write_started() helper") Cc: Amir Goldstein <amir73il@gmail.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Link: https://lore.kernel.org/r/20231228100608.3123987-1-vegard.nossum@oracle.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-12-28	afs: Use the netfs write helpers	David Howells
	Make afs use the netfs write helpers. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28	netfs: Optimise away reads above the point at which there can be no data	David Howells
	Track the file position above which the server is not expected to have any data (the "zero point") and preemptively assume that we can satisfy requests by filling them with zeroes locally rather than attempting to download them if they're over that line - even if we've written data back to the server. Assume that any data that was written back above that position is held in the local cache. Note that we have to split requests that straddle the line. Make use of this to optimise away some reads from the server. We need to set the zero point in the following circumstances: (1) When we see an extant remote inode and have no cache for it, we set the zero_point to i_size. (2) On local inode creation, we set zero_point to 0. (3) On local truncation down, we reduce zero_point to the new i_size if the new i_size is lower. (4) On local truncation up, we don't change zero_point. (5) On local modification, we don't change zero_point. (6) On remote invalidation, we set zero_point to the new i_size. (7) If stored data is discarded from the pagecache or culled from fscache, we must set zero_point above that if the data also got written to the server. (8) If dirty data is written back to the server, but not fscache, we must set zero_point above that. (9) If a direct I/O write is made, set zero_point above that. Assuming the above, any read from the server at or above the zero_point position will return all zeroes. The zero_point value can be stored in the cache, provided the above rules are applied to it by any code that culls part of the local cache. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28	netfs: Implement a write-through caching option	David Howells
	Provide a flag whereby a filesystem may request that cifs_perform_write() perform write-through caching. This involves putting pages directly into writeback rather than dirty and attaching them to a write operation as we go. Further, the writes being made are limited to the byte range being written rather than whole folios being written. This can be used by cifs, for example, to deal with strict byte-range locking. This can't be used with content encryption as that may require expansion of the write RPC beyond the write being made. This doesn't affect writes via mmap - those are written back in the normal way; similarly failed writethrough writes are marked dirty and left to writeback to retry. Another option would be to simply invalidate them, but the contents can be simultaneously accessed by read() and through mmap. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28	netfs: Provide a launder_folio implementation	David Howells
	Provide a launder_folio implementation for netfslib. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28	netfs: Provide a writepages implementation	David Howells
	Provide an implementation of writepages for network filesystems to delegate to. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28	netfs, cachefiles: Pass upper bound length to allow expansion	David Howells
	Make netfslib pass the maximum length to the ->prepare_write() op to tell the cache how much it can expand the length of a write to. This allows a write to the server at the end of a file to be limited to a few bytes whilst writing an entire block to the cache (something required by direct I/O). Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28	netfs: Provide netfs_file_read_iter()	David Howells
	Provide a top-level-ish function that can be pointed to directly by ->read_iter file op. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28	netfs: Allow buffered shared-writeable mmap through netfs_page_mkwrite()	David Howells
	Provide an entry point to delegate a filesystem's ->page_mkwrite() to. This checks for conflicting writes, then attached any netfs-specific group marking (e.g. ceph snap) to the page to be considered dirty. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28	netfs: Implement buffered write API	David Howells
	Institute a netfs write helper, netfs_file_write_iter(), to be pointed at by the network filesystem ->write_iter() call. Make it handled buffered writes by calling the previously defined netfs_perform_write() to copy the source data into the pagecache. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28	netfs: Implement unbuffered/DIO write support	David Howells
	Implement support for unbuffered writes and direct I/O writes. If the write is misaligned with respect to the fscrypt block size, then RMW cycles are performed if necessary. DIO writes are a special case of unbuffered writes with extra restriction imposed, such as block size alignment requirements. Also provide a field that can tell the code to add some extra space onto the bounce buffer for use by the filesystem in the case of a content-encrypted file. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28	netfs: Implement unbuffered/DIO read support	David Howells
	Implement support for unbuffered and DIO reads in the netfs library, utilising the existing read helper code to do block splitting and individual queuing. The code also handles extraction of the destination buffer from the supplied iterator, allowing async unbuffered reads to take place. The read will be split up according to the rsize setting and, if supplied, the ->clamp_length() method. Note that the next subrequest will be issued as soon as issue_op returns, without waiting for previous ones to finish. The network filesystem needs to pause or handle queuing them if it doesn't want to fire them all at the server simultaneously. Once all the subrequests have finished, the state will be assessed and the amount of data to be indicated as having being obtained will be determined. As the subrequests may finish in any order, if an intermediate subrequest is short, any further subrequests may be copied into the buffer and then abandoned. In the future, this will also take care of doing an unbuffered read from encrypted content, with the decryption being done by the library. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28	netfs: Make netfs_read_folio() handle streaming-write pages	David Howells
	netfs_read_folio() needs to handle partially-valid pages that are marked dirty, but not uptodate in the event that someone tries to read a page was used to cache data by a streaming write. In such a case, make netfs_read_folio() set up a bvec iterator that points to the parts of the folio that need filling and to a sink page for the data that should be discarded and use that instead of i_pages as the iterator to be written to. This requires netfs_rreq_unlock_folios() to convert the page into a normal dirty uptodate page, getting rid of the partial write record and bumping the group pointer over to folio->private. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28	netfs: Provide func to copy data to pagecache for buffered write	David Howells
	Provide a netfs write helper, netfs_perform_write() to buffer data to be written in the pagecache and mark the modified folios dirty. It will perform "streaming writes" for folios that aren't currently resident, if possible, storing data in partially modified folios that are marked dirty, but not uptodate. It will also tag pages as belonging to fs-specific write groups if so directed by the filesystem. This is derived from generic_perform_write(), but doesn't use ->write_begin() and ->write_end(), having that logic rolled in instead. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28	netfs: Dispatch write requests to process a writeback slice	David Howells
	Dispatch one or more write reqeusts to process a writeback slice, where a slice is tailored more to logical block divisions within the file (such as crypto blocks, an object layout or cache granules) than the protocol RPC maximum capacity. The dispatch doesn't happen until throttling allows, at which point the entire writeback slice is processed and queued. A slice may be written to multiple destinations (one or more servers and the local cache) and the writes to each destination might be split up along different lines. The writeback slice holds the required folios pinned. An iov_iter is provided in netfs_write_request that describes the buffer to be used. This may be part of the pagecache, may have auxiliary padding pages attached or may be a bounce buffer resulting from crypto or compression. Consequently, the filesystem must not twiddle the folio markings directly. The following API is available to the filesystem: (1) The ->create_write_requests() method is called to ask the filesystem to create the requests it needs. This is passed the writeback slice to be processed. (2) The filesystem should then call netfs_create_write_request() to create the requests it needs. (3) Once a request is initialised, netfs_queue_write_request() can be called to dispatch it asynchronously, if not completed immediately. (4) netfs_write_request_completed() should be called to note the completion of a request. (5) netfs_get_write_request() and netfs_put_write_request() are provided to refcount a request. These take constants from the netfs_wreq_trace enum for logging into ftrace. (6) The ->free_write_request is method is called to ask the filesystem to clean up a request. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28	netfs: Prep to use folio->private for write grouping and streaming write	David Howells
	Prepare to use folio->private to hold information write grouping and streaming write. These are implemented in the same commit as they both make use of folio->private and will be both checked at the same time in several places. "Write grouping" involves ordering the writeback of groups of writes, such as is needed for ceph snaps. A group is represented by a filesystem-supplied object which must contain a netfs_group struct. This contains just a refcount and a pointer to a destructor. "Streaming write" is the storage of data in folios that are marked dirty, but not uptodate, to avoid unnecessary reads of data. This is represented by a netfs_folio struct. This contains the offset and length of the modified region plus the otherwise displaced write grouping pointer. The way folio->private is multiplexed is: (1) If private is NULL then neither is in operation on a dirty folio. (2) If private is set, with bit 0 clear, then this points to a group. (3) If private is set, with bit 0 set, then this points to a netfs_folio struct (with bit 0 AND'ed out). Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28	netfs: Make the refcounting of netfs_begin_read() easier to use	David Howells
	Make the refcounting of netfs_begin_read() easier to use by not eating the caller's ref on the netfs_io_request it's given. This makes it easier to use when we need to look in the request struct after. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org