summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2024-05-03spi: pxa2xx: Move contents of linux/spi/pxa2xx_spi.h to a local oneAndy Shevchenko
There is no user of the linux/spi/pxa2xx_spi.h. Move its contents to the drivers/spi/spi-pxa2xx.h. Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20240417110334.2671228-4-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-03regulator: devres: add API for reference voltage suppliesDavid Lechner
A common use case for regulators is to supply a reference voltage to an analog input or output device. This adds a new devres API to get, enable, and get the voltage in a single call. This allows eliminating boilerplate code in drivers that use reference supplies in this way. Signed-off-by: David Lechner <dlechner@baylibre.com> Link: https://lore.kernel.org/r/20240429-regulator-get-enable-get-votlage-v2-1-b1f11ab766c1@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-05-02bdev: move ->bd_make_it_fail to ->__bd_flagsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02bdev: move ->bd_ro_warned to ->__bd_flagsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02bdev: move ->bd_has_subit_bio to ->__bd_flagsAl Viro
In bdev_alloc() we have all flags initialized to false, so assignment to ->bh_has_submit_bio n there is a no-op unless we have partno != 0 and flag already set on entire device. In device_add_disk() we have just allocated the block_device in question and it had been a full-device one, so the flag is guaranteed to be still clear when we get to assignment. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02bdev: move ->bd_write_holder into ->__bd_flagsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02bdev: move ->bd_read_only to ->__bd_flagsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02bdev: infrastructure for flagsAl Viro
Replace bd_partno with a 32bit field (__bd_flags). The lower 8 bits contain the partition number, the upper 24 are for flags. Helpers: bdev_{test,set,clear}_flag(bdev, flag), with atomic_or() and atomic_andnot() used to set/clear. NOTE: this commit does not actually move any flags over there - they are still bool fields. As the result, it shifts the fields wrt cacheline boundaries; that's going to be restored once the first 3 flags are dealt with. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02PCI/ASPM: Consolidate link state definesIlpo Järvinen
The linux/pci.h and aspm.c files define their own sets of link state related defines which are almost the same. Consolidate the use of defines into those defined by linux/pci.h and expand PCIE_LINK_STATE_L0S to match earlier ASPM_STATE_L0S that includes both upstream and downstream bits. Rename also the defines that are internal to aspm.c to start with PCIE_LINK_STATE for consistency. While the PCIE_LINK_STATE_L0S BIT(0) -> (BIT(0) | BIT(1)) transformation is not 1:1, in practice aspm.c already used ASPM_STATE_L0S that has both bits enabled except during mapping. While at it, place the PCIE_LINK_STATE_CLKPM define last to have more logical grouping. Use static_assert() to ensure PCIE_LINK_STATE_L0S is strictly equal to the combination of PCIE_LINK_STATE_L0S_UP/DW. Link: https://lore.kernel.org/r/20240322123952.6384-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-05-02wrapper for access to ->bd_partnoAl Viro
On the next step it's going to get folded into a field where flags will go. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02Use bdev_is_paritition() instead of open-coding itAl Viro
Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02set_blocksize(): switch to passing struct file *Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02swapon(2): open swap with O_EXCLAl Viro
... eliminating the need to reopen block devices so they could be exclusively held. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02swapon(2)/swapoff(2): don't bother with block sizeAl Viro
once upon a time that used to matter; these days we do swap IO for swap devices at the level that doesn't give a damn about block size, buffer_head or anything of that sort - just attach the page to bio, set the location and size (the latter to PAGE_SIZE) and feed into queue. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02cxl/cper: Fix non-ACPI-APEI-GHES buildIra Weiny
If ACPI_APEI_GHES is not configured the [un]register work functions are not properly declared. 0day notices that the cxl_cper_register_work() declaration in the CONFIG_ACPI_APEI_GHES=n is broken, fix it to be typical nop stub. Reported-by: kernel test robot <lkp@intel.com> Closes: http://lore.kernel.org/r/202405012230.6kXItWen-lkp@intel.com Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20240501-cper-fix-0day-v1-1-c0b0056eafbc@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: include/linux/filter.h kernel/bpf/core.c 66e13b615a0c ("bpf: verifier: prevent userspace memory access") d503a04f8bc0 ("bpf: Add support for certain atomics in bpf_arena to x86 JIT") https://lore.kernel.org/all/20240429114939.210328b0@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-02ovl: implement tmpfileMiklos Szeredi
Combine inode creation with opening a file. There are six separate objects that are being set up: the backing inode, dentry and file, and the overlay inode, dentry and file. Cleanup in case of an error is a bit of a challenge and is difficult to test, so careful review is needed. All tmpfile testcases except generic/509 now run/pass, and no regressions are observed with full xfstests. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
2024-05-02Merge tag 'net-6.9-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf. Relatively calm week, likely due to public holiday in most places. No known outstanding regressions. Current release - regressions: - rxrpc: fix wrong alignmask in __page_frag_alloc_align() - eth: e1000e: change usleep_range to udelay in PHY mdic access Previous releases - regressions: - gro: fix udp bad offset in socket lookup - bpf: fix incorrect runtime stat for arm64 - tipc: fix UAF in error path - netfs: fix a potential infinite loop in extract_user_to_sg() - eth: ice: ensure the copied buf is NUL terminated - eth: qeth: fix kernel panic after setting hsuid Previous releases - always broken: - bpf: - verifier: prevent userspace memory access - xdp: use flags field to disambiguate broadcast redirect - bridge: fix multicast-to-unicast with fraglist GSO - mptcp: ensure snd_nxt is properly initialized on connect - nsh: fix outer header access in nsh_gso_segment(). - eth: bcmgenet: fix racing registers access - eth: vxlan: fix stats counters. Misc: - a bunch of MAINTAINERS file updates" * tag 'net-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits) MAINTAINERS: mark MYRICOM MYRI-10G as Orphan MAINTAINERS: remove Ariel Elior net: gro: add flush check in udp_gro_receive_segment net: gro: fix udp bad offset in socket lookup by adding {inner_}network_offset to napi_gro_cb ipv4: Fix uninit-value access in __ip_make_skb() s390/qeth: Fix kernel panic after setting hsuid vxlan: Pull inner IP header in vxlan_rcv(). tipc: fix a possible memleak in tipc_buf_append tipc: fix UAF in error path rxrpc: Clients must accept conn from any address net: core: reject skb_copy(_expand) for fraglist GSO skbs net: bridge: fix multicast-to-unicast with fraglist GSO mptcp: ensure snd_nxt is properly initialized on connect e1000e: change usleep_range to udelay in PHY mdic access net: dsa: mv88e6xxx: Fix number of databases for 88E6141 / 88E6341 cxgb4: Properly lock TX queue for the selftest. rxrpc: Fix using alignmask being zero for __page_frag_alloc_align() vxlan: Add missing VNI filter counter update in arp_reduce(). vxlan: Fix racy device stats updates. net: qede: use return from qede_parse_actions() ...
2024-05-02string: Add additional __realloc_size() annotations for "dup" helpersKees Cook
Several other "dup"-style interfaces could use the __realloc_size() attribute. (As a reminder to myself and others: "realloc" is used here instead of "alloc" because the "alloc_size" attribute implies that the memory contents are uninitialized. Since we're copying contents into the resulting allocation, it must use "realloc_size" to avoid confusing the compiler's optimization passes.) Add KUnit test coverage where possible. (KUnit still does not have the ability to manipulate userspace memory.) Reviewed-by: Andy Shevchenko <andy@kernel.org> Link: https://lore.kernel.org/r/20240502145218.it.729-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-05-02KVM: Remove kvm_make_all_cpus_request_except()Venkatesh Srinivas
Remove kvm_make_all_cpus_request_except() as it effectively has no users, and arguably should never have been added in the first place. Commit 54163a346d4a ("KVM: Introduce kvm_make_all_cpus_request_except()") added the "except" variation for use in SVM's AVIC update path, which used it to skip sending a request to the current vCPU (commit 7d611233b016 ("KVM: SVM: Disable AVIC before setting V_IRQ")). But the AVIC usage of kvm_make_all_cpus_request_except() was essentially a hack-a-fix that simply squashed the most likely scenario of a racy WARN without addressing the underlying problem(s). Commit f1577ab21442 ("KVM: SVM: svm_set_vintr don't warn if AVIC is active but is about to be deactivated") eventually fixed the WARN itself, and the "except" usage was subsequently dropped by df63202fe52b ("KVM: x86: APICv: drop immediate APICv disablement on current vCPU"). That kvm_make_all_cpus_request_except() hasn't gained any users in the last ~3 years isn't a coincidence. If a VM-wide broadcast *needs* to skip the current vCPU, then odds are very good that there is underlying bug that could be better fixed elsewhere. Signed-off-by: Venkatesh Srinivas <venkateshs@chromium.org> Link: https://lore.kernel.org/r/20240404232651.1645176-1-venkateshs@chromium.org [sean: rewrite changelog with --verbose] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-05-02seq_file: Optimize seq_puts()Christophe JAILLET
Most of seq_puts() usages are done with a string literal. In such cases, the length of the string car be computed at compile time in order to save a strlen() call at run-time. seq_putc() or seq_write() can then be used instead. This saves a few cycles. To have an estimation of how often this optimization triggers: $ git grep seq_puts.*\" | wc -l 3436 $ git grep seq_puts.*\".\" | wc -l 84 Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/a8589bffe4830dafcb9111e22acf06603fea7132.1713781332.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christian Brauner <brauner@kernel.org> The output for seq_putc() generation has also be checked and works.
2024-05-01net: Protect dev->name by seqlock.Kuniyuki Iwashima
We will convert ioctl(SIOCGARP) to RCU, and then we need to copy dev->name which is currently protected by rtnl_lock(). This patch does the following: 1) Add seqlock netdev_rename_lock to protect dev->name 2) Add netdev_copy_name() that copies dev->name to buffer under netdev_rename_lock 3) Use netdev_copy_name() in netdev_get_name() and drop devnet_rename_sem Suggested-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/netdev/CANn89iJEWs7AYSJqGCUABeVqOCTkErponfZdT5kV-iD=-SajnQ@mail.gmail.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240430015813.71143-7-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-01kunit/fortify: Fix replaced failure path to unbreak __alloc_sizeKees Cook
The __alloc_size annotation for kmemdup() was getting disabled under KUnit testing because the replaced fortify_panic macro implementation was using "return NULL" as a way to survive the sanity checking. But having the chance to return NULL invalidated __alloc_size, so kmemdup was not passing the __builtin_dynamic_object_size() tests any more: [23:26:18] [PASSED] fortify_test_alloc_size_kmalloc_const [23:26:19] # fortify_test_alloc_size_kmalloc_dynamic: EXPECTATION FAILED at lib/fortify_kunit.c:265 [23:26:19] Expected __builtin_dynamic_object_size(p, 1) == expected, but [23:26:19] __builtin_dynamic_object_size(p, 1) == -1 (0xffffffffffffffff) [23:26:19] expected == 11 (0xb) [23:26:19] __alloc_size() not working with __bdos on kmemdup("hello there", len, gfp) [23:26:19] [FAILED] fortify_test_alloc_size_kmalloc_dynamic Normal builds were not affected: __alloc_size continued to work there. Use a zero-sized allocation instead, which allows __alloc_size to behave. Fixes: 4ce615e798a7 ("fortify: Provide KUnit counters for failure testing") Fixes: fa4a3f86d498 ("fortify: Add KUnit tests for runtime overflows") Link: https://lore.kernel.org/r/20240501232937.work.532-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-05-01cifs: Implement netfslib hooksDavid Howells
Provide implementation of the netfslib hooks that will be used by netfslib to ask cifs to set up and perform operations. Of particular note are (*) cifs_clamp_length() - This is used to negotiate the size of the next subrequest in a read request, taking into account the credit available and the rsize. The credits are attached to the subrequest. (*) cifs_req_issue_read() - This is used to issue a subrequest that has been set up and clamped. (*) cifs_prepare_write() - This prepares to fill a subrequest by picking a channel, reopening the file and requesting credits so that we can set the maximum size of the subrequest and also sets the maximum number of segments if we're doing RDMA. (*) cifs_issue_write() - This releases any unneeded credits and issues an asynchronous data write for the contiguous slice of file covered by the subrequest. This should possibly be folded in to all ->async_writev() ops and that called directly. (*) cifs_begin_writeback() - This gets the cached writable handle through which we do writeback (this does not affect writethrough, unbuffered or direct writes). At this point, cifs is not wired up to actually *use* netfslib; that will be done in a subsequent patch. Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2024-05-01netfs, afs: Use writeback retry to deal with alternate keysDavid Howells
Use a hook in the new writeback code's retry algorithm to rotate the keys once all the outstanding subreqs have failed rather than doing it separately on each subreq. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org
2024-05-01netfs: Cut over to using new writeback codeDavid Howells
Cut over to using the new writeback code. The old code is #ifdef'd out or otherwise removed from compilation to avoid conflicts and will be removed in a future patch. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Eric Van Hensbergen <ericvh@kernel.org> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: v9fs@lists.linux.dev cc: linux-afs@lists.infradead.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org
2024-05-01netfs: New writeback implementationDavid Howells
The current netfslib writeback implementation creates writeback requests of contiguous folio data and then separately tiles subrequests over the space twice, once for the server and once for the cache. This creates a few issues: (1) Every time there's a discontiguity or a change between writing to only one destination or writing to both, it must create a new request. This makes it harder to do vectored writes. (2) The folios don't have the writeback mark removed until the end of the request - and a request could be hundreds of megabytes. (3) In future, I want to support a larger cache granularity, which will require aggregation of some folios that contain unmodified data (which only need to go to the cache) and some which contain modifications (which need to be uploaded and stored to the cache) - but, currently, these are treated as discontiguous. There's also a move to get everyone to use writeback_iter() to extract writable folios from the pagecache. That said, currently writeback_iter() has some issues that make it less than ideal: (1) there's no way to cancel the iteration, even if you find a "temporary" error that means the current folio and all subsequent folios are going to fail; (2) there's no way to filter the folios being written back - something that will impact Ceph with it's ordered snap system; (3) and if you get a folio you can't immediately deal with (say you need to flush the preceding writes), you are left with a folio hanging in the locked state for the duration, when really we should unlock it and relock it later. In this new implementation, I use writeback_iter() to pump folios, progressively creating two parallel, but separate streams and cleaning up the finished folios as the subrequests complete. Either or both streams can contain gaps, and the subrequests in each stream can be of variable size, don't need to align with each other and don't need to align with the folios. Indeed, subrequests can cross folio boundaries, may cover several folios or a folio may be spanned by multiple folios, e.g.: +---+---+-----+-----+---+----------+ Folios: | | | | | | | +---+---+-----+-----+---+----------+ +------+------+ +----+----+ Upload: | | |.....| | | +------+------+ +----+----+ +------+------+------+------+------+ Cache: | | | | | | +------+------+------+------+------+ The progressive subrequest construction permits the algorithm to be preparing both the next upload to the server and the next write to the cache whilst the previous ones are already in progress. Throttling can be applied to control the rate of production of subrequests - and, in any case, we probably want to write them to the server in ascending order, particularly if the file will be extended. Content crypto can also be prepared at the same time as the subrequests and run asynchronously, with the prepped requests being stalled until the crypto catches up with them. This might also be useful for transport crypto, but that happens at a lower layer, so probably would be harder to pull off. The algorithm is split into three parts: (1) The issuer. This walks through the data, packaging it up, encrypting it and creating subrequests. The part of this that generates subrequests only deals with file positions and spans and so is usable for DIO/unbuffered writes as well as buffered writes. (2) The collector. This asynchronously collects completed subrequests, unlocks folios, frees crypto buffers and performs any retries. This runs in a work queue so that the issuer can return to the caller for writeback (so that the VM can have its kswapd thread back) or async writes. (3) The retryer. This pauses the issuer, waits for all outstanding subrequests to complete and then goes through the failed subrequests to reissue them. This may involve reprepping them (with cifs, the credits must be renegotiated, and a subrequest may need splitting), and doing RMW for content crypto if there's a conflicting change on the server. [!] Note that some of the functions are prefixed with "new_" to avoid clashes with existing functions. These will be renamed in a later patch that cuts over to the new algorithm. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Eric Van Hensbergen <ericvh@kernel.org> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: v9fs@lists.linux.dev cc: linux-afs@lists.infradead.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org
2024-05-01netfs: Switch to using unsigned long long rather than loff_tDavid Howells
Switch to using unsigned long long rather than loff_t in netfslib to avoid problems with the sign flipping in the maths when we're dealing with the byte at position 0x7fffffffffffffff. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Ilya Dryomov <idryomov@gmail.com> cc: Xiubo Li <xiubli@redhat.com> cc: netfs@lists.linux.dev cc: ceph-devel@vger.kernel.org cc: linux-fsdevel@vger.kernel.org
2024-05-01netfs: Use mempools for allocating requests and subrequestsDavid Howells
Use mempools for allocating requests and subrequests in an effort to make sure that allocation always succeeds so that when performing writeback we can always make progress. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2024-05-01netfs: Remove ->launder_folio() supportDavid Howells
Remove support for ->launder_folio() from netfslib and expect filesystems to use filemap_invalidate_inode() instead. netfs_launder_folio() can then be got rid of. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Eric Van Hensbergen <ericvh@kernel.org> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Steve French <sfrench@samba.org> cc: Matthew Wilcox <willy@infradead.org> cc: linux-mm@kvack.org cc: linux-fsdevel@vger.kernel.org cc: netfs@lists.linux.dev cc: v9fs@lists.linux.dev cc: linux-afs@lists.infradead.org cc: ceph-devel@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: devel@lists.orangefs.org
2024-05-01mm: Provide a means of invalidation without using launder_folioDavid Howells
Implement a replacement for launder_folio. The key feature of invalidate_inode_pages2() is that it locks each folio individually, unmaps it to prevent mmap'd accesses interfering and calls the ->launder_folio() address_space op to flush it. This has problems: firstly, each folio is written individually as one or more small writes; secondly, adjacent folios cannot be added so easily into the laundry; thirdly, it's yet another op to implement. Instead, use the invalidate lock to cause anyone wanting to add a folio to the inode to wait, then unmap all the folios if we have mmaps, then, conditionally, use ->writepages() to flush any dirty data back and then discard all pages. The invalidate lock prevents ->read_iter(), ->write_iter() and faulting through mmap all from adding pages for the duration. This is then used from netfslib to handle the flusing in unbuffered and direct writes. Signed-off-by: David Howells <dhowells@redhat.com> cc: Matthew Wilcox <willy@infradead.org> cc: Miklos Szeredi <miklos@szeredi.hu> cc: Trond Myklebust <trond.myklebust@hammerspace.com> cc: Christoph Hellwig <hch@lst.de> cc: Andrew Morton <akpm@linux-foundation.org> cc: Alexander Viro <viro@zeniv.linux.org.uk> cc: Christian Brauner <brauner@kernel.org> cc: Jeff Layton <jlayton@kernel.org> cc: linux-mm@kvack.org cc: linux-fsdevel@vger.kernel.org cc: netfs@lists.linux.dev cc: v9fs@lists.linux.dev cc: linux-afs@lists.infradead.org cc: ceph-devel@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: linux-nfs@vger.kernel.org cc: devel@lists.orangefs.org
2024-05-01Merge remote-tracking branch 'cxl/for-6.10/cper' into cxl-for-nextDave Jiang
Add support to send CPER records to CXL for more detailed parsing.
2024-05-01acpi/ghes: Process CXL Component EventsIra Weiny
BIOS can configure memory devices as firmware first. This will send CXL events to the firmware instead of the OS. The firmware can then inform the OS of these events via UEFI. UEFI v2.10 section N.2.14 defines a Common Platform Error Record (CPER) format for CXL Component Events. The format is mostly the same as the CXL Common Event Record Format. The difference lies in the use of a GUID as the CPER Section Type which matches the UUID defined in CXL 3.1 Table 8-43. Currently a configuration such as this will trace a non standard event in the log omitting useful details of the event. In addition the CXL sub-system contains additional region and HPA information useful to the user.[0] The CXL code is required to be called from process context as it needs to take a device lock. The GHES code may be in interrupt context. This complicated the use of a callback. Dan Williams suggested the use of work items as an atomic way of switching between the callback execution and a default handler.[1] The use of a kfifo simplifies queue processing by providing lock free fifo operations. cxl_cper_kfifo_get() allows easier management of the kfifo between the ghes and cxl modules. CXL 3.1 Table 8-127 requires a device to have a queue depth of 1 for each of the four event logs. A combined queue depth of 32 is chosen to provide room for 8 entries of each log type. Add GHES support to detect CXL CPER records. Add the ability for the CXL sub-system to register a work queue to process the events. This patch adds back the functionality which was removed to fix the report by Dan Carpenter[2]. Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Suggested-by: Dan Carpenter <dan.carpenter@linaro.org> Suggested-by: Dan Williams <dan.j.williams@intel.com> Link: http://lore.kernel.org/r/cover.1711598777.git.alison.schofield@intel.com [0] Link: http://lore.kernel.org/r/65d111eb87115_6c745294ac@dwillia2-xfh.jf.intel.com.notmuch [1] Link: http://lore.kernel.org/r/b963c490-2c13-4b79-bbe7-34c6568423c7@moroto.mountain [2] Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Tested-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Link: https://lore.kernel.org/r/20240426-cxl-cper3-v4-1-58076cce1624@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-05-01Merge tag 'regulator-fix-v6.9-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "There's a few simple driver specific fixes here, plus some core cleanups from Matti which fix issues found with client drivers due to the API being confusing. The two fixes for the stubs provide more constructive behaviour with !REGULATOR configurations, issues were noticed with some hwmon drivers which would otherwise have needed confusing bodges in the users. The irq_helpers fix to duplicate the provided name for the interrupt controller was found because a driver got this wrong and it's again a case where the core is the sensible place to put the fix" * tag 'regulator-fix-v6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: change devm_regulator_get_enable_optional() stub to return Ok regulator: change stubbed devm_regulator_get_enable to return Ok regulator: vqmmc-ipq4019: fix module autoloading regulator: qcom-refgen: fix module autoloading regulator: mt6360: De-capitalize devicetree regulator subnodes regulator: irq_helpers: duplicate IRQ name
2024-05-01mm/slab: make __free(kfree) accept error pointersDan Carpenter
Currently, if an automatically freed allocation is an error pointer that will lead to a crash. An example of this is in wm831x_gpio_dbg_show(). 171 char *label __free(kfree) = gpiochip_dup_line_label(chip, i); 172 if (IS_ERR(label)) { 173 dev_err(wm831x->dev, "Failed to duplicate label\n"); 174 continue; 175 } The auto clean up function should check for error pointers as well, otherwise we're going to keep hitting issues like this. Fixes: 54da6a092431 ("locking: Introduce __cleanup() based infrastructure") Cc: <stable@vger.kernel.org> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2024-05-01objpool: cache nr_possible_cpus() and avoid caching nr_cpu_idsAndrii Nakryiko
Profiling shows that calling nr_possible_cpus() in objpool_pop() takes a noticeable amount of CPU (when profiled on 80-core machine), as we need to recalculate number of set bits in a CPU bit mask. This number can't change, so there is no point in paying the price for recalculating it. As such, cache this value in struct objpool_head and use it in objpool_pop(). On the other hand, cached pool->nr_cpus isn't necessary, as it's not used in hot path and is also a pretty trivial value to retrieve. So drop pool->nr_cpus in favor of using nr_cpu_ids everywhere. This way the size of struct objpool_head remains the same, which is a nice bonus. Same BPF selftests benchmarks were used to evaluate the effect. Using changes in previous patch (inlining of objpool_pop/objpool_push) as baseline, here are the differences: BASELINE ======== kretprobe : 9.937 ± 0.174M/s kretprobe-multi: 10.440 ± 0.108M/s AFTER ===== kretprobe : 10.106 ± 0.120M/s (+1.7%) kretprobe-multi: 10.515 ± 0.180M/s (+0.7%) Link: https://lore.kernel.org/all/20240424215214.3956041-3-andrii@kernel.org/ Cc: Matt (Qiang) Wu <wuqiang.matt@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-05-01objpool: enable inlining objpool_push() and objpool_pop() operationsAndrii Nakryiko
objpool_push() and objpool_pop() are very performance-critical functions and can be called very frequently in kretprobe triggering path. As such, it makes sense to allow compiler to inline them completely to eliminate function calls overhead. Luckily, their logic is quite well isolated and doesn't have any sprawling dependencies. This patch moves both objpool_push() and objpool_pop() into include/linux/objpool.h and marks them as static inline functions, enabling inlining. To avoid anyone using internal helpers (objpool_try_get_slot, objpool_try_add_slot), rename them to use leading underscores. We used kretprobe microbenchmark from BPF selftests (bench trig-kprobe and trig-kprobe-multi benchmarks) running no-op BPF kretprobe/kretprobe.multi programs in a tight loop to evaluate the effect. BPF own overhead in this case is minimal and it mostly stresses the rest of in-kernel kretprobe infrastructure overhead. Results are in millions of calls per second. This is not super scientific, but shows the trend nevertheless. BEFORE ====== kretprobe : 9.794 ± 0.086M/s kretprobe-multi: 10.219 ± 0.032M/s AFTER ===== kretprobe : 9.937 ± 0.174M/s (+1.5%) kretprobe-multi: 10.440 ± 0.108M/s (+2.2%) Link: https://lore.kernel.org/all/20240424215214.3956041-2-andrii@kernel.org/ Cc: Matt (Qiang) Wu <wuqiang.matt@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-05-01ftrace: make extra rcu_is_watching() validation check optionalAndrii Nakryiko
Introduce CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING config option to control whether ftrace low-level code performs additional rcu_is_watching()-based validation logic in an attempt to catch noinstr violations. This check is expected to never be true and is mostly useful for low-level validation of ftrace subsystem invariants. For most users it should probably be kept disabled to eliminate unnecessary runtime overhead. This improves BPF multi-kretprobe (relying on ftrace and rethook infrastructure) runtime throughput by 2%, according to BPF benchmarks ([0]). [0] https://lore.kernel.org/bpf/CAEf4BzauQ2WKMjZdc9s0rBWa01BYbgwHN6aNDXQSHYia47pQ-w@mail.gmail.com/ Link: https://lore.kernel.org/all/20240418190909.704286-1-andrii@kernel.org/ Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Paul E. McKenney <paulmck@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-05-01fprobe: Add entry/exit callbacks typesJiri Olsa
We are going to store callbacks in following change, so this will ease up the code. Link: https://lore.kernel.org/all/20240207153550.856536-2-jolsa@kernel.org/ Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-05-01Merge branches 'fixes.2024.04.15a', 'misc.2024.04.12a', ↵Uladzislau Rezki (Sony)
'rcu-sync-normal-improve.2024.04.15a', 'rcu-tasks.2024.04.15a' and 'rcutorture.2024.04.15a' into rcu-merge.2024.04.15a fixes.2024.04.15a: RCU fixes misc.2024.04.12a: Miscellaneous fixes rcu-sync-normal-improve.2024.04.15a: Improving synchronize_rcu() call rcu-tasks.2024.04.15a: Tasks RCU updates rcutorture.2024.04.15a: Torture-test updates
2024-05-01Merge tag 'mhi-for-6.10' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi into char-misc-next Manivannan writes: MHI Host ======== - Added a new API mhi_power_down_keep_dev() to not destroy the struct dev associated with the MHI channels during MHI power down. This is useful in scenarios such as system suspend/hibernation where the probability of channels coming back is very high. So the PM maintainer suggested not to destroy the struct dev in those cases. This API is introduced for fixing the failure reported in the ath11k driver during resume from system suspend. NOTE: Due to the API dependency, the patch adding the API is pushed to an immutable branch (mhi-immutable) and merged into both mhi and ath trees. But the merge commit is not visible in mhi tree due to git being smart with 'fast-forward'. - Added an optional sysfs entry to force the MHI devices to enter the Emergency Download (EDL) mode to download the firmware from host. - Added EDL mode support for Qcom SDX75/65/55 modems as per the MHI spec v1.2, Chapter 13.2. This involves writing a cookie to the EDL doorbell registers and then triggering the device reset from host. * tag 'mhi-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi: bus: mhi: host: pci_generic: Add generic edl_trigger to allow devices to enter EDL mode bus: mhi: host: Add a new API for getting channel doorbell offset bus: mhi: host: Add sysfs entry to force device to enter EDL bus: mhi: host: Add mhi_power_down_keep_dev() API to support system suspend/hibernation
2024-04-30net: move sysctl_max_skb_frags to net_hotdataEric Dumazet
sysctl_max_skb_frags is used in TCP and MPTCP fast paths, move it to net_hodata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240429134025.1233626-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-30inet: introduce dst_rtable() helperEric Dumazet
I added dst_rt6_info() in commit e8dfd42c17fa ("ipv6: introduce dst_rt6_info() helper") This patch does a similar change for IPv4. Instead of (struct rtable *)dst casts, we can use : #define dst_rtable(_ptr) \ container_of_const(_ptr, struct rtable, dst) Patch is smaller than IPv6 one, because IPv4 has skb_rtable() helper. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/20240429133009.1227754-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-30Merge remote-tracking branch 'cxl/for-6.10/dpa-to-hpa' into cxl-for-nextDave Jiang
Support for HPA to DPA translation for CXL events cxl_dram and cxl_general_media.
2024-04-30ACPI: Move acpi_blacklisted() declaration to asm/acpi.hKuppuswamy Sathyanarayanan
The function acpi_blacklisted() is defined only when CONFIG_X86 is enabled and is only used by X86 arch code. To align with its usage and definition conditions, move its declaration to asm/acpi.h Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> [ rjw: Added empty code line in a header file ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-30cxl/core: Add region info to cxl_general_media and cxl_dram eventsAlison Schofield
User space may need to know which region, if any, maps the DPAs (device physical addresses) reported in a cxl_general_media or cxl_dram event. Since the mapping can change, the kernel provides this information at the time the event occurs. This informs user space that at event <timestamp> this <region> mapped this <DPA> to this <HPA>. Add the same region info that is included in the cxl_poison trace event: the DPA->HPA translation, region name, and region uuid. The new fields are inserted in the trace event and no existing fields are modified. If the DPA is not mapped, user will see: hpa=ULLONG_MAX, region="", and uuid=0 This work must be protected by dpa_rwsem & region_rwsem since it is looking up region mappings. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/dd8d708b7a7ebfb64a27020a5eb338091336b34d.1714496730.git.alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30powercap: intel_rapl: Introduce APIs for PMU supportZhang Rui
Introduce two new APIs rapl_package_add_pmu()/rapl_package_remove_pmu(). RAPL driver can invoke these APIs to expose its supported energy counters via perf PMU. The new RAPL PMU is fully compatible with current MSR RAPL PMU, including using the same PMU name and events name/id/unit/scale, etc. For example, use below command perf stat -e power/energy-pkg/ -e power/energy-ram/ FOO to get the energy consumption if power/energy-pkg/ and power/energy-ram/ events are available in the "perf list" output. This does not introduce any conflict because TPMI RAPL is the only user of these APIs currently, and it never co-exists with MSR RAPL. Note that RAPL Packages can be probed/removed dynamically, and the events supported by each TPMI RAPL device can be different. Thus the RAPL PMU support is done on demand, which means 1. PMU is registered only if it is needed by a RAPL Package. PMU events for unsupported counters are not exposed. 2. PMU is unregistered and registered when a new RAPL Package is probed and supports new counters that are not supported by current PMU. For example, on a dual-package system using TPMI RAPL, it is possible that Package 1 behaves as TPMI domain root and supports Psys domain. In this case, register PMU without Psys event when probing Package 0, and re-register the PMU with Psys event when probing Package 1. 3. PMU is unregistered when all registered RAPL Packages don't need PMU. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-30cxl/cxl-event: include missing <linux/types.h> and <linux/uuid.h>Sangyun Kim
The linux/cxl-event.h header file uses the u8, u16, and uuid_t types, but it doesn't include the necessary header files, <linux/types.h> and <linux/uuid.h>. Currently, cxl-event.h is only used by drivers/cxl/cxlmem.h, and it doesn't cause any errors because cxlmem.h indirectly includes the required types. However, cxl-event.h may be used by other CXL-related code in the future, so it's important to fix this issue by including the missing header files directly in cxl-event.h. Signed-off-by: Sangyun Kim <sangyun.kim@snu.ac.kr> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20240417035043.2791431-1-sangyun.kim@snu.ac.kr Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30Merge patch series "riscv: Create and document PR_RISCV_SET_ICACHE_FLUSH_CTX ↵Palmer Dabbelt
prctl" Charlie Jenkins <charlie@rivosinc.com> says: Improve the performance of icache flushing by creating a new prctl flag PR_RISCV_SET_ICACHE_FLUSH_CTX. The interface is left generic to allow for future expansions such as with the proposed J extension [1]. Documentation is also provided to explain the use case. Patch sent to add PR_RISCV_SET_ICACHE_FLUSH_CTX to man-pages [2]. [1] https://github.com/riscv/riscv-j-extension [2] https://lore.kernel.org/linux-man/20240124-fencei_prctl-v1-1-0bddafcef331@rivosinc.com * b4-shazam-merge: cpumask: Add assign cpu documentation: Document PR_RISCV_SET_ICACHE_FLUSH_CTX prctl riscv: Include riscv_set_icache_flush_ctx prctl riscv: Remove unnecessary irqflags processor.h include Link: https://lore.kernel.org/r/20240312-fencei-v13-0-4b6bdc2bbf32@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-30kunit/fortify: Add memcpy() testsKees Cook
Add fortify tests for memcpy() and memmove(). This can use a similar method to the fortify_panic() replacement, only we can do it for what was the WARN_ONCE(), which can be redefined. Since this is primarily testing the fortify behaviors of the memcpy() and memmove() defenses, the tests for memcpy() and memmove() are identical. Link: https://lore.kernel.org/r/20240429194342.2421639-3-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org>