linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message (Collapse)	Author
2024-09-12	docs: filesystems: corrected grammar of netfs page	Dennis Lam
	Fixed the word "aren't" to "isn't" based on singular word "bufferage". Signed-off-by: Dennis Lam <dennis.lamerice@gmail.com> Link: https://lore.kernel.org/r/20240912012550.13748-2-dennis.lamerice@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12	Merge branch 'netfs-writeback' of ↵	Christian Brauner
	ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into vfs.netfs Merge patch series "netfs: Read/write improvements" from David Howells <dhowells@redhat.com>. * 'netfs-writeback' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (25 commits) cifs: Don't support ITER_XARRAY cifs: Switch crypto buffer to use a folio_queue rather than an xarray cifs: Use iterate_and_advance*() routines directly for hashing netfs: Cancel dirty folios that have no storage destination cachefiles, netfs: Fix write to partial block at EOF netfs: Remove fs/netfs/io.c netfs: Speed up buffered reading afs: Make read subreqs async netfs: Simplify the writeback code netfs: Provide an iterator-reset function netfs: Use new folio_queue data type and iterator instead of xarray iter cifs: Provide the capability to extract from ITER_FOLIOQ to RDMA SGEs iov_iter: Provide copy_folio_from_iter() mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios netfs: Use bh-disabling spinlocks for rreq->lock netfs: Set the request work function upon allocation netfs: Remove NETFS_COPY_TO_CACHE netfs: Reserve netfs_sreq_source 0 as unset/unknown netfs: Move max_len/max_nr_segs from netfs_io_subrequest to netfs_io_stream netfs, cifs: Move CIFS_INO_MODIFIED_ATTR to netfs_inode ... Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12	cifs: Don't support ITER_XARRAY	David Howells
	There's now no need to support ITER_XARRAY in cifs as netfslib hands down ITER_FOLIOQ instead - and that's simpler to use with iterate_and_advance() as it doesn't hold the RCU read lock over the step function. This is part of the process of phasing out ITER_XARRAY. Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.com> cc: Tom Talpey <tom@talpey.com> cc: Enzo Matsumiya <ematsumiya@suse.de> cc: linux-cifs@vger.kernel.org Link: https://lore.kernel.org/r/20240814203850.2240469-26-dhowells@redhat.com/ # v2 Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12	cifs: Switch crypto buffer to use a folio_queue rather than an xarray	David Howells
	Switch cifs from using an xarray to hold the transport crypto buffer to using a folio_queue and use ITER_FOLIOQ rather than ITER_XARRAY. This is part of the process of phasing out ITER_XARRAY. Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.com> cc: Tom Talpey <tom@talpey.com> cc: Enzo Matsumiya <ematsumiya@suse.de> cc: linux-cifs@vger.kernel.org Link: https://lore.kernel.org/r/20240814203850.2240469-25-dhowells@redhat.com/ # v2 Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12	cifs: Use iterate_and_advance*() routines directly for hashing	David Howells
	Replace the bespoke cifs iterators of ITER_BVEC and ITER_KVEC to do hashing with iterate_and_advance_kernel() - a variant on iterate_and_advance() that only supports kernel-internal ITER_* types and not UBUF/IOVEC types. The bespoke ITER_XARRAY is left because we don't really want to be calling crypto_shash_update() under the RCU read lock for large amounts of data; besides, ITER_XARRAY is going to be phased out. Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.com> cc: Tom Talpey <tom@talpey.com> cc: Enzo Matsumiya <ematsumiya@suse.de> cc: linux-cifs@vger.kernel.org Link: https://lore.kernel.org/r/20240814203850.2240469-24-dhowells@redhat.com/ # v2 Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12	netfs: Cancel dirty folios that have no storage destination	David Howells
	Kafs wants to be able to cache the contents of directories (and symlinks), but whilst these are downloaded from the server with the FS.FetchData RPC op and similar, the same as for regular files, they can't be updated by FS.StoreData, but rather have special operations (FS.MakeDir, etc.). Now, rather than redownloading a directory's content after each change made to that directory, kafs modifies the local blob. This blob can be saved out to the cache, and since it's using netfslib, kafs just marks the folios dirty and lets ->writepages() on the directory take care of it, as for an regular file. This is fine as long as there's a cache as although the upload stream is disabled, there's a cache stream to drive the procedure. But if the cache goes away in the meantime, suddenly there's no way do any writes and the code gets confused, complains "R=%x: No submit" to dmesg and leaves the dirty folio hanging. Fix this by just cancelling the store of the folio if neither stream is active. (If there's no cache at the time of dirtying, we should just not mark the folio dirty). Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20240814203850.2240469-23-dhowells@redhat.com/ # v2 Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12	cachefiles, netfs: Fix write to partial block at EOF	David Howells
	Because it uses DIO writes, cachefiles is unable to make a write to the backing file if that write is not aligned to and sized according to the backing file's DIO block alignment. This makes it tricky to handle a write to the cache where the EOF on the network file is not correctly aligned. To get around this, netfslib attempts to tell the driver it is calling how much more data there is available beyond the EOF that it can use to pad the write (netfslib preclears the part of the folio above the EOF). However, it tries to tell the cache what the maximum length is, but doesn't calculate this correctly; and, in any case, cachefiles actually ignores the value and just skips the block. Fix this by: (1) Change the value passed to indicate the amount of extra data that can be added to the operation (now ->submit_extendable_to). This is much simpler to calculate as it's just the end of the folio minus the top of the data within the folio - rather than having to account for data spread over multiple folios. (2) Make cachefiles add some of this data if the subrequest it is given ends at the network file's i_size if the extra data is sufficient to pad out to a whole block. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20240814203850.2240469-22-dhowells@redhat.com/ # v2 Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12	netfs: Remove fs/netfs/io.c	David Howells
	Remove fs/netfs/io.c as it is no longer used. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20240814203850.2240469-21-dhowells@redhat.com/ # v2 Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12	netfs: Speed up buffered reading	David Howells
	Improve the efficiency of buffered reads in a number of ways: (1) Overhaul the algorithm in general so that it's a lot more compact and split the read submission code between buffered and unbuffered versions. The unbuffered version can be vastly simplified. (2) Read-result collection is handed off to a work queue rather than being done in the I/O thread. Multiple subrequests can be processes simultaneously. (3) When a subrequest is collected, any folios it fully spans are collected and "spare" data on either side is donated to either the previous or the next subrequest in the sequence. Notes: () Readahead expansion is massively slows down fio, presumably because it causes a load of extra allocations, both folio and xarray, up front before RPC requests can be transmitted. () RDMA with cifs does appear to work, both with SIW and RXE. (*) PG_private_2-based reading and copy-to-cache is split out into its own file and altered to use folio_queue. Note that the copy to the cache now creates a new write transaction against the cache and adds the folios to be copied into it. This allows it to use part of the writeback I/O code. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20240814203850.2240469-20-dhowells@redhat.com/ # v2 Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12	afs: Make read subreqs async	David Howells
	Perform AFS read subrequests in a work item rather than in the calling thread. For normal buffered reads, this will allow the calling thread to copy data from the pagecache to the application at the same time as the demarshalling thread is shovelling data from skbuffs into the pagecache. This will also allow the RA mark to trigger a new read before we've finished shovelling the data from the current one. Note: This would be a bit safer if the FS.FetchData RPC ops returned the metadata (including the data version number) before returning the data. This would allow me to flush the pagecache before installing the new data. In future, it may be possible to asynchronously flush the pagecache either side of the region being read. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-afs@lists.infradead.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20240814203850.2240469-19-dhowells@redhat.com/ # v2 Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12	netfs: Simplify the writeback code	David Howells
	Use the new folio_queue structures to simplify the writeback code. The problem with referring to the i_pages xarray directly is that we may have gaps in the sequence of folios we're writing from that we need to skip when we're removing the writeback mark from the folios we're writing back from. At the moment the code tries to deal with this by carefully tracking the gaps in each writeback stream (eg. write to server and write to cache) and divining when there's a gap that spans folios (something that's not helped by folios not being a consistent size). Instead, the folio_queue buffer contains pointers only the folios we're dealing with, has them in ascending order and indicates a gap by placing non-consequitive folios next to each other. This makes it possible to track where we need to clean up to by just keeping track of where we've processed to on each stream and taking the minimum. Note that the I/O iterator is always rounded up to the end of the folio, even if that is beyond the EOF position, so that the cache can do DIO from the page. The excess space is cleared, though mmapped writes clobber it. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20240814203850.2240469-18-dhowells@redhat.com/ # v2 Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12	netfs: Provide an iterator-reset function	David Howells
	Provide a function to reset the iterator on a subrequest. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20240814203850.2240469-17-dhowells@redhat.com/ # v2 Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12	netfs: Use new folio_queue data type and iterator instead of xarray iter	David Howells
	Make the netfs write-side routines use the new folio_queue struct to hold a rolling buffer of folios, with the issuer adding folios at the tail and the collector removing them from the head as they're processed instead of using an xarray. This will allow a subsequent patch to simplify the write collector. The primary mark (as tested by folioq_is_marked()) is used to note if the corresponding folio needs putting. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20240814203850.2240469-16-dhowells@redhat.com/ # v2 Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12	cifs: Provide the capability to extract from ITER_FOLIOQ to RDMA SGEs	David Howells
	Make smb_extract_iter_to_rdma() extract page fragments from an ITER_FOLIOQ iterator into RDMA SGEs. Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.com> cc: Tom Talpey <tom@talpey.com> cc: Enzo Matsumiya <ematsumiya@suse.de> cc: linux-cifs@vger.kernel.org Link: https://lore.kernel.org/r/20240814203850.2240469-15-dhowells@redhat.com/ # v2 Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12	iov_iter: Provide copy_folio_from_iter()	David Howells
	Provide a copy_folio_from_iter() wrapper. Signed-off-by: David Howells <dhowells@redhat.com> cc: Alexander Viro <viro@zeniv.linux.org.uk> cc: Christian Brauner <christian@brauner.io> cc: Matthew Wilcox <willy@infradead.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/20240814203850.2240469-14-dhowells@redhat.com/ # v2 Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12	mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios	David Howells
	Define a data structure, struct folio_queue, to represent a sequence of folios and a kernel-internal I/O iterator type, ITER_FOLIOQ, to allow a list of folio_queue structures to be used to provide a buffer to iov_iter-taking functions, such as sendmsg and recvmsg. The folio_queue structure looks like: struct folio_queue { struct folio_batch vec; u8 orders[PAGEVEC_SIZE]; struct folio_queue next; struct folio_queue prev; unsigned long marks; unsigned long marks2; }; It does not use a list_head so that next and/or prev can be set to NULL at the ends of the list, allowing iov_iter-handling routines to determine that they are the ends without needing to store a head pointer in the iov_iter struct. A folio_batch struct is used to hold the folio pointers which allows the batch to be passed to batch handling functions. Two mark bits are available per slot. The intention is to use at least one of them to mark folios that need putting, but that might not be ultimately necessary. Accessor functions are used to access the slots to do the masking and an additional accessor function is used to indicate the size of the array. The order of each folio is also stored in the structure to avoid the need for iov_iter_advance() and iov_iter_revert() to have to query each folio to find its size. With careful barriering, this can be used as an extending buffer with new folios inserted and new folio_queue structs added without the need for a lock. Further, provided we always keep at least one struct in the buffer, we can also remove consumed folios and consumed structs from the head end as we without the need for locks. [Questions/thoughts] (1) To manage this, I need a head pointer, a tail pointer, a tail slot number (assuming insertion happens at the tail end and the next pointers point from head to tail). Should I put these into a struct of their own, say "folio_queue_head" or "rolling_buffer"? I will end up with two of these in netfs_io_request eventually, one keeping track of the pagecache I'm dealing with for buffered I/O and the other to hold a bounce buffer when we need one. (2) Should I make the slots {folio,off,len} or bio_vec? (3) This is intended to replace ITER_XARRAY eventually. Using an xarray in I/O iteration requires the taking of the RCU read lock, doing copying under the RCU read lock, walking the xarray (which may change under us), handling retries and dealing with special values. The advantage of ITER_XARRAY is that when we're dealing with the pagecache directly, we don't need any allocation - but if we're doing encrypted comms, there's a good chance we'd be using a bounce buffer anyway. This will require afs, erofs, cifs, orangefs and fscache to be converted to not use this. afs still uses it for dirs and symlinks; some of erofs usages should be easy to change, but there's one which won't be so easy; ceph's use via fscache can be fixed by porting ceph to netfslib; cifs is using xarray as a bounce buffer - that can be moved to use sheaves instead; and orangefs has a similar problem to erofs - maybe orangefs could use netfslib? Signed-off-by: David Howells <dhowells@redhat.com> cc: Matthew Wilcox <willy@infradead.org> cc: Jeff Layton <jlayton@kernel.org> cc: Steve French <sfrench@samba.org> cc: Ilya Dryomov <idryomov@gmail.com> cc: Gao Xiang <xiang@kernel.org> cc: Mike Marshall <hubcap@omnibond.com> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org cc: linux-afs@lists.infradead.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: linux-erofs@lists.ozlabs.org cc: devel@lists.orangefs.org Link: https://lore.kernel.org/r/20240814203850.2240469-13-dhowells@redhat.com/ # v2 Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12	uidgid: make sure we fit into one cacheline	Christian Brauner
	When I expanded uidgid mappings I intended for a struct uid_gid_map to fit into a single cacheline on x86 as they tend to be pretty performance sensitive (idmapped mounts etc). But a 4 byte hole was added that brought it over 64 bytes. Fix that by adding the static extent array and the extent counter into a substruct. C's type punning for unions guarantees that we can access ->nr_extents even if the last written to member wasn't within the same object. This is also what we rely on in struct_group() and friends. This of course relies on non-strict aliasing which we don't do. 99) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). Link: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2310.pdf Link: https://lore.kernel.org/r/20240910-work-uid_gid_map-v1-1-e6bc761363ed@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12	Merge patch series "file: remove f_version"	Christian Brauner
	Christian Brauner <brauner@kernel.org> says: The f_version member in struct file isn't particularly well-defined. It is mainly used as a cookie to detect concurrent seeks when iterating directories. But it is also abused by some subsystems for completely unrelated things. It is mostly a directory specific thing that doesn't really need to live in struct file and with its wonky semantics it really lacks a specific function. For pipes, f_version is (ab)used to defer poll notifications until a write has happened. And struct pipe_inode_info is used by multiple struct files in their ->private_data so there's no chance of pushing that down into file->private_data without introducing another pointer indirection. But this should be a solvable problem. Only regular files with FMODE_ATOMIC_POS and directories require f_pos_lock. Pipes and other files don't. So this adds a union into struct file encompassing f_pos_lock and a pipe specific f_pipe member that pipes can use. This union of course can be extended to other file types and is similar to what we do in struct inode already. * patches from https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-0-6d3e4816aa7b@kernel.org: fs: remove f_version pipe: use f_pipe fs: add f_pipe ubifs: store cookie in private data ufs: store cookie in private data udf: store cookie in private data proc: store cookie in private data ocfs2: store cookie in private data input: remove f_version abuse ext4: store cookie in private data ext2: store cookie in private data affs: store cookie in private data fs: add generic_llseek_cookie() fs: use must_set_pos() fs: add must_set_pos() fs: add vfs_setpos_cookie() s390: remove unused f_version ceph: remove unused f_version adi: remove unused f_version file: remove pointless comment Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-0-6d3e4816aa7b@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12	fs: remove f_version	Christian Brauner
	Now that detecting concurrent seeks is done by the filesystems that require it we can remove f_version and free up 8 bytes for future extensions. Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-20-6d3e4816aa7b@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12	pipe: use f_pipe	Christian Brauner
	Pipes use f_version to defer poll notifications until a write has been observed. Since multiple file's refer to the same struct pipe_inode_info in their ->private_data moving it into their isn't feasible since we would need to introduce an additional pointer indirection. However, since pipes don't require f_pos_lock we placed a new f_pipe member into a union with f_pos_lock that pipes can use. This is similar to what we already do for struct inode where we have additional fields per file type. This will allow us to fully remove f_version in the next step. Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-19-6d3e4816aa7b@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12	fs: add f_pipe	Christian Brauner
	Only regular files with FMODE_ATOMIC_POS and directories need f_pos_lock. Place a new f_pipe member in a union with f_pos_lock that they can use and make them stop abusing f_version in follow-up patches. Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-18-6d3e4816aa7b@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12	ubifs: store cookie in private data	Christian Brauner
	Store the cookie to detect concurrent seeks on directories in file->private_data. Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-17-6d3e4816aa7b@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12	ufs: store cookie in private data	Christian Brauner
	Store the cookie to detect concurrent seeks on directories in file->private_data. Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-16-6d3e4816aa7b@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12	udf: store cookie in private data	Christian Brauner
	Store the cookie to detect concurrent seeks on directories in file->private_data. Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-15-6d3e4816aa7b@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12	proc: store cookie in private data	Christian Brauner
	Store the cookie to detect concurrent seeks on directories in file->private_data. Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-14-6d3e4816aa7b@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12	ocfs2: store cookie in private data	Christian Brauner
	Store the cookie to detect concurrent seeks on directories in file->private_data. Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-13-6d3e4816aa7b@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12	input: remove f_version abuse	Christian Brauner
	f_version is removed from struct file. Make input stop abusing f_version for stashing information for poll. Move the input state counter into input_seq_state and allocate it via seq_private_open() and free via seq_release_private(). Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-12-6d3e4816aa7b@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12	Merge patch series "can: m_can: fix struct net_device_ops::{open,stop} ↵	Marc Kleine-Budde
	callbacks under high bus load" Marc Kleine-Budde <mkl@pengutronix.de> says: Under high CAN-bus load the struct net_device_ops::{open,stop} callbacks (m_can_open(), m_can_close()) don't properly start and shutdown the device. Fix the problems by re-arranging the order of functions in m_can_open() and m_can_close(). Link: https://patch.msgid.link/20240910-can-m_can-fix-ifup-v3-0-6c1720ba45ce@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2024-09-12	can: m_can: m_can_close(): stop clocks after device has been shut down	Marc Kleine-Budde
	After calling m_can_stop() an interrupt may be pending or NAPI might still be executed. This means the driver might still touch registers of the IP core after the clocks have been disabled. This is not good practice and might lead to aborts depending on the SoC integration. To avoid these potential problems, make m_can_close() symmetric to m_can_open(), i.e. stop the clocks at the end, right before shutting down the transceiver. Fixes: e0d1f4816f2a ("can: m_can: add Bosch M_CAN controller support") Link: https://patch.msgid.link/20240910-can-m_can-fix-ifup-v3-2-6c1720ba45ce@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2024-09-12	can: m_can: enable NAPI before enabling interrupts	Jake Hamby
	If an interrupt (RX-complete or error flag) is set when bringing up the CAN device, e.g. due to CAN bus traffic before initializing the device, when m_can_start() is called and interrupts are enabled, m_can_isr() is called immediately, which disables all CAN interrupts and calls napi_schedule(). Because napi_enable() isn't called until later in m_can_open(), the call to napi_schedule() never schedules the m_can_poll() callback and the device is left with interrupts disabled and can't receive any CAN packets until rebooted. This can be verified by running "cansend" from another device before setting the bitrate and calling "ip link set up can0" on the test device. Adding debug lines to m_can_isr() shows it's called with flags (IR_EP \| IR_EW \| IR_CRCE), which calls m_can_disable_all_interrupts() and napi_schedule(), and then m_can_poll() is never called. Move the call to napi_enable() above the call to m_can_start() to enable any initial interrupt flags to be handled by m_can_poll() so that interrupts are reenabled. Add a call to napi_disable() in the error handling section of m_can_open(), to handle the case where later functions return errors. Also, in m_can_close(), move the call to napi_disable() below the call to m_can_stop() to ensure all interrupts are handled when bringing down the device. This race condition is much less likely to occur. Tested on a Microchip SAMA7G54 MPU. The fix should be applicable to any SoC with a Bosch M_CAN controller. Signed-off-by: Jake Hamby <Jake.Hamby@Teledyne.com> Fixes: e0d1f4816f2a ("can: m_can: add Bosch M_CAN controller support") Link: https://patch.msgid.link/20240910-can-m_can-fix-ifup-v3-1-6c1720ba45ce@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2024-09-12	can: kvaser_pciefd: Enable 64-bit DMA addressing	Martin Jocic
	Enabling 64-bit addressing for DMA buffers will prevent issues on some memory constrained platforms like e.g. Raspberry Pi 5, where the driver won't load because it cannot allocate enough continuous memory in the default 32-bit memory address range. Signed-off-by: Martin Jocic <martin.jocic@kvaser.com> Link: https://patch.msgid.link/d7340f78e3db305bfeeb8229d2dd1c9077e10b92.1725875278.git.martin.jocic@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2024-09-12	can: esd_usb: Remove CAN_CTRLMODE_3_SAMPLES for CAN-USB/3-FD	Stefan Mätje
	Remove the CAN_CTRLMODE_3_SAMPLES announcement for CAN-USB/3-FD devices because these devices don't support it. The hardware has a Microchip SAM E70 microcontroller that uses a Bosch MCAN IP core as CAN FD controller. But this MCAN core doesn't support triple sampling. Fixes: 80662d943075 ("can: esd_usb: Add support for esd CAN-USB/3") Cc: stable@vger.kernel.org Signed-off-by: Stefan Mätje <stefan.maetje@esd.eu> Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Link: https://patch.msgid.link/20240904222740.2985864-2-stefan.maetje@esd.eu Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2024-09-12	can: bcm: Clear bo->bcm_proc_read after remove_proc_entry().	Kuniyuki Iwashima
	syzbot reported a warning in bcm_release(). [0] The blamed change fixed another warning that is triggered when connect() is issued again for a socket whose connect()ed device has been unregistered. However, if the socket is just close()d without the 2nd connect(), the remaining bo->bcm_proc_read triggers unnecessary remove_proc_entry() in bcm_release(). Let's clear bo->bcm_proc_read after remove_proc_entry() in bcm_notify(). [0] name '4986' WARNING: CPU: 0 PID: 5234 at fs/proc/generic.c:711 remove_proc_entry+0x2e7/0x5d0 fs/proc/generic.c:711 Modules linked in: CPU: 0 UID: 0 PID: 5234 Comm: syz-executor606 Not tainted 6.11.0-rc5-syzkaller-00178-g5517ae241919 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024 RIP: 0010:remove_proc_entry+0x2e7/0x5d0 fs/proc/generic.c:711 Code: ff eb 05 e8 cb 1e 5e ff 48 8b 5c 24 10 48 c7 c7 e0 f7 aa 8e e8 2a 38 8e 09 90 48 c7 c7 60 3a 1b 8c 48 89 de e8 da 42 20 ff 90 <0f> 0b 90 90 48 8b 44 24 18 48 c7 44 24 40 0e 36 e0 45 49 c7 04 07 RSP: 0018:ffffc9000345fa20 EFLAGS: 00010246 RAX: 2a2d0aee2eb64600 RBX: ffff888032f1f548 RCX: ffff888029431e00 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffc9000345fb08 R08: ffffffff8155b2f2 R09: 1ffff1101710519a R10: dffffc0000000000 R11: ffffed101710519b R12: ffff888011d38640 R13: 0000000000000004 R14: 0000000000000000 R15: dffffc0000000000 FS: 0000000000000000(0000) GS:ffff8880b8800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fcfb52722f0 CR3: 000000000e734000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> bcm_release+0x250/0x880 net/can/bcm.c:1578 __sock_release net/socket.c:659 [inline] sock_close+0xbc/0x240 net/socket.c:1421 __fput+0x24a/0x8a0 fs/file_table.c:422 task_work_run+0x24f/0x310 kernel/task_work.c:228 exit_task_work include/linux/task_work.h:40 [inline] do_exit+0xa2f/0x27f0 kernel/exit.c:882 do_group_exit+0x207/0x2c0 kernel/exit.c:1031 __do_sys_exit_group kernel/exit.c:1042 [inline] __se_sys_exit_group kernel/exit.c:1040 [inline] __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1040 x64_sys_call+0x2634/0x2640 arch/x86/include/generated/asm/syscalls_64.h:232 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fcfb51ee969 Code: Unable to access opcode bytes at 0x7fcfb51ee93f. RSP: 002b:00007ffce0109ca8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fcfb51ee969 RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000001 RBP: 00007fcfb526f3b0 R08: ffffffffffffffb8 R09: 0000555500000000 R10: 0000555500000000 R11: 0000000000000246 R12: 00007fcfb526f3b0 R13: 0000000000000000 R14: 00007fcfb5271ee0 R15: 00007fcfb51bf160 </TASK> Fixes: 76fe372ccb81 ("can: bcm: Remove proc entry when dev is unregistered.") Reported-by: syzbot+0532ac7a06fb1a03187e@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=0532ac7a06fb1a03187e Tested-by: syzbot+0532ac7a06fb1a03187e@syzkaller.appspotmail.com Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Link: https://patch.msgid.link/20240905012237.79683-1-kuniyu@amazon.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2024-09-12	Merge branch kvm-arm64/visibility-cleanups into kvmarm-master/next	Marc Zyngier
	* kvm-arm64/visibility-cleanups: : . : Remove REG_HIDDEN_USER from the sysreg infrastructure, making things : a little more simple. From the cover letter: : : "Since 4d4f52052ba8 ("KVM: arm64: nv: Drop EL12 register traps that are : redirected to VNCR") and the admission that KVM would never be supporting : the original FEAT_NV, REG_HIDDEN_USER only had a few users, all of which : could either be replaced by a more ad-hoc mechanism, or removed altogether." : . KVM: arm64: Get rid of REG_HIDDEN_USER visibility qualifier KVM: arm64: Simplify visibility handling of AArch32 SPSR_* KVM: arm64: Simplify handling of CNTKCTL_EL12 Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-09-12	Merge branch kvm-arm64/s2-ptdump into kvmarm-master/next	Marc Zyngier
	* kvm-arm64/s2-ptdump: : . : Stage-2 page table dumper, reusing the main ptdump infrastructure, : courtesy of Sebastian Ene. From the cover letter: : : "This series extends the ptdump support to allow dumping the guest : stage-2 pagetables. When CONFIG_PTDUMP_STAGE2_DEBUGFS is enabled, ptdump : registers the new following files under debugfs: : - /sys/debug/kvm/<guest_id>/stage2_page_tables : - /sys/debug/kvm/<guest_id>/stage2_levels : - /sys/debug/kvm/<guest_id>/ipa_range : : This allows userspace tools (eg. cat) to dump the stage-2 pagetables by : reading the 'stage2_page_tables' file. : [...]" : . KVM: arm64: Register ptdump with debugfs on guest creation arm64: ptdump: Don't override the level when operating on the stage-2 tables arm64: ptdump: Use the ptdump description from a local context arm64: ptdump: Expose the attribute parsing functionality KVM: arm64: Move pagetable definitions to common header Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-09-12	Merge branch kvm-arm64/nv-at-pan into kvmarm-master/next	Marc Zyngier
	* kvm-arm64/nv-at-pan: : . : Add NV support for the AT family of instructions, which mostly results : in adding a page table walker that deals with most of the complexity : of the architecture. : : From the cover letter: : : "Another task that a hypervisor supporting NV on arm64 has to deal with : is to emulate the AT instruction, because we multiplex all the S1 : translations on a single set of registers, and the guest S2 is never : truly resident on the CPU. : : So given that we lie about page tables, we also have to lie about : translation instructions, hence the emulation. Things are made : complicated by the fact that guest S1 page tables can be swapped out, : and that our shadow S2 is likely to be incomplete. So while using AT : to emulate AT is tempting (and useful), it is not going to always : work, and we thus need a fallback in the shape of a SW S1 walker." : . KVM: arm64: nv: Add support for FEAT_ATS1A KVM: arm64: nv: Plumb handling of AT S1* traps from EL2 KVM: arm64: nv: Make AT+PAN instructions aware of FEAT_PAN3 KVM: arm64: nv: Sanitise SCTLR_EL1.EPAN according to VM configuration KVM: arm64: nv: Add SW walker for AT S1 emulation KVM: arm64: nv: Make ps_to_output_size() generally available KVM: arm64: nv: Add emulation of AT S12E{0,1}{R,W} KVM: arm64: nv: Add basic emulation of AT S1E2{R,W} KVM: arm64: nv: Add basic emulation of AT S1E1{R,W}P KVM: arm64: nv: Add basic emulation of AT S1E{0,1}{R,W} KVM: arm64: nv: Honor absence of FEAT_PAN2 KVM: arm64: nv: Turn upper_attr for S2 walk into the full descriptor KVM: arm64: nv: Enforce S2 alignment when contiguous bit is set arm64: Add ESR_ELx_FSC_ADDRSZ_L() helper arm64: Add system register encoding for PSTATE.PAN arm64: Add PAR_EL1 field description arm64: Add missing APTable and TCR_ELx.HPD masks KVM: arm64: Make kvm_at() take an OP_AT_* Signed-off-by: Marc Zyngier <maz@kernel.org> # Conflicts: # arch/arm64/kvm/nested.c
2024-09-12	Merge branch kvm-arm64/selftests-6.12 into kvmarm-master/next	Marc Zyngier
	* kvm-arm64/selftests-6.12: : . : KVM/arm64 selftest updates for 6.12 : : - Check for a bunch of timer emulation corner cases (COlton Lewis) : . KVM: arm64: selftests: Add arch_timer_edge_cases selftest KVM: arm64: selftests: Ensure pending interrupts are handled in arch_timer test Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-09-12	Merge branch kvm-arm64/vgic-sre-traps into kvmarm-master/next	Marc Zyngier
	* kvm-arm64/vgic-sre-traps: : . : Fix the multiple of cases where KVM/arm64 doesn't correctly : handle the guest trying to use a GICv3 that isn't advertised. : : From the cover letter: : : "It recently appeared that, when running on a GICv3-equipped platform : (which is what non-ancient arm64 HW has), not configuring a GICv3 : for the guest could result in less than desirable outcomes. : : We have multiple issues to fix: : : - for registers that always trap (the SGI registers) or that may : trap (the SRE register), we need to check whether a GICv3 has been : instantiated before acting upon the trap. : : - for registers that only conditionally trap, we must actively trap : them even in the absence of a GICv3 being instantiated, and handle : those traps accordingly. : : - finally, ID registers must reflect the absence of a GICv3, so that : we are consistent. : : This series goes through all these requirements. The main complexity : here is to apply a GICv3 configuration on the host in the absence of a : GICv3 in the guest. This is pretty hackish, but I don't have a much : better solution so far. : : As part of making wider use of of the trap bits, we fully define the : trap routing as per the architecture, something that we eventually : need for NV anyway." : . KVM: arm64: selftests: Cope with lack of GICv3 in set_id_regs KVM: arm64: Add selftest checking how the absence of GICv3 is handled KVM: arm64: Unify UNDEF injection helpers KVM: arm64: Make most GICv3 accesses UNDEF if they trap KVM: arm64: Honor guest requested traps in GICv3 emulation KVM: arm64: Add trap routing information for ICH_HCR_EL2 KVM: arm64: Add ICH_HCR_EL2 to the vcpu state KVM: arm64: Zero ID_AA64PFR0_EL1.GIC when no GICv3 is presented to the guest KVM: arm64: Add helper for last ditch idreg adjustments KVM: arm64: Force GICv3 trap activation when no irqchip is configured on VHE KVM: arm64: Force SRE traps when SRE access is not enabled KVM: arm64: Move GICv3 trap configuration to kvm_calculate_traps() Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-09-12	Merge branch kvm-arm64/fpmr into kvmarm-master/next	Marc Zyngier
	* kvm-arm64/fpmr: : . : Add FP8 support to the KVM/arm64 floating point handling. : : This includes new ID registers (ID_AA64PFR2_EL1 ID_AA64FPFR0_EL1) : being made visible to guests, as well as a new confrol register : (FPMR) which gets context-switched. : . KVM: arm64: Expose ID_AA64PFR2_EL1 to userspace and guests KVM: arm64: Enable FP8 support when available and configured KVM: arm64: Expose ID_AA64FPFR0_EL1 as a writable ID reg KVM: arm64: Honor trap routing for FPMR KVM: arm64: Add save/restore support for FPMR KVM: arm64: Move FPMR into the sysreg array KVM: arm64: Add predicate for FPMR support in a VM KVM: arm64: Move SVCR into the sysreg array Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-09-12	Merge branch kvm-arm64/mmu-misc-6.12 into kvmarm-master/next	Marc Zyngier
	* kvm-arm64/mmu-misc-6.12: : . : Various minor MMU improvements and bug-fixes: : : - Prevent MTE tags being restored by userspace if we are actively : logging writes, as that's a recipe for disaster : : - Correct the refcount on a page that is not considered for MTE : tag copying (such as a device) : : - When walking a page table to split blocks, keep the DSB at the end : the walk, as there is no need to perform it on every store. : : - Fix boundary check when transfering memory using FFA : . KVM: arm64: Add memory length checks and remove inline in do_ffa_mem_xfer KVM: arm64: Disallow copying MTE to guest memory while KVM is dirty logging KVM: arm64: Release pfn, i.e. put page, if copying MTE tags hits ZONE_DEVICE KVM: arm64: Move data barrier to end of split walk Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-09-12	iommu/amd: Test for PAGING domains before freeing a domain	Jason Gunthorpe
	This domain free function can be called for IDENTITY and SVA domains too, and they don't have page tables. For now protect against this by checking the type. Eventually the different types should have their own free functions. Fixes: 485534bfccb2 ("iommu/amd: Remove conditions from domain free paths") Reported-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/0-v1-ad9884ee5f5b+da-amd_iopgtbl_fix_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-09-12	iommu/amd: Fix argument order in amd_iommu_dev_flush_pasid_all()	Eliav Bar-ilan
	An incorrect argument order calling amd_iommu_dev_flush_pasid_pages() causes improper flushing of the IOMMU, leaving the old value of GCR3 from a previous process attached to the same PASID. The function has the signature: void amd_iommu_dev_flush_pasid_pages(struct iommu_dev_data *dev_data, ioasid_t pasid, u64 address, size_t size) Correct the argument order. Cc: stable@vger.kernel.org Fixes: 474bf01ed9f0 ("iommu/amd: Add support for device based TLB invalidation") Signed-off-by: Eliav Bar-ilan <eliavb@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/0-v1-fc6bc37d8208+250b-amd_pasid_flush_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-09-12	dma-mapping: reliably inform about DMA support for IOMMU	Leon Romanovsky
	If the DMA IOMMU path is going to be used, the appropriate check should return that DMA is supported. Fixes: b5c58b2fdc42 ("dma-mapping: direct calls for dma-iommu") Closes: https://lore.kernel.org/all/181e06ff-35a3-434f-b505-672f430bd1cb@notapiano Reported-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> #KernelCI Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-09-12	i2c: qcom-geni: Use IRQF_NO_AUTOEN flag in request_irq()	Jinjie Ruan
	disable_irq() after request_irq() still has a time gap in which interrupts can come. request_irq() with IRQF_NO_AUTOEN flag will disable IRQ auto-enable when request IRQ. Fixes: 37692de5d523 ("i2c: i2c-qcom-geni: Add bus driver for the Qualcomm GENI I2C controller") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Cc: <stable@vger.kernel.org> # v4.19+ Acked-by: Mukesh Kumar Savaliya <quic_msavaliy@quicinc.com> Reviewed-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-09-12	ALSA: ump: Use %*ph to print small buffer	Andy Shevchenko
	Use %*ph format to print small buffer as hex string. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20240911195039.2885979-1-andriy.shevchenko@linux.intel.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-09-12	xen/xenbus: Convert to use ERR_CAST()	Shen Lichuan
	Use ERR_CAST() as it is designed for casting an error pointer to another type. This macro utilizes the __force and __must_check modifiers, which instruct the compiler to verify for errors at the locations where it is employed. Signed-off-by: Shen Lichuan <shenlichuan@vivo.com> Reviewed-by: Juergen Gross <jgross@suse.com> Message-ID: <20240829084710.30312-1-shenlichuan@vivo.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-12	xen, pvh: fix unbootable VMs by inlining memset() in xen_prepare_pvh()	Alexey Dobriyan
	If this memset() is not inlined than PVH early boot code can call into KASAN-instrumented memset() which results in unbootable VMs. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Juergen Gross <jgross@suse.com> Message-ID: <20240802154253.482658-3-adobriyan@gmail.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-12	x86/cpu: fix unbootable VMs by inlining memcmp() in hypervisor_cpuid_base()	Alexey Dobriyan
	If this memcmp() is not inlined then PVH early boot code can call into KASAN-instrumented memcmp() which results in unbootable VMs: pvh_start_xen xen_prepare_pvh xen_cpuid_base hypervisor_cpuid_base memcmp Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Juergen Gross <jgross@suse.com> Message-ID: <20240802154253.482658-2-adobriyan@gmail.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-12	xen, pvh: fix unbootable VMs (PVH + KASAN - AMD_MEM_ENCRYPT)	Alexey Dobriyan
	Uninstrument arch/x86/platform/pvh/enlighten.c: KASAN has not been setup _this_ early in the boot process. Steps to reproduce: make allnoconfig make sure CONFIG_AMD_MEM_ENCRYPT is disabled AMD_MEM_ENCRYPT independently uninstruments lib/string.o so PVH boot code calls into uninstrumented memset() and memcmp() which can make the bug disappear depending on the compiler. enable CONFIG_PVH enable CONFIG_KASAN enable serial console this is fun exercise if you never done it from nothing :^) make qemu-system-x86_64 \ -enable-kvm \ -cpu host \ -smp cpus=1 \ -m 4096 \ -serial stdio \ -kernel vmlinux \ -append 'console=ttyS0 ignore_loglevel' Messages on serial console will easily tell OK kernel from unbootable kernel. In bad case qemu hangs in an infinite loop stroboscoping "SeaBIOS" message. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Juergen Gross <jgross@suse.com> Message-ID: <20240802154253.482658-1-adobriyan@gmail.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-09-12	xen: tolerate ACPI NVS memory overlapping with Xen allocated memory	Juergen Gross
	In order to minimize required special handling for running as Xen PV dom0, the memory layout is modified to match that of the host. This requires to have only RAM at the locations where Xen allocated memory is living. Unfortunately there seem to be some machines, where ACPI NVS is located at 64 MB, resulting in a conflict with the loaded kernel or the initial page tables built by Xen. Avoid this conflict by swapping the ACPI NVS area in the memory map with unused RAM. This is possible via modification of the dom0 P2M map. Accesses to the ACPI NVS area are done either for saving and restoring it across suspend operations (this will work the same way as before), or by ACPI code when NVS memory is referenced from other ACPI tables. The latter case is handled by a Xen specific indirection of acpi_os_ioremap(). While the E820 map can (and should) be modified right away, the P2M map can be updated only after memory allocation is working, as the P2M map might need to be extended. Fixes: 808fdb71936c ("xen: check for kernel memory conflicting with memory layout") Signed-off-by: Juergen Gross <jgross@suse.com> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>