summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2022-12-13Merge tag 'landlock-6.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux Pull landlock updates from Mickaël Salaün: "This adds file truncation support to Landlock, contributed by Günther Noack. As described by Günther [1], the goal of these patches is to work towards a more complete coverage of file system operations that are restrictable with Landlock. The known set of currently unsupported file system operations in Landlock is described at [2]. Out of the operations listed there, truncate is the only one that modifies file contents, so these patches should make it possible to prevent the direct modification of file contents with Landlock. The new LANDLOCK_ACCESS_FS_TRUNCATE access right covers both the truncate(2) and ftruncate(2) families of syscalls, as well as open(2) with the O_TRUNC flag. This includes usages of creat() in the case where existing regular files are overwritten. Additionally, this introduces a new Landlock security blob associated with opened files, to track the available Landlock access rights at the time of opening the file. This is in line with Unix's general approach of checking the read and write permissions during open(), and associating this previously checked authorization with the opened file. An ongoing patch documents this use case [3]. In order to treat truncate(2) and ftruncate(2) calls differently in an LSM hook, we split apart the existing security_path_truncate hook into security_path_truncate (for truncation by path) and security_file_truncate (for truncation of previously opened files)" Link: https://lore.kernel.org/r/20221018182216.301684-1-gnoack3000@gmail.com [1] Link: https://www.kernel.org/doc/html/v6.1/userspace-api/landlock.html#filesystem-flags [2] Link: https://lore.kernel.org/r/20221209193813.972012-1-mic@digikod.net [3] * tag 'landlock-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: samples/landlock: Document best-effort approach for LANDLOCK_ACCESS_FS_REFER landlock: Document Landlock's file truncation support samples/landlock: Extend sample tool to support LANDLOCK_ACCESS_FS_TRUNCATE selftests/landlock: Test ftruncate on FDs created by memfd_create(2) selftests/landlock: Test FD passing from restricted to unrestricted processes selftests/landlock: Locally define __maybe_unused selftests/landlock: Test open() and ftruncate() in multiple scenarios selftests/landlock: Test file truncation support landlock: Support file truncation landlock: Document init_layer_masks() helper landlock: Refactor check_access_path_dual() into is_access_to_paths_allowed() security: Create file_truncate hook from path_truncate hook
2022-12-13Merge tag 'configfs-6.2-2022-12-13' of ↵Linus Torvalds
git://git.infradead.org/users/hch/configfs Pull configfs updates from Christoph Hellwig: - fix a memory leak in configfs_create_dir (Chen Zhongjin) - remove mentions of committable items that were implemented (Bartosz Golaszewski) * tag 'configfs-6.2-2022-12-13' of git://git.infradead.org/users/hch/configfs: configfs: remove mentions of committable items configfs: fix possible memory leak in configfs_create_dir()
2022-12-13Merge tag 'nfs-for-6.2-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust "Bugfixes: - Fix NULL pointer dereference in the mount parser - Fix memory stomp in decode_attr_security_label - Fix credential leak in _nfs4_discover_trunking() - Fix buffer leak in rpcrdma_req_create() - Fix leaked socket in rpc_sockname() - Fix deadlock between nfs4_open_recover_helper() and delegreturn - Fix an Oops in nfs_d_automount() - Fix potential race in nfs_call_unlink() - Multiple fixes for the open context mode - NFSv4.2 READ_PLUS fixes - Fix a regression in which small rsize/wsize values are being forbidden - Fail client initialisation if the NFSv4.x state manager thread can't run - Avoid spurious warning of lost lock that is being unlocked. - Ensure the initialisation of struct nfs4_label Features and cleanups: - Trigger the "ls -l" readdir heuristic sooner - Clear the file access cache upon login to ensure supplementary group info is in sync between the client and server - pnfs: Fix up the logging of layout stateids - NFSv4.2: Change the default KConfig value for READ_PLUS - Use sysfs_emit() instead of scnprintf() where appropriate" * tag 'nfs-for-6.2-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (24 commits) NFSv4.2: Change the default KConfig value for READ_PLUS NFSv4.x: Fail client initialisation if state manager thread can't run fs: nfs: sysfs: use sysfs_emit() to instead of scnprintf() NFS: use sysfs_emit() to instead of scnprintf() NFS: Allow very small rsize & wsize again NFSv4.2: Fix up READ_PLUS alignment NFSv4.2: Set the correct size scratch buffer for decoding READ_PLUS SUNRPC: Fix missing release socket in rpc_sockname() xprtrdma: Fix regbuf data not freed in rpcrdma_req_create() NFS: avoid spurious warning of lost lock that is being unlocked. nfs: fix possible null-ptr-deref when parsing param NFSv4: check FMODE_EXEC from open context mode in nfs4_opendata_access() NFS: make sure open context mode have FMODE_EXEC when file open for exec NFS4.x/pnfs: Fix up logging of layout stateids NFS: Fix a race in nfs_call_unlink() NFS: Fix an Oops in nfs_d_automount() NFSv4: Fix a deadlock between nfs4_open_recover_helper() and delegreturn NFSv4: Fix a credential leak in _nfs4_discover_trunking() NFS: Trigger the "ls -l" readdir heuristic sooner NFSv4.2: Fix initialisation of struct nfs4_label ...
2022-12-12Merge tag 'nfsd-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds
Pull nfsd updates from Chuck Lever: "This release introduces support for the CB_RECALL_ANY operation. NFSD can send this operation to request that clients return any delegations they choose. The server uses this operation to handle low memory scenarios or indicate to a client when that client has reached the maximum number of delegations the server supports. The NFSv4.2 READ_PLUS operation has been simplified temporarily whilst support for sparse files in local filesystems and the VFS is improved. Two major data structure fixes appear in this release: - The nfs4_file hash table is replaced with a resizable hash table to reduce the latency of NFSv4 OPEN operations. - Reference counting in the NFSD filecache has been hardened against races. In furtherance of removing support for NFSv2 in a subsequent kernel release, a new Kconfig option enables server-side support for NFSv2 to be left out of a kernel build. MAINTAINERS has been updated to indicate that changes to fs/exportfs should go through the NFSD tree" * tag 'nfsd-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (49 commits) NFSD: Avoid clashing function prototypes SUNRPC: Fix crasher in unwrap_integ_data() SUNRPC: Make the svc_authenticate tracepoint conditional NFSD: Use only RQ_DROPME to signal the need to drop a reply SUNRPC: Clean up xdr_write_pages() SUNRPC: Don't leak netobj memory when gss_read_proxy_verf() fails NFSD: add CB_RECALL_ANY tracepoints NFSD: add delegation reaper to react to low memory condition NFSD: add support for sending CB_RECALL_ANY NFSD: refactoring courtesy_client_reaper to a generic low memory shrinker trace: Relocate event helper files NFSD: pass range end to vfs_fsync_range() instead of count lockd: fix file selection in nlmsvc_cancel_blocked lockd: ensure we use the correct file descriptor when unlocking lockd: set missing fl_flags field when retrieving args NFSD: Use struct_size() helper in alloc_session() nfsd: return error if nfs4_setacl fails lockd: set other missing fields when unlocking files NFSD: Add an nfsd_file_fsync tracepoint sunrpc: svc: Remove an unused static function svc_ungetu32() ...
2022-12-12Merge tag 'for-6.2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "This round there are a lot of cleanups and moved code so the diffstat looks huge, otherwise there are some nice performance improvements and an update to raid56 reliability. User visible features: - raid56 reliability vs performance trade off: - fix destructive RMW for raid5 data (raid6 still needs work): do full checksum verification for all data during RMW cycle, this should prevent rewriting potentially corrupted data without notice - stripes are cached in memory which should reduce the performance impact but still can hurt some workloads - checksums are verified after repair again - this is the last option without introducing additional features (write intent bitmap, journal, another tree), the extra checksum read/verification was supposed to be avoided by the original implementation exactly for performance reasons but that caused all the reliability problems - discard=async by default for devices that support it - implement emergency flush reserve to avoid almost all unnecessary transaction aborts due to ENOSPC in cases where there are too many delayed refs or delayed allocation - skip block group synchronization if there's no change in used bytes, can reduce transaction commit count for some workloads Performance improvements: - fiemap and lseek: - overall speedup due to skipping unnecessary or duplicate searches (-40% run time) - cache some data structures and sharedness of extents (-30% run time) - send: - faster backref resolution when finding clones - cached leaf to root mapping for faster backref walking - improved clone/sharing detection - overall run time improvements (-70%) Core: - module initialization converted to a table of function pointers run in a sequence - preparation for fscrypt, extend passing file names across calls, dir item can store encryption status - raid56 updates: - more accurate error tracking of sectors within stripe - simplify recovery path and remove dedicated endio worker kthread - simplify scrub call paths - refactoring to support the extra data checksum verification during RMW cycle - tree block parentness checks consolidated and done at metadata read time - improved error handling - cleanups: - move a lot of code for better synchronization between kernel and user space sources, split big files - enum cleanups - GFP flag cleanups - header file cleanups, prototypes, dependencies - redundant parameter cleanups - inline extent handling simplifications - inode parameter conversion - data structure cleanups, reductions, renames, merges" * tag 'for-6.2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (249 commits) btrfs: print transaction aborted messages with an error level btrfs: sync some cleanups from progs into uapi/btrfs.h btrfs: do not BUG_ON() on ENOMEM when dropping extent items for a range btrfs: fix extent map use-after-free when handling missing device in read_one_chunk btrfs: remove outdated logic from overwrite_item() and add assertion btrfs: unify overwrite_item() and do_overwrite_item() btrfs: replace strncpy() with strscpy() btrfs: fix uninitialized variable in find_first_clear_extent_bit btrfs: fix uninitialized parent in insert_state btrfs: add might_sleep() annotations btrfs: add stack helpers for a few btrfs items btrfs: add nr_global_roots to the super block definition btrfs: remove BTRFS_LEAF_DATA_OFFSET btrfs: add helpers for manipulating leaf items and data btrfs: add eb to btrfs_node_key_ptr_offset btrfs: pass the extent buffer for the btrfs_item_nr helpers btrfs: move the csum helpers into ctree.h btrfs: move eb offset helpers into extent_io.h btrfs: move file_extent_item helpers into file-item.h btrfs: move leaf_data_end into ctree.c ...
2022-12-12Merge tag 'dlm-6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: "These patches include the usual cleanups and minor fixes, the removal of code that is no longer needed due to recent improvements, and improvements to processing large volumes of messages during heavy locking activity. Summary: - Misc code cleanup - Fix a couple of socket handling bugs: a double release on an error path and a data-ready race in an accept loop - Remove code for resending dir-remove messages. This code is no longer needed since the midcomms layer now ensures the messages are resent if needed - Add tracepoints for dlm messages - Improve callback queueing by replacing the fixed array with a list - Simplify the handling of a remove message followed by a lookup message by sending both without releasing a spinlock in between - Improve the concurrency of sending and receiving messages by holding locks for a shorter time, and changing how workqueues are used - Remove old code for shutting down sockets, which is no longer needed with the reliable connection handling that was recently added" * tag 'dlm-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: (37 commits) fs: dlm: fix building without lockdep fs: dlm: parallelize lowcomms socket handling fs: dlm: don't init error value fs: dlm: use saved sk_error_report() fs: dlm: use sock2con without checking null fs: dlm: remove dlm_node_addrs lookup list fs: dlm: don't put dlm_local_addrs on heap fs: dlm: cleanup listen sock handling fs: dlm: remove socket shutdown handling fs: dlm: use listen sock as dlm running indicator fs: dlm: use list_first_entry_or_null fs: dlm: remove twice INIT_WORK fs: dlm: add midcomms init/start functions fs: dlm: add dst nodeid for msg tracing fs: dlm: rename seq to h_seq for msg tracing fs: dlm: rename DLM_IFL_NEED_SCHED to DLM_IFL_CB_PENDING fs: dlm: ast do WARN_ON_ONCE() on hotpath fs: dlm: drop lkb ref in bug case fs: dlm: avoid false-positive checker warning fs: dlm: use WARN_ON_ONCE() instead of WARN_ON() ...
2022-12-12Merge tag 'fuse-update-6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse update from Miklos Szeredi: - Allow some write requests to proceed in parallel - Fix a performance problem with allow_sys_admin_access - Add a special kind of invalidation that doesn't immediately purge submounts - On revalidation treat the target of rename(RENAME_NOREPLACE) the same as open(O_EXCL) - Use type safe helpers for some mnt_userns transformations * tag 'fuse-update-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: Rearrange fuse_allow_current_process checks fuse: allow non-extending parallel direct writes on the same file fuse: remove the unneeded result variable fuse: port to vfs{g,u}id_t and associated helpers fuse: Remove user_ns check for FUSE_DEV_IOC_CLONE fuse: always revalidate rename target dentry fuse: add "expire only" mode to FUSE_NOTIFY_INVAL_ENTRY fs/fuse: Replace kmap() with kmap_local_page()
2022-12-12Merge tag 'erofs-for-6.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "In this cycle, large folios are now enabled in the iomap/fscache mode for uncompressed files first. In order to do that, we've also cleaned up better interfaces between erofs and fscache, which are acked by fscache/netfs folks and included in this pull request. Other than that, there are random fixes around erofs over fscache and crafted images by syzbot, minor cleanups and documentation updates. Summary: - Enable large folios for iomap/fscache mode - Avoid sysfs warning due to mounting twice with the same fsid and domain_id in fscache mode - Refine fscache interface among erofs, fscache, and cachefiles - Use kmap_local_page() only for metabuf - Fixes around crafted images found by syzbot - Minor cleanups and documentation updates" * tag 'erofs-for-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: validate the extent length for uncompressed pclusters erofs: fix missing unmap if z_erofs_get_extent_compressedlen() fails erofs: Fix pcluster memleak when its block address is zero erofs: use kmap_local_page() only for erofs_bread() erofs: enable large folios for fscache mode erofs: support large folios for fscache mode erofs: switch to prepare_ondemand_read() in fscache mode fscache,cachefiles: add prepare_ondemand_read() callback erofs: clean up cached I/O strategies erofs: update documentation erofs: check the uniqueness of fsid in shared domain in advance erofs: enable large folios for iomap mode
2022-12-12Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscryptLinus Torvalds
Pull fscrypt updates from Eric Biggers: "This release adds SM4 encryption support, contributed by Tianjia Zhang. SM4 is a Chinese block cipher that is an alternative to AES. I recommend against using SM4, but (according to Tianjia) some people are being required to use it. Since SM4 has been turning up in many other places (crypto API, wireless, TLS, OpenSSL, ARMv8 CPUs, etc.), it hasn't been very controversial, and some people have to use it, I don't think it would be fair for me to reject this optional feature. Besides the above, there are a couple cleanups" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: fscrypt: add additional documentation for SM4 support fscrypt: remove unused Speck definitions fscrypt: Add SM4 XTS/CTS symmetric algorithm support blk-crypto: Add support for SM4-XTS blk crypto mode fscrypt: add comment for fscrypt_valid_enc_modes_v1() fscrypt: pass super_block to fscrypt_put_master_key_activeref()
2022-12-12Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "A large number of cleanups and bug fixes, with many of the bug fixes found by Syzbot and fuzzing. (Many of the bug fixes involve less-used ext4 features such as fast_commit, inline_data and bigalloc) In addition, remove the writepage function for ext4, since the medium-term plan is to remove ->writepage() entirely. (The VM doesn't need or want writepage() for writeback, since it is fine with ->writepages() so long as ->migrate_folio() is implemented)" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (58 commits) ext4: fix reserved cluster accounting in __es_remove_extent() ext4: fix inode leak in ext4_xattr_inode_create() on an error path ext4: allocate extended attribute value in vmalloc area ext4: avoid unaccounted block allocation when expanding inode ext4: initialize quota before expanding inode in setproject ioctl ext4: stop providing .writepage hook mm: export buffer_migrate_folio_norefs() ext4: switch to using write_cache_pages() for data=journal writeout jbd2: switch jbd2_submit_inode_data() to use fs-provided hook for data writeout ext4: switch to using ext4_do_writepages() for ordered data writeout ext4: move percpu_rwsem protection into ext4_writepages() ext4: provide ext4_do_writepages() ext4: add support for writepages calls that cannot map blocks ext4: drop pointless IO submission from ext4_bio_write_page() ext4: remove nr_submitted from ext4_bio_write_page() ext4: move keep_towrite handling to ext4_bio_write_page() ext4: handle redirtying in ext4_bio_write_page() ext4: fix kernel BUG in 'ext4_write_inline_data_end()' ext4: make ext4_mb_initialize_context return void ext4: fix deadlock due to mbcache entry corruption ...
2022-12-12Merge tag 'fs.idmapped.mnt_idmap.v6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping Pull idmapping updates from Christian Brauner: "Last cycle we've already made the interaction with idmapped mounts more robust and type safe by introducing the vfs{g,u}id_t type. This cycle we concluded the conversion and removed the legacy helpers. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem - with namespaces that are relevent on the mount level. Especially for filesystem developers without detailed knowledge in this area this can be a potential source for bugs. Instead of passing the plain namespace we introduce a dedicated type struct mnt_idmap and replace the pointer with a pointer to a struct mnt_idmap. There are no semantic or size changes for the mount struct caused by this. We then start converting all places aware of idmapped mounts to rely on struct mnt_idmap. Once the conversion is done all helpers down to the really low-level make_vfs{g,u}id() and from_vfs{g,u}id() will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two removing and thus eliminating the possibility of any bugs. Fwiw, I fixed some issues in that area a while ago in ntfs3 and ksmbd in the past. Afterwards only low-level code can ultimately use the associated namespace for any permission checks. Even most of the vfs can be completely obivious about this ultimately and filesystems will never interact with it in any form in the future. A struct mnt_idmap currently encompasses a simple refcount and pointer to the relevant namespace the mount is idmapped to. If a mount isn't idmapped then it will point to a static nop_mnt_idmap and if it doesn't that it is idmapped. As usual there are no allocations or anything happening for non-idmapped mounts. Everthing is carefully written to be a nop for non-idmapped mounts as has always been the case. If an idmapped mount is created a struct mnt_idmap is allocated and a reference taken on the relevant namespace. Each mount that gets idmapped or inherits the idmap simply bumps the reference count on struct mnt_idmap. Just a reminder that we only allow a mount to change it's idmapping a single time and only if it hasn't already been attached to the filesystems and has no active writers. The actual changes are fairly straightforward but this will have huge benefits for maintenance and security in the long run even if it causes some churn. Note that this also makes it possible to extend struct mount_idmap in the future. For example, it would be possible to place the namespace pointer in an anonymous union together with an idmapping struct. This would allow us to expose an api to userspace that would let it specify idmappings directly instead of having to go through the detour of setting up namespaces at all" * tag 'fs.idmapped.mnt_idmap.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: acl: conver higher-level helpers to rely on mnt_idmap fs: introduce dedicated idmap type for mounts
2022-12-12Merge tag 'fs.vfsuid.conversion.v6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping Pull vfsuid updates from Christian Brauner: "Last cycle we introduced the vfs{g,u}id_t types and associated helpers to gain type safety when dealing with idmapped mounts. That initial work already converted a lot of places over but there were still some left, This converts all remaining places that still make use of non-type safe idmapping helpers to rely on the new type safe vfs{g,u}id based helpers. Afterwards it removes all the old non-type safe helpers" * tag 'fs.vfsuid.conversion.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: fs: remove unused idmapping helpers ovl: port to vfs{g,u}id_t and associated helpers fuse: port to vfs{g,u}id_t and associated helpers ima: use type safe idmapping helpers apparmor: use type safe idmapping helpers caps: use type safe idmapping helpers fs: use type safe idmapping helpers mnt_idmapping: add missing helpers
2022-12-12Merge tag 'fs.ovl.setgid.v6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping Pull setgid inheritance updates from Christian Brauner: "This contains the work to make setgid inheritance consistent between modifying a file and when changing ownership or mode as this has been a repeated source of very subtle bugs. The gist is that we perform the same permission checks in the write path as we do in the ownership and mode changing paths after this series where we're currently doing different things. We've already made setgid inheritance a lot more consistent and reliable in the last releases by moving setgid stripping from the individual filesystems up into the vfs. This aims to make the logic even more consistent and easier to understand and also to fix long-standing overlayfs setgid inheritance bugs. Miklos was nice enough to just let me carry the trivial overlayfs patches from Amir too. Below is a more detailed explanation how the current difference in setgid handling lead to very subtle bugs exemplified via overlayfs which is a victim of the current rules. I hope this explains why I think taking the regression risk here is worth it. A long while ago I found a few setgid inheritance bugs in overlayfs in the write path in certain conditions. Amir recently picked this back up in [1] and I jumped on board to fix this more generally. On the surface all that overlayfs would need to fix setgid inheritance would be to call file_remove_privs() or file_modified() but actually that isn't enough because the setgid inheritance api is wildly inconsistent in that area. Before this pr setgid stripping in file_remove_privs()'s old should_remove_suid() helper was inconsistent with other parts of the vfs. Specifically, it only raises ATTR_KILL_SGID if the inode is S_ISGID and S_IXGRP but not if the inode isn't in the caller's groups and the caller isn't privileged over the inode although we require this already in setattr_prepare() and setattr_copy() and so all filesystem implement this requirement implicitly because they have to use setattr_{prepare,copy}() anyway. But the inconsistency shows up in setgid stripping bugs for overlayfs in xfstests (e.g., generic/673, generic/683, generic/685, generic/686, generic/687). For example, we test whether suid and setgid stripping works correctly when performing various write-like operations as an unprivileged user (fallocate, reflink, write, etc.): echo "Test 1 - qa_user, non-exec file $verb" setup_testfile chmod a+rws $junk_file commit_and_check "$qa_user" "$verb" 64k 64k The test basically creates a file with 6666 permissions. While the file has the S_ISUID and S_ISGID bits set it does not have the S_IXGRP set. On a regular filesystem like xfs what will happen is: sys_fallocate() -> vfs_fallocate() -> xfs_file_fallocate() -> file_modified() -> __file_remove_privs() -> dentry_needs_remove_privs() -> should_remove_suid() -> __remove_privs() newattrs.ia_valid = ATTR_FORCE | kill; -> notify_change() -> setattr_copy() In should_remove_suid() we can see that ATTR_KILL_SUID is raised unconditionally because the file in the test has S_ISUID set. But we also see that ATTR_KILL_SGID won't be set because while the file is S_ISGID it is not S_IXGRP (see above) which is a condition for ATTR_KILL_SGID being raised. So by the time we call notify_change() we have attr->ia_valid set to ATTR_KILL_SUID | ATTR_FORCE. Now notify_change() sees that ATTR_KILL_SUID is set and does: ia_valid = attr->ia_valid |= ATTR_MODE attr->ia_mode = (inode->i_mode & ~S_ISUID); which means that when we call setattr_copy() later we will definitely update inode->i_mode. Note that attr->ia_mode still contains S_ISGID. Now we call into the filesystem's ->setattr() inode operation which will end up calling setattr_copy(). Since ATTR_MODE is set we will hit: if (ia_valid & ATTR_MODE) { umode_t mode = attr->ia_mode; vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode); if (!vfsgid_in_group_p(vfsgid) && !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID)) mode &= ~S_ISGID; inode->i_mode = mode; } and since the caller in the test is neither capable nor in the group of the inode the S_ISGID bit is stripped. But assume the file isn't suid then ATTR_KILL_SUID won't be raised which has the consequence that neither the setgid nor the suid bits are stripped even though it should be stripped because the inode isn't in the caller's groups and the caller isn't privileged over the inode. If overlayfs is in the mix things become a bit more complicated and the bug shows up more clearly. When e.g., ovl_setattr() is hit from ovl_fallocate()'s call to file_remove_privs() then ATTR_KILL_SUID and ATTR_KILL_SGID might be raised but because the check in notify_change() is questioning the ATTR_KILL_SGID flag again by requiring S_IXGRP for it to be stripped the S_ISGID bit isn't removed even though it should be stripped: sys_fallocate() -> vfs_fallocate() -> ovl_fallocate() -> file_remove_privs() -> dentry_needs_remove_privs() -> should_remove_suid() -> __remove_privs() newattrs.ia_valid = ATTR_FORCE | kill; -> notify_change() -> ovl_setattr() /* TAKE ON MOUNTER'S CREDS */ -> ovl_do_notify_change() -> notify_change() /* GIVE UP MOUNTER'S CREDS */ /* TAKE ON MOUNTER'S CREDS */ -> vfs_fallocate() -> xfs_file_fallocate() -> file_modified() -> __file_remove_privs() -> dentry_needs_remove_privs() -> should_remove_suid() -> __remove_privs() newattrs.ia_valid = attr_force | kill; -> notify_change() The fix for all of this is to make file_remove_privs()'s should_remove_suid() helper perform the same checks as we already require in setattr_prepare() and setattr_copy() and have notify_change() not pointlessly requiring S_IXGRP again. It doesn't make any sense in the first place because the caller must calculate the flags via should_remove_suid() anyway which would raise ATTR_KILL_SGID Note that some xfstests will now fail as these patches will cause the setgid bit to be lost in certain conditions for unprivileged users modifying a setgid file when they would've been kept otherwise. I think this risk is worth taking and I explained and mentioned this multiple times on the list [2]. Enforcing the rules consistently across write operations and chmod/chown will lead to losing the setgid bit in cases were it might've been retained before. While I've mentioned this a few times but it's worth repeating just to make sure that this is understood. For the sake of maintainability, consistency, and security this is a risk worth taking. If we really see regressions for workloads the fix is to have special setgid handling in the write path again with different semantics from chmod/chown and possibly additional duct tape for overlayfs. I'll update the relevant xfstests with if you should decide to merge this second setgid cleanup. Before that people should be aware that there might be failures for fstests where unprivileged users modify a setgid file" Link: https://lore.kernel.org/linux-fsdevel/20221003123040.900827-1-amir73il@gmail.com [1] Link: https://lore.kernel.org/linux-fsdevel/20221122142010.zchf2jz2oymx55qi@wittgenstein [2] * tag 'fs.ovl.setgid.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: fs: use consistent setgid checks in is_sxid() ovl: remove privs in ovl_fallocate() ovl: remove privs in ovl_copyfile() attr: use consistent sgid stripping checks attr: add setattr_should_drop_sgid() fs: move should_remove_suid() attr: add in_group_or_capable()
2022-12-12Merge tag 'fs.acl.rework.v6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping Pull VFS acl updates from Christian Brauner: "This contains the work that builds a dedicated vfs posix acl api. The origins of this work trace back to v5.19 but it took quite a while to understand the various filesystem specific implementations in sufficient detail and also come up with an acceptable solution. As we discussed and seen multiple times the current state of how posix acls are handled isn't nice and comes with a lot of problems: The current way of handling posix acls via the generic xattr api is error prone, hard to maintain, and type unsafe for the vfs until we call into the filesystem's dedicated get and set inode operations. It is already the case that posix acls are special-cased to death all the way through the vfs. There are an uncounted number of hacks that operate on the uapi posix acl struct instead of the dedicated vfs struct posix_acl. And the vfs must be involved in order to interpret and fixup posix acls before storing them to the backing store, caching them, reporting them to userspace, or for permission checking. Currently a range of hacks and duct tape exist to make this work. As with most things this is really no ones fault it's just something that happened over time. But the code is hard to understand and difficult to maintain and one is constantly at risk of introducing bugs and regressions when having to touch it. Instead of continuing to hack posix acls through the xattr handlers this series builds a dedicated posix acl api solely around the get and set inode operations. Going forward, the vfs_get_acl(), vfs_remove_acl(), and vfs_set_acl() helpers must be used in order to interact with posix acls. They operate directly on the vfs internal struct posix_acl instead of abusing the uapi posix acl struct as we currently do. In the end this removes all of the hackiness, makes the codepaths easier to maintain, and gets us type safety. This series passes the LTP and xfstests suites without any regressions. For xfstests the following combinations were tested: - xfs - ext4 - btrfs - overlayfs - overlayfs on top of idmapped mounts - orangefs - (limited) cifs There's more simplifications for posix acls that we can make in the future if the basic api has made it. A few implementation details: - The series makes sure to retain exactly the same security and integrity module permission checks. Especially for the integrity modules this api is a win because right now they convert the uapi posix acl struct passed to them via a void pointer into the vfs struct posix_acl format to perform permission checking on the mode. There's a new dedicated security hook for setting posix acls which passes the vfs struct posix_acl not a void pointer. Basing checking on the posix acl stored in the uapi format is really unreliable. The vfs currently hacks around directly in the uapi struct storing values that frankly the security and integrity modules can't correctly interpret as evidenced by bugs we reported and fixed in this area. It's not necessarily even their fault it's just that the format we provide to them is sub optimal. - Some filesystems like 9p and cifs need access to the dentry in order to get and set posix acls which is why they either only partially or not even at all implement get and set inode operations. For example, cifs allows setxattr() and getxattr() operations but doesn't allow permission checking based on posix acls because it can't implement a get acl inode operation. Thus, this patch series updates the set acl inode operation to take a dentry instead of an inode argument. However, for the get acl inode operation we can't do this as the old get acl method is called in e.g., generic_permission() and inode_permission(). These helpers in turn are called in various filesystem's permission inode operation. So passing a dentry argument to the old get acl inode operation would amount to passing a dentry to the permission inode operation which we shouldn't and probably can't do. So instead of extending the existing inode operation Christoph suggested to add a new one. He also requested to ensure that the get and set acl inode operation taking a dentry are consistently named. So for this version the old get acl operation is renamed to ->get_inode_acl() and a new ->get_acl() inode operation taking a dentry is added. With this we can give both 9p and cifs get and set acl inode operations and in turn remove their complex custom posix xattr handlers. In the future I hope to get rid of the inode method duplication but it isn't like we have never had this situation. Readdir is just one example. And frankly, the overall gain in type safety and the more pleasant api wise are simply too big of a benefit to not accept this duplication for a while. - We've done a full audit of every codepaths using variant of the current generic xattr api to get and set posix acls and surprisingly it isn't that many places. There's of course always a chance that we might have missed some and if so I'm sure we'll find them soon enough. The crucial codepaths to be converted are obviously stacking filesystems such as ecryptfs and overlayfs. For a list of all callers currently using generic xattr api helpers see [2] including comments whether they support posix acls or not. - The old vfs generic posix acl infrastructure doesn't obey the create and replace semantics promised on the setxattr(2) manpage. This patch series doesn't address this. It really is something we should revisit later though. The patches are roughly organized as follows: (1) Change existing set acl inode operation to take a dentry argument (Intended to be a non-functional change) (2) Rename existing get acl method (Intended to be a non-functional change) (3) Implement get and set acl inode operations for filesystems that couldn't implement one before because of the missing dentry. That's mostly 9p and cifs (Intended to be a non-functional change) (4) Build posix acl api, i.e., add vfs_get_acl(), vfs_remove_acl(), and vfs_set_acl() including security and integrity hooks (Intended to be a non-functional change) (5) Implement get and set acl inode operations for stacking filesystems (Intended to be a non-functional change) (6) Switch posix acl handling in stacking filesystems to new posix acl api now that all filesystems it can stack upon support it. (7) Switch vfs to new posix acl api (semantical change) (8) Remove all now unused helpers (9) Additional regression fixes reported after we merged this into linux-next Thanks to Seth for a lot of good discussion around this and encouragement and input from Christoph" * tag 'fs.acl.rework.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (36 commits) posix_acl: Fix the type of sentinel in get_acl orangefs: fix mode handling ovl: call posix_acl_release() after error checking evm: remove dead code in evm_inode_set_acl() cifs: check whether acl is valid early acl: make vfs_posix_acl_to_xattr() static acl: remove a slew of now unused helpers 9p: use stub posix acl handlers cifs: use stub posix acl handlers ovl: use stub posix acl handlers ecryptfs: use stub posix acl handlers evm: remove evm_xattr_acl_change() xattr: use posix acl api ovl: use posix acl api ovl: implement set acl method ovl: implement get acl method ecryptfs: implement set acl method ecryptfs: implement get acl method ksmbd: use vfs_remove_acl() acl: add vfs_remove_acl() ...
2022-12-12Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull misc vfs updates from Al Viro: "misc pile" * tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: sysv: Fix sysv_nblocks() returns wrong value get rid of INT_LIMIT, use type_max() instead btrfs: replace INT_LIMIT(loff_t) with OFFSET_MAX fs: simplify vfs_get_super fs: drop useless condition from inode_needs_update_time
2022-12-12Merge tag 'pull-iov_iter' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull iov_iter updates from Al Viro: "iov_iter work; most of that is about getting rid of direction misannotations and (hopefully) preventing more of the same for the future" * tag 'pull-iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: use less confusing names for iov_iter direction initializers iov_iter: saner checks for attempt to copy to/from iterator [xen] fix "direction" argument of iov_iter_kvec() [vhost] fix 'direction' argument of iov_iter_{init,bvec}() [target] fix iov_iter_bvec() "direction" argument [s390] memcpy_real(): WRITE is "data source", not destination... [s390] zcore: WRITE is "data source", not destination... [infiniband] READ is "data destination", not source... [fsi] WRITE is "data source", not destination... [s390] copy_oldmem_kernel() - WRITE is "data source", not destination csum_and_copy_to_iter(): handle ITER_DISCARD get rid of unlikely() on page_copy_sane() calls
2022-12-12Merge tag 'pull-elfcore' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull elf coredumping updates from Al Viro: "Unification of regset and non-regset sides of ELF coredump handling. Collecting per-thread register values is the only thing that needs to be ifdefed there..." * tag 'pull-elfcore' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: [elf] get rid of get_note_info_size() [elf] unify regset and non-regset cases [elf][non-regset] use elf_core_copy_task_regs() for dumper as well [elf][non-regset] uninline elf_core_copy_task_fpregs() (and lose pt_regs argument) elf_core_copy_task_regs(): task_pt_regs is defined everywhere [elf][regset] simplify thread list handling in fill_note_info() [elf][regset] clean fill_note_info() a bit kill extern of vsyscall32_sysctl kill coredump_params->regs kill signal_pt_regs()
2022-12-12Merge tag 'mm-nonmm-stable-2022-12-12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - A ptrace API cleanup series from Sergey Shtylyov - Fixes and cleanups for kexec from ye xingchen - nilfs2 updates from Ryusuke Konishi - squashfs feature work from Xiaoming Ni: permit configuration of the filesystem's compression concurrency from the mount command line - A series from Akinobu Mita which addresses bound checking errors when writing to debugfs files - A series from Yang Yingliang to address rapidio memory leaks - A series from Zheng Yejian to address possible overflow errors in encode_comp_t() - And a whole shower of singleton patches all over the place * tag 'mm-nonmm-stable-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (79 commits) ipc: fix memory leak in init_mqueue_fs() hfsplus: fix bug causing custom uid and gid being unable to be assigned with mount rapidio: devices: fix missing put_device in mport_cdev_open kcov: fix spelling typos in comments hfs: Fix OOB Write in hfs_asc2mac hfs: fix OOB Read in __hfs_brec_find relay: fix type mismatch when allocating memory in relay_create_buf() ocfs2: always read both high and low parts of dinode link count io-mapping: move some code within the include guarded section kernel: kcsan: kcsan_test: build without structleak plugin mailmap: update email for Iskren Chernev eventfd: change int to __u64 in eventfd_signal() ifndef CONFIG_EVENTFD rapidio: fix possible UAF when kfifo_alloc() fails relay: use strscpy() is more robust and safer cpumask: limit visibility of FORCE_NR_CPUS acct: fix potential integer overflow in encode_comp_t() acct: fix accuracy loss for input value of encode_comp_t() linux/init.h: include <linux/build_bug.h> and <linux/stringify.h> rapidio: rio: fix possible name leak in rio_register_mport() rapidio: fix possible name leaks when rio_add_device() fails ...
2022-12-12Merge tag 'docs-6.2' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "This was a not-too-busy cycle for documentation; highlights include: - The beginnings of a set of translations into Spanish, headed up by Carlos Bilbao - More Chinese translations - A change to the Sphinx "alabaster" theme by default for HTML generation. Unlike the previous default (Read the Docs), alabaster is shipped with Sphinx by default, reducing the number of other dependencies that need to be installed. It also (IMO) produces a cleaner and more readable result. - The ability to render the documentation into the texinfo format (something Sphinx could always do, we just never wired it up until now) Plus the usual collection of typo fixes, build-warning fixes, and minor updates" * tag 'docs-6.2' of git://git.lwn.net/linux: (67 commits) Documentation/features: Use loongarch instead of loong Documentation/features-refresh.sh: Only sed the beginning "arch" of ARCH_DIR docs/zh_CN: Fix '.. only::' directive's expression docs/sp_SP: Add memory-barriers.txt Spanish translation docs/zh_CN/LoongArch: Update links of LoongArch ISA Vol1 and ELF psABI docs/LoongArch: Update links of LoongArch ISA Vol1 and ELF psABI Documentation/features: Update feature lists for 6.1 Documentation: Fixed a typo in bootconfig.rst docs/sp_SP: Add process coding-style translation docs/sp_SP: Add kernel-docs.rst Spanish translation docs: Create translations/sp_SP/process/, move submitting-patches.rst docs: Add book to process/kernel-docs.rst docs: Retire old resources from kernel-docs.rst docs: Update maintainer of kernel-docs.rst Documentation: riscv: Document the sv57 VM layout Documentation: USB: correct possessive "its" usage math64: fix kernel-doc return value warnings math64: add kernel-doc for DIV64_U64_ROUND_UP math64: favor kernel-doc from header files doc: add texinfodocs and infodocs targets ...
2022-12-12Merge tag 'linux-kselftest-kunit-next-6.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull KUnit updates from Shuah Khan: "Several enhancements, fixes, clean-ups, documentation updates, improvements to logging and KTAP compliance of KUnit test output: - log numbers in decimal and hex - parse KTAP compliant test output - allow conditionally exposing static symbols to tests when KUNIT is enabled - make static symbols visible during kunit testing - clean-ups to remove unused structure definition" * tag 'linux-kselftest-kunit-next-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (29 commits) Documentation: dev-tools: Clarify requirements for result description apparmor: test: make static symbols visible during kunit testing kunit: add macro to allow conditionally exposing static symbols to tests kunit: tool: make parser preserve whitespace when printing test log Documentation: kunit: Fix "How Do I Use This" / "Next Steps" sections kunit: tool: don't include KTAP headers and the like in the test log kunit: improve KTAP compliance of KUnit test output kunit: tool: parse KTAP compliant test output mm: slub: test: Use the kunit_get_current_test() function kunit: Use the static key when retrieving the current test kunit: Provide a static key to check if KUnit is actively running tests kunit: tool: make --json do nothing if --raw_ouput is set kunit: tool: tweak error message when no KTAP found kunit: remove KUNIT_INIT_MEM_ASSERTION macro Documentation: kunit: Remove redundant 'tips.rst' page Documentation: KUnit: reword description of assertions Documentation: KUnit: make usage.rst a superset of tips.rst, remove duplication kunit: eliminate KUNIT_INIT_*_ASSERT_STRUCT macros kunit: tool: remove redundant file.close() call in unit test kunit: tool: unit tests all check parser errors, standardize formatting a bit ...
2022-12-12Merge tag 'random-6.2-rc1-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull random number generator updates from Jason Donenfeld: - Replace prandom_u32_max() and various open-coded variants of it, there is now a new family of functions that uses fast rejection sampling to choose properly uniformly random numbers within an interval: get_random_u32_below(ceil) - [0, ceil) get_random_u32_above(floor) - (floor, U32_MAX] get_random_u32_inclusive(floor, ceil) - [floor, ceil] Coccinelle was used to convert all current users of prandom_u32_max(), as well as many open-coded patterns, resulting in improvements throughout the tree. I'll have a "late" 6.1-rc1 pull for you that removes the now unused prandom_u32_max() function, just in case any other trees add a new use case of it that needs to converted. According to linux-next, there may be two trivial cases of prandom_u32_max() reintroductions that are fixable with a 's/.../.../'. So I'll have for you a final conversion patch doing that alongside the removal patch during the second week. This is a treewide change that touches many files throughout. - More consistent use of get_random_canary(). - Updates to comments, documentation, tests, headers, and simplification in configuration. - The arch_get_random*_early() abstraction was only used by arm64 and wasn't entirely useful, so this has been replaced by code that works in all relevant contexts. - The kernel will use and manage random seeds in non-volatile EFI variables, refreshing a variable with a fresh seed when the RNG is initialized. The RNG GUID namespace is then hidden from efivarfs to prevent accidental leakage. These changes are split into random.c infrastructure code used in the EFI subsystem, in this pull request, and related support inside of EFISTUB, in Ard's EFI tree. These are co-dependent for full functionality, but the order of merging doesn't matter. - Part of the infrastructure added for the EFI support is also used for an improvement to the way vsprintf initializes its siphash key, replacing an sleep loop wart. - The hardware RNG framework now always calls its correct random.c input function, add_hwgenerator_randomness(), rather than sometimes going through helpers better suited for other cases. - The add_latent_entropy() function has long been called from the fork handler, but is a no-op when the latent entropy gcc plugin isn't used, which is fine for the purposes of latent entropy. But it was missing out on the cycle counter that was also being mixed in beside the latent entropy variable. So now, if the latent entropy gcc plugin isn't enabled, add_latent_entropy() will expand to a call to add_device_randomness(NULL, 0), which adds a cycle counter, without the absent latent entropy variable. - The RNG is now reseeded from a delayed worker, rather than on demand when used. Always running from a worker allows it to make use of the CPU RNG on platforms like S390x, whose instructions are too slow to do so from interrupts. It also has the effect of adding in new inputs more frequently with more regularity, amounting to a long term transcript of random values. Plus, it helps a bit with the upcoming vDSO implementation (which isn't yet ready for 6.2). - The jitter entropy algorithm now tries to execute on many different CPUs, round-robining, in hopes of hitting even more memory latencies and other unpredictable effects. It also will mix in a cycle counter when the entropy timer fires, in addition to being mixed in from the main loop, to account more explicitly for fluctuations in that timer firing. And the state it touches is now kept within the same cache line, so that it's assured that the different execution contexts will cause latencies. * tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (23 commits) random: include <linux/once.h> in the right header random: align entropy_timer_state to cache line random: mix in cycle counter when jitter timer fires random: spread out jitter callback to different CPUs random: remove extraneous period and add a missing one in comments efi: random: refresh non-volatile random seed when RNG is initialized vsprintf: initialize siphash key using notifier random: add back async readiness notifier random: reseed in delayed work rather than on-demand random: always mix cycle counter in add_latent_entropy() hw_random: use add_hwgenerator_randomness() for early entropy random: modernize documentation comment on get_random_bytes() random: adjust comment to account for removed function random: remove early archrandom abstraction random: use random.trust_{bootloader,cpu} command line option only stackprotector: actually use get_random_canary() stackprotector: move get_random_canary() into stackprotector.h treewide: use get_random_u32_inclusive() when possible treewide: use get_random_u32_{above,below}() instead of manual loop treewide: use get_random_u32_below() instead of deprecated function ...
2022-12-12Merge branch 'for-6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu Pull percpu updates from Dennis Zhou: "Baoquan was nice enough to run some clean ups for percpu" * 'for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu: mm/percpu: remove unused PERCPU_DYNAMIC_EARLY_SLOTS mm/percpu.c: remove the lcm code since block size is fixed at page size mm/percpu: replace the goto with break mm/percpu: add comment to state the empty populated pages accounting mm/percpu: Update the code comment when creating new chunk mm/percpu: use list_first_entry_or_null in pcpu_reclaim_populated() mm/percpu: remove unused pcpu_map_extend_chunks
2022-12-12Merge tag 'cgroup-for-6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "Nothing too interesting: - Add CONFIG_DEBUG_GROUP_REF which makes cgroup refcnt operations kprobable - A couple cpuset optimizations - Other misc changes including doc and test updates" * tag 'cgroup-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: remove rcu_read_lock()/rcu_read_unlock() in critical section of spin_lock_irq() cgroup/cpuset: Improve cpuset_css_alloc() description kselftest/cgroup: Add cleanup() to test_cpuset_prs.sh cgroup/cpuset: Optimize cpuset_attach() on v2 cgroup/cpuset: Skip spread flags update on v2 kselftest/cgroup: Fix gathering number of CPUs cgroup: cgroup refcnt functions should be exported when CONFIG_DEBUG_CGROUP_REF cgroup: Implement DEBUG_CGROUP_REF
2022-12-12Merge tag 'sched-core-2022-12-12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Implement persistent user-requested affinity: introduce affinity_context::user_mask and unconditionally preserve the user-requested CPU affinity masks, for long-lived tasks to better interact with cpusets & CPU hotplug events over longer timespans, without destroying the original affinity intent if the underlying topology changes. - Uclamp updates: fix relationship between uclamp and fits_capacity() - PSI fixes - Misc fixes & updates * tag 'sched-core-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Clear ttwu_pending after enqueue_task() sched/psi: Use task->psi_flags to clear in CPU migration sched/psi: Stop relying on timer_pending() for poll_work rescheduling sched/psi: Fix avgs_work re-arm in psi_avgs_work() sched/psi: Fix possible missing or delayed pending event sched: Always clear user_cpus_ptr in do_set_cpus_allowed() sched: Enforce user requested affinity sched: Always preserve the user requested cpumask sched: Introduce affinity_context sched: Add __releases annotations to affine_move_task() sched/fair: Check if prev_cpu has highest spare cap in feec() sched/fair: Consider capacity inversion in util_fits_cpu() sched/fair: Detect capacity inversion sched/uclamp: Cater for uclamp in find_energy_efficient_cpu()'s early exit condition sched/uclamp: Make cpu_overutilized() use util_fits_cpu() sched/uclamp: Make asym_fits_capacity() use util_fits_cpu() sched/uclamp: Make select_idle_capacity() use util_fits_cpu() sched/uclamp: Fix fits_capacity() check in feec() sched/uclamp: Make task_fits_capacity() use util_fits_cpu() sched/uclamp: Fix relationship between uclamp and migration margin
2022-12-12Merge tag 'perf-core-2022-12-12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events updates from Ingo Molnar: - Thoroughly rewrite the data structures that implement perf task context handling, with the goal of fixing various quirks and unfeatures both in already merged, and in upcoming proposed code. The old data structure is the per task and per cpu perf_event_contexts: task_struct::perf_events_ctxp[] <-> perf_event_context <-> perf_cpu_context ^ | ^ | ^ `---------------------------------' | `--> pmu ---' v ^ perf_event ------' In this new design this is replaced with a single task context and a single CPU context, plus intermediate data-structures: task_struct::perf_event_ctxp -> perf_event_context <- perf_cpu_context ^ | ^ ^ `---------------------------' | | | | perf_cpu_pmu_context <--. | `----. ^ | | | | | | v v | | ,--> perf_event_pmu_context | | | | | | | v v | perf_event ---> pmu ----------------' [ See commit bd2756811766 for more details. ] This rewrite was developed by Peter Zijlstra and Ravi Bangoria. - Optimize perf_tp_event() - Update the Intel uncore PMU driver, extending it with UPI topology discovery on various hardware models. - Misc fixes & cleanups * tag 'perf-core-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) perf/x86/intel/uncore: Fix reference count leak in __uncore_imc_init_box() perf/x86/intel/uncore: Fix reference count leak in snr_uncore_mmio_map() perf/x86/intel/uncore: Fix reference count leak in hswep_has_limit_sbox() perf/x86/intel/uncore: Fix reference count leak in sad_cfg_iio_topology() perf/x86/intel/uncore: Make set_mapping() procedure void perf/x86/intel/uncore: Update sysfs-devices-mapping file perf/x86/intel/uncore: Enable UPI topology discovery for Sapphire Rapids perf/x86/intel/uncore: Enable UPI topology discovery for Icelake Server perf/x86/intel/uncore: Get UPI NodeID and GroupID perf/x86/intel/uncore: Enable UPI topology discovery for Skylake Server perf/x86/intel/uncore: Generalize get_topology() for SKX PMUs perf/x86/intel/uncore: Disable I/O stacks to PMU mapping on ICX-D perf/x86/intel/uncore: Clear attr_update properly perf/x86/intel/uncore: Introduce UPI topology type perf/x86/intel/uncore: Generalize IIO topology support perf/core: Don't allow grouping events from different hw pmus perf/amd/ibs: Make IBS a core pmu perf: Fix function pointer case perf/x86/amd: Remove the repeated declaration perf: Fix possible memleak in pmu_dev_alloc() ...
2022-12-12Merge tag 'edac_updates_for_6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: - Make ghes_edac a simple module like the rest of the EDAC drivers and drop the forced built-in only configuration by disentangling it from GHES (Jia He) - The usual small cleanups and improvements all over EDAC land * tag 'edac_updates_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/i10nm: fix refcount leak in pci_get_dev_wrapper() EDAC/i5400: Fix typo in comment: vaious -> various EDAC/mc_sysfs: Increase legacy channel support to 12 MAINTAINERS: Make Mauro EDAC reviewer MAINTAINERS: Make Manivannan Sadhasivam the maintainer of qcom_edac EDAC/igen6: Return the correct error type when not the MC owner apei/ghes: Use xchg_release() for updating new cache slot instead of cmpxchg() EDAC: Check for GHES preference in the chipset-specific EDAC drivers EDAC/ghes: Make ghes_edac a proper module EDAC/ghes: Prepare to make ghes_edac a proper module EDAC/ghes: Add a notifier for reporting memory errors efi/cper: Export several helpers for ghes_edac to use EDAC/i5000: Mark as BROKEN
2022-12-12Merge tag 'x86_cache_for_6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cache resource control updates from Dave Hansen: "These declare the resource control (rectrl) MSRs a bit more normally and clean up an unnecessary structure member: - Remove unnecessary arch_has_empty_bitmaps structure memory - Move rescrtl MSR defines into msr-index.h, like normal MSRs" * tag 'x86_cache_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Move MSR defines into msr-index.h x86/resctrl: Remove arch_has_empty_bitmaps
2022-12-12Merge tag 'x86_tdx_for_6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 tdx updates from Dave Hansen: "This includes a single chunk of new functionality for TDX guests which allows them to talk to the trusted TDX module software and obtain an attestation report. This report can then be used to prove the trustworthiness of the guest to a third party and get access to things like storage encryption keys" * tag 'x86_tdx_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: selftests/tdx: Test TDX attestation GetReport support virt: Add TDX guest driver x86/tdx: Add a wrapper to get TDREPORT0 from the TDX Module
2022-12-12Merge tag 'cxl-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxlLinus Torvalds
Pull cxl updates from Dan Williams: "Compute Express Link (CXL) updates for 6.2. While it may seem backwards, the CXL update this time around includes some focus on CXL 1.x enabling where the work to date had been with CXL 2.0 (VH topologies) in mind. First generation CXL can mostly be supported via BIOS, similar to DDR, however it became clear there are use cases for OS native CXL error handling and some CXL 3.0 endpoint features can be deployed on CXL 1.x hosts (Restricted CXL Host (RCH) topologies). So, this update brings RCH topologies into the Linux CXL device model. In support of the ongoing CXL 2.0+ enabling two new core kernel facilities are added. One is the ability for the kernel to flag collisions between userspace access to PCI configuration registers and kernel accesses. This is brought on by the PCIe Data-Object-Exchange (DOE) facility, a hardware mailbox over config-cycles. The other is a cpu_cache_invalidate_memregion() API that maps to wbinvd_on_all_cpus() on x86. To prevent abuse it is disabled in guest VMs and architectures that do not support it yet. The CXL paths that need it, dynamic memory region creation and security commands (erase / unlock), are disabled when it is not present. As for the CXL 2.0+ this cycle the subsystem gains support Persistent Memory Security commands, error handling in response to PCIe AER notifications, and support for the "XOR" host bridge interleave algorithm. Summary: - Add the cpu_cache_invalidate_memregion() API for cache flushing in response to physical memory reconfiguration, or memory-side data invalidation from operations like secure erase or memory-device unlock. - Add a facility for the kernel to warn about collisions between kernel and userspace access to PCI configuration registers - Add support for Restricted CXL Host (RCH) topologies (formerly CXL 1.1) - Add handling and reporting of CXL errors reported via the PCIe AER mechanism - Add support for CXL Persistent Memory Security commands - Add support for the "XOR" algorithm for CXL host bridge interleave - Rework / simplify CXL to NVDIMM interactions - Miscellaneous cleanups and fixes" * tag 'cxl-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (71 commits) cxl/region: Fix memdev reuse check cxl/pci: Remove endian confusion cxl/pci: Add some type-safety to the AER trace points cxl/security: Drop security command ioctl uapi cxl/mbox: Add variable output size validation for internal commands cxl/mbox: Enable cxl_mbox_send_cmd() users to validate output size cxl/security: Fix Get Security State output payload endian handling cxl: update names for interleave ways conversion macros cxl: update names for interleave granularity conversion macros cxl/acpi: Warn about an invalid CHBCR in an existing CHBS entry tools/testing/cxl: Require cache invalidation bypass cxl/acpi: Fail decoder add if CXIMS for HBIG is missing cxl/region: Fix spelling mistake "memergion" -> "memregion" cxl/regs: Fix sparse warning cxl/acpi: Set ACPI's CXL _OSC to indicate RCD mode support tools/testing/cxl: Add an RCH topology cxl/port: Add RCD endpoint port enumeration cxl/mem: Move devm_cxl_add_endpoint() from cxl_core to cxl_mem tools/testing/cxl: Add XOR Math support to cxl_test cxl/acpi: Support CXL XOR Interleave Math (CXIMS) ...
2022-12-12Merge tag 'thermal-6.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control updates from Rafael Wysocki: "These include thermal core fixes to protect thermal device operations against thermal device removal, other thermal core fixes and updates of Intel thermal control drivers. Specifics: - Fix race conditions related to thermal device operations that are not protected against thermal device removal (Guenter Roeck) - Fix error code in __thermal_cooling_device_register() (Dan Carpenter) - Validate new cooling device state (coming from user space) in cur_state_store() and reuse the max_state value from cooling device structure in the sysfs interface (Viresh Kumar) - Fix some possible name leaks in error paths in the thermal control core code (Yang Yingliang) - Detect TCC lock bit set in the intel_tcc_cooling driver and make it refuse to update the TCC offset in that case (Zhang Rui) - Add TCC cooling support for RaptorLake-S (Zhang Rui) - Prevent accidental clearing of HFI status by one of the other drivers using the same status register (Srinivas Pandruvada) - Protect clearing of thermal status bits in Intel thermal control drivers (Srinivas Pandruvada) - Allow the HFI thermal control driver to ACK an HFI event for the previously observed timestamp (Srinivas Pandruvada) - Remove a pointless die_id check from the HFI thermal driver and adjust the definition a data structure used by it (Ricardo Neri)" * tag 'thermal-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: intel: hfi: Remove a pointless die_id check thermal: core: fix some possible name leaks in error paths thermal: intel: hfi: ACK HFI for the same timestamp thermal: intel: Protect clearing of thermal status bits thermal: intel: Prevent accidental clearing of HFI status thermal/core: Protect thermal device operations against thermal device removal thermal/core: Remove thermal_zone_set_trips() thermal/core: Protect sysfs accesses to thermal operations with thermal zone mutex thermal/core: Protect hwmon accesses to thermal operations with thermal zone mutex thermal/core: Introduce locked version of thermal_zone_device_update thermal/core: Move parameter validation from __thermal_zone_get_temp to thermal_zone_get_temp thermal/core: Ensure that thermal device is registered in thermal_zone_get_temp thermal/core: Delete device under thermal device zone lock thermal/core: Destroy thermal zone device mutex in release function thermal: intel: intel_tcc_cooling: Add TCC cooling support for RaptorLake-S thermal: intel: intel_tcc_cooling: Detect TCC lock bit thermal: intel: hfi: Improve the type of hfi_features::nr_table_pages thermal/core: fix error code in __thermal_cooling_device_register() thermal: sysfs: Reuse cdev->max_state thermal: Validate new state in cur_state_store()
2022-12-12Merge tag 'acpi-6.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and PNP updates from Rafael Wysocki: "These include new code (for instance, support for the FFH address space type and support for new firmware data structures in ACPICA), some new quirks (mostly related to backlight handling and I2C enumeration), a number of fixes and a fair amount of cleanups all over. Specifics: - Update the ACPICA code in the kernel to the 20221020 upstream version and fix a couple of issues in it: - Make acpi_ex_load_op() match upstream implementation (Rafael Wysocki) - Add support for loong_arch-specific APICs in MADT (Huacai Chen) - Add support for fixed PCIe wake event (Huacai Chen) - Add EBDA pointer sanity checks (Vit Kabele) - Avoid accessing VGA memory when EBDA < 1KiB (Vit Kabele) - Add CCEL table support to both compiler/disassembler (Kuppuswamy Sathyanarayanan) - Add a couple of new UUIDs to the known UUID list (Bob Moore) - Add support for FFH Opregion special context data (Sudeep Holla) - Improve warning message for "invalid ACPI name" (Bob Moore) - Add support for CXL 3.0 structures (CXIMS & RDPAS) in the CEDT table (Alison Schofield) - Prepare IORT support for revision E.e (Robin Murphy) - Finish support for the CDAT table (Bob Moore) - Fix error code path in acpi_ds_call_control_method() (Rafael Wysocki) - Fix use-after-free in acpi_ut_copy_ipackage_to_ipackage() (Li Zetao) - Update the version of the ACPICA code in the kernel (Bob Moore) - Use ZERO_PAGE(0) instead of empty_zero_page in the ACPI device enumeration code (Giulio Benetti) - Change the return type of the ACPI driver remove callback to void and update its users accordingly (Dawei Li) - Add general support for FFH address space type and implement the low- level part of it for ARM64 (Sudeep Holla) - Fix stale comments in the ACPI tables parsing code and make it print more messages related to MADT (Hanjun Guo, Huacai Chen) - Replace invocations of generic library functions with more kernel- specific counterparts in the ACPI sysfs interface (Christophe JAILLET, Xu Panda) - Print full name paths of ACPI power resource objects during enumeration (Kane Chen) - Eliminate a compiler warning regarding a missing function prototype in the ACPI power management code (Sudeep Holla) - Fix and clean up the ACPI processor driver (Rafael Wysocki, Li Zhong, Colin Ian King, Sudeep Holla) - Add quirk for the HP Pavilion Gaming 15-cx0041ur to the ACPI EC driver (Mia Kanashi) - Add some mew ACPI backlight handling quirks and update some existing ones (Hans de Goede) - Make the ACPI backlight driver prefer the native backlight control over vendor backlight control when possible (Hans de Goede) - Drop unsetting ACPI APEI driver data on remove (Uwe Kleine-König) - Use xchg_release() instead of cmpxchg() for updating new GHES cache slots (Ard Biesheuvel) - Clean up the ACPI APEI code (Sudeep Holla, Christophe JAILLET, Jay Lu) - Add new I2C device enumeration quirks for Medion Lifetab S10346 and Lenovo Yoga Tab 3 Pro (YT3-X90F) (Hans de Goede) - Make the ACPI battery driver notify user space about adding new battery hooks and removing the existing ones (Armin Wolf) - Modify the pfr_update and pfr_telemetry drivers to use ACPI_FREE() for freeing acpi_object structures to help diagnostics (Wang ShaoBo) - Make the ACPI fan driver use sysfs_emit_at() in its sysfs interface code (ye xingchen) - Fix the _FIF package extraction failure handling in the ACPI fan driver (Hanjun Guo) - Fix the PCC mailbox handling error code path (Huisong Li) - Avoid using PCC Opregions if there is no platform interrupt allocated for this purpose (Huisong Li) - Use sysfs_emit() instead of scnprintf() in the ACPI PAD driver and CPPC library (ye xingchen) - Fix some kernel-doc issues in the ACPI GSI processing code (Xiongfeng Wang) - Fix name memory leak in pnp_alloc_dev() (Yang Yingliang) - Do not disable PNP devices on suspend when they cannot be re-enabled on resume (Hans de Goede) - Clean up the ACPI thermal driver a bit (Rafael Wysocki)" * tag 'acpi-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (67 commits) ACPI: x86: Add skip i2c clients quirk for Medion Lifetab S10346 ACPI: APEI: EINJ: Refactor available_error_type_show() ACPI: APEI: EINJ: Fix formatting errors ACPI: processor: perflib: Adjust acpi_processor_notify_smm() return value ACPI: processor: perflib: Rearrange acpi_processor_notify_smm() ACPI: processor: perflib: Rearrange unregistration routine ACPI: processor: perflib: Drop redundant parentheses ACPI: processor: perflib: Adjust white space ACPI: processor: idle: Drop unnecessary statements and parens ACPI: thermal: Adjust critical.flags.valid check ACPI: fan: Convert to use sysfs_emit_at() API ACPICA: Fix use-after-free in acpi_ut_copy_ipackage_to_ipackage() ACPI: battery: Call power_supply_changed() when adding hooks ACPI: use sysfs_emit() instead of scnprintf() ACPI: x86: Add skip i2c clients quirk for Lenovo Yoga Tab 3 Pro (YT3-X90F) ACPI: APEI: Remove a useless include PNP: Do not disable devices on suspend when they cannot be re-enabled on resume ACPI: processor: Silence missing prototype warnings ACPI: processor_idle: Silence missing prototype warnings ACPI: PM: Silence missing prototype warning ...
2022-12-12Merge tag 'pm-6.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These include two new drivers (cpufreq driver for Apple SoC CPU P-states and the SCMI Powercap based power capping driver), other new hardware support and driver extensions (Qualcomm cpufreq driver and its DT bindings, TI cpufreq driver, intel_pstate, intel-uncore-freq), a bunch of fixes and cleanups all over and a cpupower utility update including new features related to RAPL support. Specifics: - Fix nasty and hard to debug race condition introduced by mistake in the runtime PM core code and clean up that code somewhat on top of the fix (Rafael Wysocki) - Generalize of_perf_domain_get_sharing_cpumask phandle format (Hector Martin) - Add new cpufreq driver for Apple SoC CPU P-states (Hector Martin) - Update Qualcomm cpufreq driver (Manivannan Sadhasivam, Chen Hui): - CPU clock provider support - Generic cleanups or reorganization - Potential memleak fix - Fix of the return value of cpufreq_driver->get() - Update Qualcomm cpufreq driver's DT bindings (Manivannan Sadhasivam, Rob Herring, Melody Olvera): - Support for CPU clock provider - Missing cache-related properties fixes - Support for QDU1000/QRU1000 - Add support for ti,am625 SoC and enable build of ti-cpufreq for ARCH_K3 (Dave Gerlach, and Vibhore Vardhan) - Use flexible array to simplify memory allocation in the tegra186 cpufreq driver (Christophe JAILLET) - Convert cpufreq statistics code to use sysfs_emit_at() (ye xingchen) - Allow intel_pstate to use no-HWP mode on Sapphire Rapids (Giovanni Gherdovich) - Add missing pci_dev_put() to the amd_freq_sensitivity cpufreq driver (Xiongfeng Wang) - Initialize the kobj_unregister completion before calling kobject_init_and_add() in the cpufreq core code (Yongqiang Liu) - Defer setting boost MSRs in the ACPI cpufreq driver (Stuart Hayes, Nathan Chancellor) - Make intel_pstate accept initial EPP value of 0x80 (Srinivas Pandruvada) - Make read-only array sys_clk_src in the SPEAr cpufreq driver static (Colin Ian King) - Make array speeds in the longhaul cpufreq driver static (Colin Ian King) - Use str_enabled_disabled() helper in the ACPI cpufreq driver (Andy Shevchenko) - Drop a reference to CVS from cpufreq documentation (Conghui Wang) - Improve kernel messages printed by the PSCI cpuidle driver (Ulf Hansson) - Make the DT cpuidle driver return the correct number of parsed idle states, clean it up and clarify a comment in it (Ulf Hansson) - Modify the tasks freezing code to avoid using pr_cont() and refine an error message printed by it (Rafael Wysocki) - Make the hibernation core code complain about memory map mismatches during resume to help diagnostics (Xueqin Luo) - Fix mistake in a kerneldoc comment in the hibernation code (xiongxin) - Reverse the order of performance and enabling operations in the generic power domains code (Abel Vesa) - Power off[on] domains in hibernate .freeze[thaw]_noirq hook of in the generic power domains code (Abel Vesa) - Consolidate genpd_restore_noirq() and genpd_resume_noirq() (Shawn Guo) - Pass generic PM noirq hooks to genpd_finish_suspend() (Shawn Guo) - Drop generic power domain status manipulation during hibernate restore (Shawn Guo) - Fix compiler warnings with make W=1 in the idle_inject power capping driver (Srinivas Pandruvada) - Use kstrtobool() instead of strtobool() in the power capping sysfs interface (Christophe JAILLET) - Add SCMI Powercap based power capping driver (Cristian Marussi) - Add Emerald Rapids support to the intel-uncore-freq driver (Artem Bityutskiy) - Repair slips in kernel-doc comments in the generic notifier code (Lukas Bulwahn) - Fix several DT issues in the OPP library reorganize code around opp-microvolt-<named> DT property (Viresh Kumar) - Allow any of opp-microvolt, opp-microamp, or opp-microwatt properties to be present without the others present (James Calligeros) - Fix clock-latency-ns property in DT example (Serge Semin) - Add a private governor_data for devfreq governors (Kant Fan) - Reorganize devfreq code to use device_match_of_node() and devm_platform_get_and_ioremap_resource() instead of open coding them (ye xingchen, Minghao Chi) - Make cpupower choose base_cpu to display default cpupower details instead of picking CPU 0 (Saket Kumar Bhaskar) - Add Georgian translation to cpupower documentation (Zurab Kargareteli) - Introduce powercap intel-rapl library, powercap-info command, and RAPL monitor into cpupower (Thomas Renninger)" * tag 'pm-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (64 commits) PM: runtime: Adjust white space in the core code cpufreq: Remove CVS version control contents from documentation cpufreq: stats: Convert to use sysfs_emit_at() API cpufreq: ACPI: Only set boost MSRs on supported CPUs PM: sleep: Refine error message in try_to_freeze_tasks() PM: sleep: Avoid using pr_cont() in the tasks freezing code PM: runtime: Relocate rpm_callback() right after __rpm_callback() PM: runtime: Do not call __rpm_callback() from rpm_idle() PM / devfreq: event: use devm_platform_get_and_ioremap_resource() PM / devfreq: event: Use device_match_of_node() PM / devfreq: Use device_match_of_node() powercap: idle_inject: Fix warnings with make W=1 PM: hibernate: Complain about memory map mismatches during resume dt-bindings: cpufreq: cpufreq-qcom-hw: Add QDU1000/QRU1000 cpufreq cpufreq: tegra186: Use flexible array to simplify memory allocation cpupower: rapl monitor - shows the used power consumption in uj for each rapl domain cpupower: Introduce powercap intel-rapl library and powercap-info command cpupower: Add Georgian translation cpufreq: intel_pstate: Add Sapphire Rapids support in no-HWP mode cpufreq: amd_freq_sensitivity: Add missing pci_dev_put() ...
2022-12-12kunit: add macro to allow conditionally exposing static symbols to testsRae Moar
Create two macros: VISIBLE_IF_KUNIT - A macro that sets symbols to be static if CONFIG_KUNIT is not enabled. Otherwise if CONFIG_KUNIT is enabled there is no change to the symbol definition. EXPORT_SYMBOL_IF_KUNIT(symbol) - Exports symbol into EXPORTED_FOR_KUNIT_TESTING namespace only if CONFIG_KUNIT is enabled. Must use MODULE_IMPORT_NS(EXPORTED_FOR_KUNIT_TESTING) in test file in order to use symbols. Signed-off-by: Rae Moar <rmoar@google.com> Reviewed-by: John Johansen <john.johansen@canonical.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-12-12kunit: Use the static key when retrieving the current testDavid Gow
In order to detect if a KUnit test is running, and to access its context, the 'kunit_test' member of the current task_struct is used. Usually, this is accessed directly or via the kunit_fail_current_task() function. In order to speed up the case where no test is running, add a wrapper, kunit_get_current_test(), which uses the static key to fail early. Equally, Speed up kunit_fail_current_test() by using the static key. This should make it convenient for code to call this unconditionally in fakes or error paths, without worrying that this will slow the code down significantly. If CONFIG_KUNIT=n (or m), this compiles away to nothing. If CONFIG_KUNIT=y, it will compile down to a NOP (on most architectures) if no KUnit test is currently running. Note that kunit_get_current_test() does not work if KUnit is built as a module. This mirrors the existing restriction on kunit_fail_current_test(). Note that the definition of kunit_fail_current_test() still wraps an empty, inline function if KUnit is not built-in. This is to ensure that the printf format string __attribute__ will still work. Also update the documentation to suggest users use the new kunit_get_current_test() function, update the example, and to describe the behaviour when KUnit is disabled better. Cc: Jonathan Corbet <corbet@lwn.net> Cc: Sadiya Kazi <sadiyakazi@google.com> Signed-off-by: David Gow <davidgow@google.com> Reviewed-by: Daniel Latypov <dlatypov@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-12-12kunit: Provide a static key to check if KUnit is actively running testsDavid Gow
KUnit does a few expensive things when enabled. This hasn't been a problem because KUnit was only enabled on test kernels, but with a few people enabling (but not _using_) KUnit on production systems, we need a runtime way of handling this. Provide a 'kunit_running' static key (defaulting to false), which allows us to hide any KUnit code behind a static branch. This should reduce the performance impact (on other code) of having KUnit enabled to a single NOP when no tests are running. Note that, while it looks unintuitive, tests always run entirely within __kunit_test_suites_init(), so it's safe to decrement the static key at the end of this function, rather than in __kunit_test_suites_exit(), which is only there to clean up results in debugfs. Signed-off-by: David Gow <davidgow@google.com> Reviewed-by: Daniel Latypov <dlatypov@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-12-12kunit: remove KUNIT_INIT_MEM_ASSERTION macroDaniel Latypov
Commit 870f63b7cd78 ("kunit: eliminate KUNIT_INIT_*_ASSERT_STRUCT macros") removed all the other macros of this type. But it raced with commit b8a926bea8b1 ("kunit: Introduce KUNIT_EXPECT_MEMEQ and KUNIT_EXPECT_MEMNEQ macros"), which added another instance. Remove KUNIT_INIT_MEM_ASSERTION and just use the generic KUNIT_INIT_ASSERT macro instead. Rename the `size` arg to avoid conflicts by appending a "_" (like we did in the previous commit). Signed-off-by: Daniel Latypov <dlatypov@google.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-12-12kunit: eliminate KUNIT_INIT_*_ASSERT_STRUCT macrosDaniel Latypov
These macros exist because passing an initializer list to other macros is hard. The goal of these macros is to generate a line like struct $ASSERT_TYPE __assertion = $APPROPRIATE_INITIALIZER; e.g. struct kunit_unary_assertion __assertion = { .condition = "foo()", .expected_true = true }; But the challenge is you can't pass `{.condition=..., .expect_true=...}` as a macro argument, since the comma means you're actually passing two arguments, `{.condition=...` and `.expect_true=....}`. So we'd made custom macros for each different initializer-list shape. But we can work around this with the following generic macro #define KUNIT_INIT_ASSERT(initializers...) { initializers } Note: this has the downside that we have to rename some macros arguments to not conflict with the struct field names (e.g. `expected_true`). It's a bit gross, but probably worth reducing the # of macros. Signed-off-by: Daniel Latypov <dlatypov@google.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2022-12-12Merge tag 'x86-misc-2022-12-10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 updates from Thomas Gleixner: "Updates for miscellaneous x86 areas: - Reserve a new boot loader type for barebox which is usally used on ARM and MIPS, but can also be utilized as EFI payload on x86 to provide watchdog-supervised boot up. - Consolidate the native and compat 32bit signal handling code and split the 64bit version out into a separate source file - Switch the ESPFIX random usage to get_random_long()" * tag 'x86-misc-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/espfix: Use get_random_long() rather than archrandom x86/signal/64: Move 64-bit signal code to its own file x86/signal/32: Merge native and compat 32-bit signal code x86/signal: Add ABI prefixes to frame setup functions x86/signal: Merge get_sigframe() x86: Remove __USER32_DS signal/compat: Remove compat_sigset_t override x86/signal: Remove sigset_t parameter from frame setup functions x86/signal: Remove sig parameter from frame setup functions Documentation/x86/boot: Reserve type_of_loader=13 for barebox
2022-12-12Merge tag 'timers-core-2022-12-10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Updates for timers, timekeeping and drivers: Core: - The timer_shutdown[_sync]() infrastructure: Tearing down timers can be tedious when there are circular dependencies to other things which need to be torn down. A prime example is timer and workqueue where the timer schedules work and the work arms the timer. What needs to prevented is that pending work which is drained via destroy_workqueue() does not rearm the previously shutdown timer. Nothing in that shutdown sequence relies on the timer being functional. The conclusion was that the semantics of timer_shutdown_sync() should be: - timer is not enqueued - timer callback is not running - timer cannot be rearmed Preventing the rearming of shutdown timers is done by discarding rearm attempts silently. A warning for the case that a rearm attempt of a shutdown timer is detected would not be really helpful because it's entirely unclear how it should be acted upon. The only way to address such a case is to add 'if (in_shutdown)' conditionals all over the place. This is error prone and in most cases of teardown not required all. - The real fix for the bluetooth HCI teardown based on timer_shutdown_sync(). A larger scale conversion to timer_shutdown_sync() is work in progress. - Consolidation of VDSO time namespace helper functions - Small fixes for timer and timerqueue Drivers: - Prevent integer overflow on the XGene-1 TVAL register which causes an never ending interrupt storm. - The usual set of new device tree bindings - Small fixes and improvements all over the place" * tag 'timers-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) dt-bindings: timer: renesas,cmt: Add r8a779g0 CMT support dt-bindings: timer: renesas,tmu: Add r8a779g0 support clocksource/drivers/arm_arch_timer: Use kstrtobool() instead of strtobool() clocksource/drivers/timer-ti-dm: Fix missing clk_disable_unprepare in dmtimer_systimer_init_clock() clocksource/drivers/timer-ti-dm: Clear settings on probe and free clocksource/drivers/timer-ti-dm: Make timer_get_irq static clocksource/drivers/timer-ti-dm: Fix warning for omap_timer_match clocksource/drivers/arm_arch_timer: Fix XGene-1 TVAL register math error clocksource/drivers/timer-npcm7xx: Enable timer 1 clock before use dt-bindings: timer: nuvoton,npcm7xx-timer: Allow specifying all clocks dt-bindings: timer: rockchip: Add rockchip,rk3128-timer clockevents: Repair kernel-doc for clockevent_delta2ns() clocksource/drivers/ingenic-ost: Define pm functions properly in platform_driver struct clocksource/drivers/sh_cmt: Access registers according to spec vdso/timens: Refactor copy-pasted find_timens_vvar_page() helper into one copy Bluetooth: hci_qca: Fix the teardown problem for real timers: Update the documentation to reflect on the new timer_shutdown() API timers: Provide timer_shutdown[_sync]() timers: Add shutdown mechanism to the internal functions timers: Split [try_to_]del_timer[_sync]() to prepare for shutdown mode ...
2022-12-12Merge tag 'irq-core-2022-12-10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates for the interrupt core and driver subsystem: The bulk is the rework of the MSI subsystem to support per device MSI interrupt domains. This solves conceptual problems of the current PCI/MSI design which are in the way of providing support for PCI/MSI[-X] and the upcoming PCI/IMS mechanism on the same device. IMS (Interrupt Message Store] is a new specification which allows device manufactures to provide implementation defined storage for MSI messages (as opposed to PCI/MSI and PCI/MSI-X that has a specified message store which is uniform accross all devices). The PCI/MSI[-X] uniformity allowed us to get away with "global" PCI/MSI domains. IMS not only allows to overcome the size limitations of the MSI-X table, but also gives the device manufacturer the freedom to store the message in arbitrary places, even in host memory which is shared with the device. There have been several attempts to glue this into the current MSI code, but after lengthy discussions it turned out that there is a fundamental design problem in the current PCI/MSI-X implementation. This needs some historical background. When PCI/MSI[-X] support was added around 2003, interrupt management was completely different from what we have today in the actively developed architectures. Interrupt management was completely architecture specific and while there were attempts to create common infrastructure the commonalities were rudimentary and just providing shared data structures and interfaces so that drivers could be written in an architecture agnostic way. The initial PCI/MSI[-X] support obviously plugged into this model which resulted in some basic shared infrastructure in the PCI core code for setting up MSI descriptors, which are a pure software construct for holding data relevant for a particular MSI interrupt, but the actual association to Linux interrupts was completely architecture specific. This model is still supported today to keep museum architectures and notorious stragglers alive. In 2013 Intel tried to add support for hot-pluggable IO/APICs to the kernel, which was creating yet another architecture specific mechanism and resulted in an unholy mess on top of the existing horrors of x86 interrupt handling. The x86 interrupt management code was already an incomprehensible maze of indirections between the CPU vector management, interrupt remapping and the actual IO/APIC and PCI/MSI[-X] implementation. At roughly the same time ARM struggled with the ever growing SoC specific extensions which were glued on top of the architected GIC interrupt controller. This resulted in a fundamental redesign of interrupt management and provided the today prevailing concept of hierarchical interrupt domains. This allowed to disentangle the interactions between x86 vector domain and interrupt remapping and also allowed ARM to handle the zoo of SoC specific interrupt components in a sane way. The concept of hierarchical interrupt domains aims to encapsulate the functionality of particular IP blocks which are involved in interrupt delivery so that they become extensible and pluggable. The X86 encapsulation looks like this: |--- device 1 [Vector]---[Remapping]---[PCI/MSI]--|... |--- device N where the remapping domain is an optional component and in case that it is not available the PCI/MSI[-X] domains have the vector domain as their parent. This reduced the required interaction between the domains pretty much to the initialization phase where it is obviously required to establish the proper parent relation ship in the components of the hierarchy. While in most cases the model is strictly representing the chain of IP blocks and abstracting them so they can be plugged together to form a hierarchy, the design stopped short on PCI/MSI[-X]. Looking at the hardware it's clear that the actual PCI/MSI[-X] interrupt controller is not a global entity, but strict a per PCI device entity. Here we took a short cut on the hierarchical model and went for the easy solution of providing "global" PCI/MSI domains which was possible because the PCI/MSI[-X] handling is uniform across the devices. This also allowed to keep the existing PCI/MSI[-X] infrastructure mostly unchanged which in turn made it simple to keep the existing architecture specific management alive. A similar problem was created in the ARM world with support for IP block specific message storage. Instead of going all the way to stack a IP block specific domain on top of the generic MSI domain this ended in a construct which provides a "global" platform MSI domain which allows overriding the irq_write_msi_msg() callback per allocation. In course of the lengthy discussions we identified other abuse of the MSI infrastructure in wireless drivers, NTB etc. where support for implementation specific message storage was just mindlessly glued into the existing infrastructure. Some of this just works by chance on particular platforms but will fail in hard to diagnose ways when the driver is used on platforms where the underlying MSI interrupt management code does not expect the creative abuse. Another shortcoming of today's PCI/MSI-X support is the inability to allocate or free individual vectors after the initial enablement of MSI-X. This results in an works by chance implementation of VFIO (PCI pass-through) where interrupts on the host side are not set up upfront to avoid resource exhaustion. They are expanded at run-time when the guest actually tries to use them. The way how this is implemented is that the host disables MSI-X and then re-enables it with a larger number of vectors again. That works by chance because most device drivers set up all interrupts before the device actually will utilize them. But that's not universally true because some drivers allocate a large enough number of vectors but do not utilize them until it's actually required, e.g. for acceleration support. But at that point other interrupts of the device might be in active use and the MSI-X disable/enable dance can just result in losing interrupts and therefore hard to diagnose subtle problems. Last but not least the "global" PCI/MSI-X domain approach prevents to utilize PCI/MSI[-X] and PCI/IMS on the same device due to the fact that IMS is not longer providing a uniform storage and configuration model. The solution to this is to implement the missing step and switch from global PCI/MSI domains to per device PCI/MSI domains. The resulting hierarchy then looks like this: |--- [PCI/MSI] device 1 [Vector]---[Remapping]---|... |--- [PCI/MSI] device N which in turn allows to provide support for multiple domains per device: |--- [PCI/MSI] device 1 |--- [PCI/IMS] device 1 [Vector]---[Remapping]---|... |--- [PCI/MSI] device N |--- [PCI/IMS] device N This work converts the MSI and PCI/MSI core and the x86 interrupt domains to the new model, provides new interfaces for post-enable allocation/free of MSI-X interrupts and the base framework for PCI/IMS. PCI/IMS has been verified with the work in progress IDXD driver. There is work in progress to convert ARM over which will replace the platform MSI train-wreck. The cleanup of VFIO, NTB and other creative "solutions" are in the works as well. Drivers: - Updates for the LoongArch interrupt chip drivers - Support for MTK CIRQv2 - The usual small fixes and updates all over the place" * tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (134 commits) irqchip/ti-sci-inta: Fix kernel doc irqchip/gic-v2m: Mark a few functions __init irqchip/gic-v2m: Include arm-gic-common.h irqchip/irq-mvebu-icu: Fix works by chance pointer assignment iommu/amd: Enable PCI/IMS iommu/vt-d: Enable PCI/IMS x86/apic/msi: Enable PCI/IMS PCI/MSI: Provide pci_ims_alloc/free_irq() PCI/MSI: Provide IMS (Interrupt Message Store) support genirq/msi: Provide constants for PCI/IMS support x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYN PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X PCI/MSI: Provide prepare_desc() MSI domain op PCI/MSI: Split MSI-X descriptor setup genirq/msi: Provide MSI_FLAG_MSIX_ALLOC_DYN genirq/msi: Provide msi_domain_alloc_irq_at() genirq/msi: Provide msi_domain_ops:: Prepare_desc() genirq/msi: Provide msi_desc:: Msi_data genirq/msi: Provide struct msi_map x86/apic/msi: Remove arch_create_remap_msi_irq_domain() ...
2022-12-12Merge tag 'for-linus-6.2-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: - fix memory leaks in error paths - add support for virtio PCI-devices in Xen guests on ARM - two minor fixes * tag 'for-linus-6.2-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/privcmd: Fix a possible warning in privcmd_ioctl_mmap_resource() x86/xen: Fix memory leak in xen_init_lock_cpu() x86/xen: Fix memory leak in xen_smp_intr_init{_pv}() xen: fix xen.h build for CONFIG_XEN_PVH=y xen/virtio: Handle PCI devices which Host controller is described in DT xen/virtio: Optimize the setup of "xen-grant-dma" devices
2022-12-12Merge tag 'platform-drivers-x86-v6.2-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver updates from Hans de Goede: - Intel: - PMC: Add support for Meteor Lake - Intel On Demand: various updates - Ideapad-laptop: - Add support for various Fn keys on new models - Fix touchpad on/off handling in a generic way to avoid having to add more and more quirks - Android x86 tablets: - Add support for two more X86 Android tablet models - New Dell WMI DDV driver - Miscellaneous cleanups and small bugfixes * tag 'platform-drivers-x86-v6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (52 commits) platform/mellanox: mlxbf-pmc: Fix event typo platform/x86: intel_scu_ipc: fix possible name leak in __intel_scu_ipc_register() platform/x86: sony-laptop: Convert to use sysfs_emit_at() API platform/x86/dell: alienware-wmi: Use sysfs_emit() instead of scnprintf() platform/x86: uv_sysfs: Use sysfs_emit() instead of scnprintf() platform/x86: mxm-wmi: fix memleak in mxm_wmi_call_mx[ds|mx]() platform/x86: x86-android-tablets: Add Advantech MICA-071 extra button platform/x86: x86-android-tablets: Add Lenovo Yoga Tab 3 (YT3-X90F) charger + fuel-gauge data platform/x86: x86-android-tablets: Add Medion Lifetab S10346 data platform/x86: wireless-hotkey: use ACPI HID as phys platform/x86/intel/hid: Add module-params for 5 button array + SW_TABLET_MODE reporting platform/x86: ideapad-laptop: Make touchpad_ctrl_via_ec a module option platform/x86: ideapad-laptop: Stop writing VPCCMD_W_TOUCHPAD at probe time platform/x86: ideapad-laptop: Send KEY_TOUCHPAD_TOGGLE on some models platform/x86: ideapad-laptop: Only toggle ps2 aux port on/off on select models platform/x86: ideapad-laptop: Do not send KEY_TOUCHPAD* events on probe / resume platform/x86: ideapad-laptop: Refactor ideapad_sync_touchpad_state() tools/arch/x86: intel_sdsi: Add support for reading meter certificates tools/arch/x86: intel_sdsi: Add support for new GUID tools/arch/x86: intel_sdsi: Read more On Demand registers ...
2022-12-12Merge tag 'soc-dt-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/socLinus Torvalds
Pull ARM SoC DT updates from Arnd Bergmann: "The devicetree changes contain exactly 1000 non-merge changesets, including a number of new arm64 SoC variants from Qualcomm and Apple, as well as the Renesas r9a07g043f/u chip in both arm64 and riscv variants. While we have occasionally merged support for non-arm SoCs in the past, this is now the normal path for riscv devicetree files. The most notable changes, by SoC platform, are: - The Apple T6000 (M1 Pro), T6001 (M1 Max) and T6002 (M1 Ultra) chips now have initial support. This is particularly nice as I am typing this on a T6002 Mac Studio with only a small number of driver patches. - Qualcomm MSM8996 Pro (Snapdragon 821), SM6115 (Snapdragon 662), SM4250 (Snapdragon 460), SM6375 (Snapdragon 695), SDM670 (Snapdragon 670), MSM8976 (Snapdragon 652) and MSM8956 (Snapdragon 650) are all mobile phone chips that are closely related to others we already support. Adding those helps support more phones and we add several models from Sony (Xperia 10 IV, 5 IV, X, and X compact), OnePlus (One, 3, 3T, and Nord N100), Xiaomi (Poco F1, Mi6), Huawei (Watch) and Google (Pixel 3a). There are also new variants of the Herobrine and Trogdor chromebook motherboards. SA8540P is an automotive SoC used in the Qdrive-3 development platform - Rockchips gains no new SoC variants, but a lot of new boards: three mobile gaming systems based on RK3326 Odroid-Go/rg351 family, two more Anbernic gaming systems based on RK3566 and a number of other RK356x based single-board computers. - Renesas RZ/G2UL (r9a07g043) was already supported for arm64, but as the newly added RZ/Five is based on the same design, this now gets reorganized in order to share most of the dts description between the two and add the RZ/Five SMARC EVK board support. Aside from that, there are the usual changes all over the tree: - New boards on other platforms contain two ASpeed BMC users, two Broadcom based Wifi routers, Zyxel NSA310S NAS, the i.MX6 based Kobo Aura2 ebook reader, two i.MX8 based development boards, two Uniphier Pro5 development boards, the STM32MP1 testbench board from DHCOR, the TI K3 based BeagleBone AI-64 board, and the Mediatek Helio X10 based Sony Xperia M5 phone. - The Starfive JH7100 source gets reorganized in order to support the VisionFive V1 board. - Minor updates and cleanups for Intel SoCFPGA, Marvell PXA168, TI, ST, NXP, Apple, Broadcom, Juno, Marvell MVEBU, at91, nuvoton, Tegra, Mediatek, Renesas, Hisilicon, Allwinner, Samsung, ux500, spear, ... The treewide cleanups now have a lot of fixes for cache nodes and other binding violoations. - Somewhat larger sets of reworks for NVIDIA Tegra, Qualcomm and Renesas platforms, adding a lot more on-chip device support - A rework of the way that DTB overlays are built" * tag 'soc-dt-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (979 commits) arm64: dts: apple: t6002: Fix GPU power domains arm64: dts: apple: t600x-pmgr: Fix search & replace typo arm64: dts: apple: Add t8103 L1/L2 cache properties and nodes arm64: dts: apple: Rename dart-sio* to sio-dart* arch: arm64: apple: t600x: Use standard "iommu" node name arch: arm64: apple: t8103: Use standard "iommu" node name ARM: dts: socfpga: Fix pca9548 i2c-mux node name dt-bindings: iio: adc: qcom,spmi-vadc: fix PM8350 define dt-bindings: iio: adc: qcom,spmi-vadc: extend example arm64: dts: qcom: sc8280xp: fix UFS DMA coherency arm64: dts: qcom: sc7280: Add DT for sc7280-herobrine-zombie arm64: dts: qcom: sm8250-sony-xperia-edo: fix no-mmc property for SDHCI arm64: dts: qcom: sdm845-sony-xperia-tama: fix no-mmc property for SDHCI arm64: dts: qcom: sda660-inforce-ifc6560: fix no-mmc property for SDHCI arm64: dts: qcom: sa8155p-adp: fix no-mmc property for SDHCI arm64: dts: qcom: qrb5165-rb: fix no-mmc property for SDHCI arm64: dts: qcom: sm8450: align MMC node names with dtschema arm64: dts: qcom: sc7180-trogdor: use generic node names arm64: dts: qcom: sm8450-hdk: add sound support arm64: dts: qcom: sm8450: add Soundwire and LPASS ...
2022-12-12Merge tag 'soc-drivers-6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC driver updates from Arnd Bergmann: "There are few major updates in the SoC specific drivers, mainly the usual reworks and support for variants of the existing SoC. While this remains Arm centric for the most part, the branch now also contains updates to risc-v and loongarch specific code in drivers/soc/. Notable changes include: - Support for the newly added Qualcomm Snapdragon variants (MSM8956, MSM8976, SM6115, SM4250, SM8150, SA8155 and SM8550) in the soc ID, rpmh, rpm, spm and powerdomain drivers. - Documentation for the somewhat controversial qcom,board-id properties that are required for booting a number of machines - A new SoC identification driver for the loongson-2 (loongarch) platform - memory controller updates for stm32, tegra, and renesas. - a new DT binding to better describe LPDDR2/3/4/5 chips in the memory controller subsystem - Updates for Tegra specific drivers across multiple subsystems, improving support for newer SoCs and better identification - Minor fixes for Broadcom, Freescale, Apple, Renesas, Sifive, TI, Mediatek and Marvell SoC drivers" * tag 'soc-drivers-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (137 commits) soc: qcom: socinfo: Add SM6115 / SM4250 SoC IDs to the soc_id table dt-bindings: arm: qcom,ids: Add SoC IDs for SM6115 / SM4250 and variants soc: qcom: socinfo: Add SM8150 and SA8155 SoC IDs to the soc_id table dt-bindings: arm: qcom,ids: Add SoC IDs for SM8150 and SA8155 dt-bindings: soc: qcom: apr: document generic qcom,apr compatible soc: qcom: Select REMAP_MMIO for ICC_BWMON driver soc: qcom: Select REMAP_MMIO for LLCC driver soc: qcom: rpmpd: Add SM4250 support dt-bindings: power: rpmpd: Add SM4250 support dt-bindings: soc: qcom: aoss: Add compatible for SM8550 soc: qcom: llcc: Add configuration data for SM8550 dt-bindings: arm: msm: Add LLCC compatible for SM8550 soc: qcom: llcc: Add v4.1 HW version support soc: qcom: socinfo: Add SM8550 ID soc: qcom: rpmh-rsc: Avoid unnecessary checks on irq-done response soc: qcom: rpmh-rsc: Add support for RSC v3 register offsets soc: qcom: rpmhpd: Add SM8550 power domains dt-bindings: power: rpmpd: Add SM8550 to rpmpd binding soc: qcom: socinfo: Add MSM8956/76 SoC IDs to the soc_id table dt-bindings: arm: qcom,ids: Add SoC IDs for MSM8956 and MSM8976 ...
2022-12-12Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "The highlights this time are support for dynamically enabling and disabling Clang's Shadow Call Stack at boot and a long-awaited optimisation to the way in which we handle the SVE register state on system call entry to avoid taking unnecessary traps from userspace. Summary: ACPI: - Enable FPDT support for boot-time profiling - Fix CPU PMU probing to work better with PREEMPT_RT - Update SMMUv3 MSI DeviceID parsing to latest IORT spec - APMT support for probing Arm CoreSight PMU devices CPU features: - Advertise new SVE instructions (v2.1) - Advertise range prefetch instruction - Advertise CSSC ("Common Short Sequence Compression") scalar instructions, adding things like min, max, abs, popcount - Enable DIT (Data Independent Timing) when running in the kernel - More conversion of system register fields over to the generated header CPU misfeatures: - Workaround for Cortex-A715 erratum #2645198 Dynamic SCS: - Support for dynamic shadow call stacks to allow switching at runtime between Clang's SCS implementation and the CPU's pointer authentication feature when it is supported (complete with scary DWARF parser!) Tracing and debug: - Remove static ftrace in favour of, err, dynamic ftrace! - Seperate 'struct ftrace_regs' from 'struct pt_regs' in core ftrace and existing arch code - Introduce and implement FTRACE_WITH_ARGS on arm64 to replace the old FTRACE_WITH_REGS - Extend 'crashkernel=' parameter with default value and fallback to placement above 4G physical if initial (low) allocation fails SVE: - Optimisation to avoid disabling SVE unconditionally on syscall entry and just zeroing the non-shared state on return instead Exceptions: - Rework of undefined instruction handling to avoid serialisation on global lock (this includes emulation of user accesses to the ID registers) Perf and PMU: - Support for TLP filters in Hisilicon's PCIe PMU device - Support for the DDR PMU present in Amlogic Meson G12 SoCs - Support for the terribly-named "CoreSight PMU" architecture from Arm (and Nvidia's implementation of said architecture) Misc: - Tighten up our boot protocol for systems with memory above 52 bits physical - Const-ify static keys to satisty jump label asm constraints - Trivial FFA driver cleanups in preparation for v1.1 support - Export the kernel_neon_* APIs as GPL symbols - Harden our instruction generation routines against instrumentation - A bunch of robustness improvements to our arch-specific selftests - Minor cleanups and fixes all over (kbuild, kprobes, kfence, PMU, ...)" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (151 commits) arm64: kprobes: Return DBG_HOOK_ERROR if kprobes can not handle a BRK arm64: kprobes: Let arch do_page_fault() fix up page fault in user handler arm64: Prohibit instrumentation on arch_stack_walk() arm64:uprobe fix the uprobe SWBP_INSN in big-endian arm64: alternatives: add __init/__initconst to some functions/variables arm_pmu: Drop redundant armpmu->map_event() in armpmu_event_init() kselftest/arm64: Allow epoll_wait() to return more than one result kselftest/arm64: Don't drain output while spawning children kselftest/arm64: Hold fp-stress children until they're all spawned arm64/sysreg: Remove duplicate definitions from asm/sysreg.h arm64/sysreg: Convert ID_DFR1_EL1 to automatic generation arm64/sysreg: Convert ID_DFR0_EL1 to automatic generation arm64/sysreg: Convert ID_AFR0_EL1 to automatic generation arm64/sysreg: Convert ID_MMFR5_EL1 to automatic generation arm64/sysreg: Convert MVFR2_EL1 to automatic generation arm64/sysreg: Convert MVFR1_EL1 to automatic generation arm64/sysreg: Convert MVFR0_EL1 to automatic generation arm64/sysreg: Convert ID_PFR2_EL1 to automatic generation arm64/sysreg: Convert ID_PFR1_EL1 to automatic generation arm64/sysreg: Convert ID_PFR0_EL1 to automatic generation ...
2022-12-12Merge tag 'hyperv-next-signed-20221208' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Drop unregister syscore from hyperv_cleanup to avoid hang (Gaurav Kohli) - Clean up panic path for Hyper-V framebuffer (Guilherme G. Piccoli) - Allow IRQ remapping to work without x2apic (Nuno Das Neves) - Fix comments (Olaf Hering) - Expand hv_vp_assist_page definition (Saurabh Sengar) - Improvement to page reporting (Shradha Gupta) - Make sure TSC clocksource works when Linux runs as the root partition (Stanislav Kinsburskiy) * tag 'hyperv-next-signed-20221208' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: x86/hyperv: Remove unregister syscore call from Hyper-V cleanup iommu/hyper-v: Allow hyperv irq remapping without x2apic clocksource: hyper-v: Add TSC page support for root partition clocksource: hyper-v: Use TSC PFN getter to map vvar page clocksource: hyper-v: Introduce TSC PFN getter clocksource: hyper-v: Introduce a pointer to TSC page x86/hyperv: Expand definition of struct hv_vp_assist_page PCI: hv: update comment in x86 specific hv_arch_irq_unmask hv: fix comment typo in vmbus_channel/low_latency drivers: hv, hyperv_fb: Untangle and refactor Hyper-V panic notifiers video: hyperv_fb: Avoid taking busy spinlock on panic path hv_balloon: Add support for configurable order free page reporting mm/page_reporting: Add checks for page_reporting_order param
2022-12-12Merge tag 'tpmdd-next-v6.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd Pull tpm updates from Jarkko Sakkinen: "A random collection of TPM fixes and one bug fix for trusted keys" * tag 'tpmdd-next-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: tpm: st33zp24: remove pointless checks on probe tpm/tpm_crb: Fix error message in __crb_relinquish_locality() tpm/tpm_ftpm_tee: Fix error handling in ftpm_mod_init() tpm: tpm_tis: Add the missed acpi_put_table() to fix memory leak tpm: tpm_crb: Add the missed acpi_put_table() to fix memory leak tpm: acpi: Call acpi_put_table() to fix memory leak tpm: Add flag to use default cancellation policy tpm: tis_i2c: Fix sanity check interrupt enable mask KEYS: trusted: tee: Make registered shm dependency explicit tpm: Avoid function type cast of put_device() tpm: st33zp24: switch to using gpiod API tpm: st33zp24: drop support for platform data
2022-12-12Merge tag 'slab-for-6.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab updates from Vlastimil Babka: - SLOB deprecation and SLUB_TINY The SLOB allocator adds maintenance burden and stands in the way of API improvements [1]. Deprecate it by renaming the config option (to make users notice) to CONFIG_SLOB_DEPRECATED with updated help text. SLUB should be used instead as SLAB will be the next on the removal list. Based on reports from a riscv k210 board with 8MB RAM, add a CONFIG_SLUB_TINY option to minimize SLUB's memory usage at the expense of scalability. This has resolved the k210 regression [2] so in case there are no others (that wouldn't be resolvable by further tweaks to SLUB_TINY) plan is to remove SLOB in a few cycles. Existing defconfigs with CONFIG_SLOB are converted to CONFIG_SLUB_TINY. - kmalloc() slub_debug redzone improvements A series from Feng Tang that builds on the tracking or requested size for kmalloc() allocations (for caches with debugging enabled) added in 6.1, to make redzone checks consider the requested size and not the rounded up one, in order to catch more subtle buffer overruns. Includes new slub_kunit test. - struct slab fields reordering to accomodate larger rcu_head RCU folks would like to grow rcu_head with debugging options, which breaks current struct slab layout's assumptions, so reorganize it to make this possible. - Miscellaneous improvements/fixes: - __alloc_size checking compiler workaround (Kees Cook) - Optimize and cleanup SLUB's sysfs init (Rasmus Villemoes) - Make SLAB compatible with PROVE_RAW_LOCK_NESTING (Jiri Kosina) - Correct SLUB's percpu allocation estimates (Baoquan He) - Re-enableS LUB's run-time failslab sysfs control (Alexander Atanasov) - Make tools/vm/slabinfo more user friendly when not run as root (Rong Tao) - Dead code removal in SLUB (Hyeonggon Yoo) * tag 'slab-for-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: (31 commits) mm, slob: rename CONFIG_SLOB to CONFIG_SLOB_DEPRECATED mm, slub: don't aggressively inline with CONFIG_SLUB_TINY mm, slub: remove percpu slabs with CONFIG_SLUB_TINY mm, slub: split out allocations from pre/post hooks mm/slub, kunit: Add a test case for kmalloc redzone check mm/slub, kunit: add SLAB_SKIP_KFENCE flag for cache creation mm, slub: refactor free debug processing mm, slab: ignore SLAB_RECLAIM_ACCOUNT with CONFIG_SLUB_TINY mm, slub: don't create kmalloc-rcl caches with CONFIG_SLUB_TINY mm, slub: lower the default slub_max_order with CONFIG_SLUB_TINY mm, slub: retain no free slabs on partial list with CONFIG_SLUB_TINY mm, slub: disable SYSFS support with CONFIG_SLUB_TINY mm, slub: add CONFIG_SLUB_TINY mm, slab: ignore hardened usercopy parameters when disabled slab: Remove special-casing of const 0 size allocations slab: Clean up SLOB vs kmalloc() definition mm/sl[au]b: rearrange struct slab fields to allow larger rcu_head mm/migrate: make isolate_movable_page() skip slab pages mm/slab: move and adjust kernel-doc for kmem_cache_alloc mm/slub, percpu: correct the calculation of early percpu allocation size ...
2022-12-12Merge tag 'printk-for-6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Add NMI-safe SRCU reader API. It uses atomic_inc() instead of this_cpu_inc() on strong load-store architectures. - Introduce new console_list_lock to synchronize a manipulation of the list of registered consoles and their flags. This is a first step in removing the big-kernel-lock-like behavior of console_lock(). This semaphore still serializes console->write() calbacks against: - each other. It primary prevents potential races between early and proper console drivers using the same device. - suspend()/resume() callbacks and init() operations in some drivers. - various other operations in the tty/vt and framebufer susbsystems. It is likely that console_lock() serializes even operations that are not directly conflicting with the console->write() callbacks here. This is the most complicated big-kernel-lock aspect of the console_lock() that will be hard to untangle. - Introduce new console_srcu lock that is used to safely iterate and access the registered console drivers under SRCU read lock. This is a prerequisite for introducing atomic console drivers and console kthreads. It will reduce the complexity of serialization against normal consoles and console_lock(). Also it should remove the risk of deadlock during critical situations, like Oops or panic, when only atomic consoles are registered. - Check whether the console is registered instead of enabled on many locations. It was a historical leftover. - Cleanly force a preferred console in xenfb code instead of a dirty hack. - A lot of code and comment clean ups and improvements. * tag 'printk-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: (47 commits) printk: htmldocs: add missing description tty: serial: sh-sci: use setup() callback for early console printk: relieve console_lock of list synchronization duties tty: serial: kgdboc: use console_list_lock to trap exit tty: serial: kgdboc: synchronize tty_find_polling_driver() and register_console() tty: serial: kgdboc: use console_list_lock for list traversal tty: serial: kgdboc: use srcu console list iterator proc: consoles: use console_list_lock for list iteration tty: tty_io: use console_list_lock for list synchronization printk, xen: fbfront: create/use safe function for forcing preferred netconsole: avoid CON_ENABLED misuse to track registration usb: early: xhci-dbc: use console_is_registered() tty: serial: xilinx_uartps: use console_is_registered() tty: serial: samsung_tty: use console_is_registered() tty: serial: pic32_uart: use console_is_registered() tty: serial: earlycon: use console_is_registered() tty: hvc: use console_is_registered() efi: earlycon: use console_is_registered() tty: nfcon: use console_is_registered() serial_core: replace uart_console_enabled() with uart_console_registered() ...
2022-12-12Merge tag 'locks-v6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking updates from Jeff Layton: "The main change here is to add the new locks_inode_context helper, and convert all of the places that dereference inode->i_flctx directly to use that instead. There is a new helper to indicate whether any locks are held on an inode. This is mostly for Ceph but may be usable elsewhere too. Andi Kleen requested that we print the PID when the LOCK_MAND warning fires, to help track down applications trying to use it. Finally, we added some new warnings to some of the file locking functions that fire when the ->fl_file and filp arguments differ. This helped us find some long-standing bugs in lockd. Patches for those are in Chuck Lever's tree and should be in his v6.2 PR. After that patch, people using NFSv2/v3 locking may see some warnings fire until those go in. Happy Holidays!" * tag 'locks-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: Add process name and pid to locks warning nfsd: use locks_inode_context helper nfs: use locks_inode_context helper lockd: use locks_inode_context helper ksmbd: use locks_inode_context helper cifs: use locks_inode_context helper ceph: use locks_inode_context helper filelock: add a new locks_inode_context accessor function filelock: new helper: vfs_inode_has_locks filelock: WARN_ON_ONCE when ->fl_file and filp don't match