summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2024-08-21net: repack struct netdev_queueJakub Kicinski
Adding the NAPI pointer to struct netdev_queue made it grow into another cacheline, even though there was 44 bytes of padding available. The struct was historically grouped as follows: /* read-mostly stuff (align) */ /* ... random control path fields ... */ /* write-mostly stuff (align) */ /* ... 40 byte hole ... */ /* struct dql (align) */ It seems that people want to add control path fields after the read only fields. struct dql looks pretty innocent but it forces its own alignment and nothing indicates that there is a lot of empty space above it. Move dql above the xmit_lock. This shifts the empty space to the end of the struct rather than in the middle of it. Move two example fields there to set an example. Hopefully people will now add new fields at the end of the struct. A lot of the read-only stuff is also control path-only, but if we move it all we'll have another hole in the middle. Before: /* size: 384, cachelines: 6, members: 16 */ /* sum members: 284, holes: 3, sum holes: 100 */ After: /* size: 320, cachelines: 5, members: 16 */ /* sum members: 284, holes: 1, sum holes: 8 */ Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240820205119.1321322-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-21bpf: extract iterator argument type and name validation logicAndrii Nakryiko
Verifier enforces that all iterator structs are named `bpf_iter_<name>` and that whenever iterator is passed to a kfunc it's passed as a valid PTR -> STRUCT chain (with potentially const modifiers in between). We'll need this check for upcoming changes, so instead of duplicating the logic, extract it into a helper function. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240808232230.2848712-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-08-21workqueue: Fix another htmldocs build warningTejun Heo
Fix htmldocs build warning introduced by 9b59a85a84dc ("workqueue: Don't call va_start / va_end twice"). Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Matthew Brost <matthew.brost@intel.com>
2024-08-21soc: qcom: pmic_glink: Fix race during initializationBjorn Andersson
As pointed out by Stephen Boyd it is possible that during initialization of the pmic_glink child drivers, the protection-domain notifiers fires, and the associated work is scheduled, before the client registration returns and as a result the local "client" pointer has been initialized. The outcome of this is a NULL pointer dereference as the "client" pointer is blindly dereferenced. Timeline provided by Stephen: CPU0 CPU1 ---- ---- ucsi->client = NULL; devm_pmic_glink_register_client() client->pdr_notify(client->priv, pg->client_state) pmic_glink_ucsi_pdr_notify() schedule_work(&ucsi->register_work) <schedule away> pmic_glink_ucsi_register() ucsi_register() pmic_glink_ucsi_read_version() pmic_glink_ucsi_read() pmic_glink_ucsi_read() pmic_glink_send(ucsi->client) <client is NULL BAD> ucsi->client = client // Too late! This code is identical across the altmode, battery manager and usci child drivers. Resolve this by splitting the allocation of the "client" object and the registration thereof into two operations. This only happens if the protection domain registry is populated at the time of registration, which by the introduction of commit '1ebcde047c54 ("soc: qcom: add pd-mapper implementation")' became much more likely. Reported-by: Amit Pundir <amit.pundir@linaro.org> Closes: https://lore.kernel.org/all/CAMi1Hd2_a7TjA7J9ShrAbNOd_CoZ3D87twmO5t+nZxC9sX18tA@mail.gmail.com/ Reported-by: Johan Hovold <johan@kernel.org> Closes: https://lore.kernel.org/all/ZqiyLvP0gkBnuekL@hovoldconsulting.com/ Reported-by: Stephen Boyd <swboyd@chromium.org> Closes: https://lore.kernel.org/all/CAE-0n52JgfCBWiFQyQWPji8cq_rCsviBpW-m72YitgNfdaEhQg@mail.gmail.com/ Fixes: 58ef4ece1e41 ("soc: qcom: pmic_glink: Introduce base PMIC GLINK driver") Cc: stable@vger.kernel.org Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Tested-by: Amit Pundir <amit.pundir@linaro.org> Reviewed-by: Johan Hovold <johan+linaro@kernel.org> Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com> Tested-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com> Link: https://lore.kernel.org/r/20240820-pmic-glink-v6-11-races-v3-1-eec53c750a04@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2024-08-21printk: nbcon: Implement emergency sectionsThomas Gleixner
In emergency situations (something has gone wrong but the system continues to operate), usually important information (such as a backtrace) is generated via printk(). This information should be pushed out to the consoles ASAP. Add per-CPU emergency nesting tracking because an emergency can arise while in an emergency situation. Add functions to mark the beginning and end of emergency sections where the urgent messages are generated. Perform direct console flushing at the emergency priority if the current CPU is in an emergency state and it is safe to do so. Note that the emergency state is not system-wide. While one CPU is in an emergency state, another CPU may attempt to print console messages at normal priority. Also note that printk() already attempts to flush consoles in the caller context for normal priority. However, follow-up changes will introduce printing kthreads, in which case the normal priority printk() calls will offload to the kthreads. Co-developed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-32-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21printk: Coordinate direct printing in panicJohn Ogness
If legacy and nbcon consoles are registered and the nbcon consoles are allowed to flush (i.e. no boot consoles registered), the legacy consoles will no longer perform direct printing on the panic CPU until after the backtrace has been stored. This will give the safe nbcon consoles a chance to print the panic messages before allowing the unsafe legacy consoles to print. If no nbcon consoles are registered or they are not allowed to flush because boot consoles are registered, there is no change in behavior (i.e. legacy consoles will always attempt to print from the printk() caller context). Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-30-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21printk: nbcon: Add unsafe flushing on panicJohn Ogness
Add nbcon_atomic_flush_unsafe() to flush all nbcon consoles using the write_atomic() callback and allowing unsafe hostile takeovers. Call this at the end of panic() as a final attempt to flush any pending messages. Note that legacy consoles use unsafe methods for flushing from the beginning of panic (see bust_spinlocks()). Therefore, systems using both legacy and nbcon consoles may still fail to see panic messages due to unsafe legacy console usage. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-27-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21serial: core: Acquire nbcon context in port->lock wrapperJohn Ogness
Currently the port->lock wrappers uart_port_lock(), uart_port_unlock() (and their variants) only lock/unlock the spin_lock. If the port is an nbcon console that has implemented the write_atomic() callback, the wrappers must also acquire/release the console context and mark the region as unsafe. This allows general port->lock synchronization to be synchronized against the nbcon write_atomic() callback. Note that __uart_port_using_nbcon() relies on the port->lock being held while a console is added and removed from the console list (i.e. all uart nbcon drivers *must* take the port->lock in their device_lock() callbacks). Signed-off-by: John Ogness <john.ogness@linutronix.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-15-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21nbcon: Add API to acquire context for non-printing operationsJohn Ogness
Provide functions nbcon_device_try_acquire() and nbcon_device_release() which will try to acquire the nbcon console ownership with NBCON_PRIO_NORMAL and mark it unsafe for handover/takeover. These functions are to be used together with the device-specific locking when performing non-printing activities on the console device. They will allow synchronization against the atomic_write() callback which will be serialized, for higher priority contexts, only by acquiring the console context ownership. Pitfalls: The API requires to be called in a context with migration disabled because it uses per-CPU variables internally. The context is set unsafe for a takeover all the time. It guarantees full serialization against any atomic_write() caller except for the final flush in panic() which might try an unsafe takeover. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-14-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21console: Improve console_srcu_read_flags() commentsJohn Ogness
It was not clear when exactly console_srcu_read_flags() must be used vs. directly reading @console->flags. Refactor and clarify that console_srcu_read_flags() is only needed if the console is registered or the caller is in a context where the registration status of the console may change (due to another context). The function requires the caller holds @console_srcu, which will ensure that the caller sees an appropriate @flags value for the registered console and that exit/cleanup routines will not run if the console is in the process of unregistration. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-13-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21serial: core: Introduce wrapper to set @uart_port->consJohn Ogness
Introduce uart_port_set_cons() as a wrapper to set @cons of a uart_port. The wrapper sets @cons under the port lock in order to prevent @cons from disappearing while another context is holding the port lock. This is necessary for a follow-up commit relating to the port lock wrappers, which rely on @cons not changing between lock and unlock. Signed-off-by: John Ogness <john.ogness@linutronix.de> Tested-by: Théo Lebrun <theo.lebrun@bootlin.com> # EyeQ5, AMBA-PL011 Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20240820063001.36405-12-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21serial: core: Provide low-level functions to lock portJohn Ogness
It will be necessary at times for the uart nbcon console drivers to acquire the port lock directly (without the additional nbcon functionality of the port lock wrappers). These are special cases such as the implementation of the device_lock()/device_unlock() callbacks or for internal port lock wrapper synchronization. Provide low-level variants __uart_port_lock_irqsave() and __uart_port_unlock_irqrestore() for this purpose. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20240820063001.36405-11-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21printk: nbcon: Add callbacks to synchronize with driverJohn Ogness
Console drivers typically must deal with access to the hardware via user input/output (such as an interactive login shell) and output of kernel messages via printk() calls. To provide the necessary synchronization, usually some driver-specific locking mechanism is used (for example, the port spinlock for uart serial consoles). Until now, usage of this driver-specific locking has been hidden from the printk-subsystem and implemented within the various console callbacks. However, nbcon consoles would need to use it even in the generic code. Add device_lock() and device_unlock() callback which will need to get implemented by nbcon consoles. The callbacks will use whatever synchronization mechanism the driver is using for itself. The minimum requirement is to prevent CPU migration. It would allow a context friendly acquiring of nbcon console ownership in non-emergency and non-panic context. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-9-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21printk: nbcon: Add detailed doc for write_atomic()John Ogness
The write_atomic() callback has special requirements and is allowed to use special helper functions. Provide detailed documentation of the callback so that a developer has a chance of implementing it correctly. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-8-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21printk: nbcon: Remove return value for write_atomic()John Ogness
The return value of write_atomic() does not provide any useful information. On the contrary, it makes things more complicated for the caller to appropriately deal with the information. Change write_atomic() to not have a return value. If the message did not get printed due to loss of ownership, the caller will notice this on its own. If ownership was not lost, it will be assumed that the driver successfully printed the message and the sequence number for that console will be incremented. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-7-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21printk: Check printk_deferred_enter()/_exit() usageSebastian Andrzej Siewior
Add validation that printk_deferred_enter()/_exit() are called in non-migration contexts. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240820063001.36405-5-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-08-21dm: Remove unused declaration dm_get_rq_mapinfo()Yue Haibing
Commit ae6ad75e5c3c ("dm: remove unused dm_get_rq_mapinfo()") removed the implementation but leave declaration. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2024-08-21f2fs: clean up val{>>,<<}F2FS_BLKSIZE_BITSZhiguo Niu
Use F2FS_BYTES_TO_BLK(bytes) and F2FS_BLK_TO_BYTES(blk) for cleanup Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-20workqueue: Don't call va_start / va_end twiceMatthew Brost
Calling va_start / va_end multiple times is undefined and causes problems with certain compiler / platforms. Change alloc_ordered_workqueue_lockdep_map to a macro and updated __alloc_workqueue to take a va_list argument. Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-08-20Merge branch 'tip/sched/core' into for-6.12Tejun Heo
To receive 863ccdbb918a ("sched: Allow sched_class::dequeue_task() to fail") which makes sched_class.dequeue_task() return bool instead of void. This leads to compile breakage and will be fixed by a follow-up patch. Signed-off-by: Tejun Heo <tj@kernel.org>
2024-08-20kernel: Add helper macros for loop unrollingKP Singh
This helps in easily initializing blocks of code (e.g. static calls and keys). UNROLL(N, MACRO, __VA_ARGS__) calls MACRO N times with the first argument as the index of the iteration. This allows string pasting to create unique tokens for variable names, function calls etc. As an example: #include <linux/unroll.h> #define MACRO(N, a, b) \ int add_##N(int a, int b) \ { \ return a + b + N; \ } UNROLL(2, MACRO, x, y) expands to: int add_0(int x, int y) { return x + y + 0; } int add_1(int x, int y) { return x + y + 1; } Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: John Johansen <john.johansen@canonical.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Song Liu <song@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Nacked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: KP Singh <kpsingh@kernel.org> Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-08-20fsverity: expose verified fsverity built-in signatures to LSMsFan Wu
This patch enhances fsverity's capabilities to support both integrity and authenticity protection by introducing the exposure of built-in signatures through a new LSM hook. This functionality allows LSMs, e.g. IPE, to enforce policies based on the authenticity and integrity of files, specifically focusing on built-in fsverity signatures. It enables a policy enforcement layer within LSMs for fsverity, offering granular control over the usage of authenticity claims. For instance, a policy could be established to only permit the execution of all files with verified built-in fsverity signatures. The introduction of a security_inode_setintegrity() hook call within fsverity's workflow ensures that the verified built-in signature of a file is exposed to LSMs. This enables LSMs to recognize and label fsverity files that contain a verified built-in fsverity signature. This hook is invoked subsequent to the fsverity_verify_signature() process, guaranteeing the signature's verification against fsverity's keyring. This mechanism is crucial for maintaining system security, as it operates in kernel space, effectively thwarting attempts by malicious binaries to bypass user space stack interactions. The second to last commit in this patch set will add a link to the IPE documentation in fsverity.rst. Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com> Signed-off-by: Fan Wu <wufan@linux.microsoft.com> Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-08-20lsm: add security_inode_setintegrity() hookFan Wu
This patch introduces a new hook to save inode's integrity data. For example, for fsverity enabled files, LSMs can use this hook to save the existence of verified fsverity builtin signature into the inode's security blob, and LSMs can make access decisions based on this data. Signed-off-by: Fan Wu <wufan@linux.microsoft.com> [PM: subject line tweak, removed changelog] Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-08-20dm-verity: expose root hash digest and signature data to LSMsDeven Bowers
dm-verity provides a strong guarantee of a block device's integrity. As a generic way to check the integrity of a block device, it provides those integrity guarantees to its higher layers, including the filesystem level. However, critical security metadata like the dm-verity roothash and its signing information are not easily accessible to the LSMs. To address this limitation, this patch introduces a mechanism to store and manage these essential security details within a newly added LSM blob in the block_device structure. This addition allows LSMs to make access control decisions on the integrity data stored within the block_device, enabling more flexible security policies. For instance, LSMs can now revoke access to dm-verity devices based on their roothashes, ensuring that only authorized and verified content is accessible. Additionally, LSMs can enforce policies to only allow files from dm-verity devices that have a valid digital signature to execute, effectively blocking any unsigned files from execution, thus enhancing security against unauthorized modifications. The patch includes new hook calls, `security_bdev_setintegrity()`, in dm-verity to expose the dm-verity roothash and the roothash signature to LSMs via preresume() callback. By using the preresume() callback, it ensures that the security metadata is consistently in sync with the metadata of the dm-verity target in the current active mapping table. The hook calls are depended on CONFIG_SECURITY. Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com> Signed-off-by: Fan Wu <wufan@linux.microsoft.com> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> [PM: moved sig_size field as discussed] Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-08-20block,lsm: add LSM blob and new LSM hooks for block devicesDeven Bowers
This patch introduces a new LSM blob to the block_device structure, enabling the security subsystem to store security-sensitive data related to block devices. Currently, for a device mapper's mapped device containing a dm-verity target, critical security information such as the roothash and its signing state are not readily accessible. Specifically, while the dm-verity volume creation process passes the dm-verity roothash and its signature from userspace to the kernel, the roothash is stored privately within the dm-verity target, and its signature is discarded post-verification. This makes it extremely hard for the security subsystem to utilize these data. With the addition of the LSM blob to the block_device structure, the security subsystem can now retain and manage important security metadata such as the roothash and the signing state of a dm-verity by storing them inside the blob. Access decisions can then be based on these stored data. The implementation follows the same approach used for security blobs in other structures like struct file, struct inode, and struct superblock. The initialization of the security blob occurs after the creation of the struct block_device, performed by the security subsystem. Similarly, the security blob is freed by the security subsystem before the struct block_device is deallocated or freed. This patch also introduces a new hook security_bdev_setintegrity() to save block device's integrity data to the new LSM blob. For example, for dm-verity, it can use this hook to expose its roothash and signing state to LSMs, then LSMs can save these data into the LSM blob. Please note that the new hook should be invoked every time the security information is updated to keep these data current. For example, in dm-verity, if the mapping table is reloaded and configured to use a different dm-verity target with a new roothash and signing information, the previously stored data in the LSM blob will become obsolete. It is crucial to re-invoke the hook to refresh these data and ensure they are up to date. This necessity arises from the design of device-mapper, where a device-mapper device is first created, and then targets are subsequently loaded into it. These targets can be modified multiple times during the device's lifetime. Therefore, while the LSM blob is allocated during the creation of the block device, its actual contents are not initialized at this stage and can change substantially over time. This includes alterations from data that the LSM 'trusts' to those it does not, making it essential to handle these changes correctly. Failure to address this dynamic aspect could potentially allow for bypassing LSM checks. Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com> Signed-off-by: Fan Wu <wufan@linux.microsoft.com> [PM: merge fuzz, subject line tweaks] Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-08-20lsm: add new securityfs delete functionFan Wu
When deleting a directory in the security file system, the existing securityfs_remove requires the directory to be empty, otherwise it will do nothing. This leads to a potential risk that the security file system might be in an unclean state when the intended deletion did not happen. This commit introduces a new function securityfs_recursive_remove to recursively delete a directory without leaving an unclean state. Co-developed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Fan Wu <wufan@linux.microsoft.com> [PM: subject line tweak] Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-08-20initramfs,lsm: add a security hook to do_populate_rootfs()Fan Wu
This patch introduces a new hook to notify security system that the content of initramfs has been unpacked into the rootfs. Upon receiving this notification, the security system can activate a policy to allow only files that originated from the initramfs to execute or load into kernel during the early stages of booting. This approach is crucial for minimizing the attack surface by ensuring that only trusted files from the initramfs are operational in the critical boot phase. Signed-off-by: Fan Wu <wufan@linux.microsoft.com> [PM: subject line tweak] Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-08-20softirq: Remove unused 'action' parameter from action callbackCaleb Sander Mateos
When soft interrupt actions are called, they are passed a pointer to the struct softirq action which contains the action's function pointer. This pointer isn't useful, as the action callback already knows what function it is. And since each callback handles a specific soft interrupt, the callback also knows which soft interrupt number is running. No soft interrupt action callback actually uses this parameter, so remove it from the function pointer signature. This clarifies that soft interrupt actions are global routines and makes it slightly cheaper to call them. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/all/20240815171549.3260003-1-csander@purestorage.com
2024-08-20firmware: arm_ffa: Add support for FFA_MSG_SEND_DIRECT_{REQ,RESP}2Sudeep Holla
FFA_MSG_SEND_DIRECT_{REQ,RESP} supported only x3-x7 to pass implementation defined values as part of the message. This may not be sufficient sometimes and also it would be good to use all the registers supported by SMCCC v1.2 (x0-x17) for such register based communication. Also another limitation with the FFA_MSG_SEND_DIRECT_{REQ,RESP} is the ability to target a specific service within the partition based on it's UUID. In order to address both of the above limitation, FF-A v1.2 introduced FFA_MSG_SEND_DIRECT_{REQ,RESP}2 which has the ability to target the message to a specific service based on its UUID within a partition as well as utilise all the available registers(x4-x17 specifically) for the communication. This change adds support for FFA_MSG_SEND_DIRECT_REQ2 and FFA_MSG_SEND_DIRECT_RESP2. Message-Id: <20240820-ffa_v1-2-v2-5-18c0c5f3c65e@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2024-08-20firmware: arm_ffa: Update the FF-A command list with v1.2 additionsSudeep Holla
Arm Firmware Framework for A-profile(FFA) v1.2 introduces register based discovery mechanism and direct messaging extensions that enables to target specific UUID within a partition. Let us add all the newly supported FF-A function IDs in the spec. Also update to the error values and associated handling. Message-Id: <20240820-ffa_v1-2-v2-2-18c0c5f3c65e@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2024-08-20coresight: Make trace ID map spinlock local to the mapJames Clark
Reduce contention on the lock by replacing the global lock with one for each map. Signed-off-by: James Clark <james.clark@arm.com> Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20240722101202.26915-18-james.clark@linaro.org
2024-08-20coresight: Emit sink ID in the HW_ID packetsJames Clark
For Perf to be able to decode when per-sink trace IDs are used, emit the sink that's being written to for each ETM. Perf currently errors out if it sees a newer packet version so instead of bumping it, add a new minor version field. This can be used to signify new versions that have backwards compatible fields. Considering this change is only for high core count machines, it doesn't make sense to make a breaking change for everyone. Signed-off-by: James Clark <james.clark@arm.com> Tested-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20240722101202.26915-17-james.clark@linaro.org
2024-08-20coresight: Remove pending trace ID release mechanismJames Clark
Pending the release of IDs was a way of managing concurrent sysfs and Perf sessions in a single global ID map. Perf may have finished while sysfs hadn't, and Perf shouldn't release the IDs in use by sysfs and vice versa. Now that Perf uses its own exclusive ID maps, pending release doesn't result in any different behavior than just releasing all IDs when the last Perf session finishes. As part of the per-sink trace ID change, we would have still had to make the pending mechanism work on a per-sink basis, due to the overlapping ID allocations, so instead of making that more complicated, just remove it. Signed-off-by: James Clark <james.clark@arm.com> Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20240722101202.26915-16-james.clark@linaro.org
2024-08-20coresight: Use per-sink trace ID maps for Perf sessionsJames Clark
This will allow sessions with more than CORESIGHT_TRACE_IDS_MAX ETMs as long as there are fewer than that many ETMs connected to each sink. Each sink owns its own trace ID map, and any Perf session connecting to that sink will allocate from it, even if the sink is currently in use by other users. This is similar to the existing behavior where the dynamic trace IDs are constant as long as there is any concurrent Perf session active. It's not completely optimal because slightly more IDs will be used than necessary, but the optimal solution involves tracking the PIDs of each session and allocating ID maps based on the session owner. This is difficult to do with the combination of per-thread and per-cpu modes and some scheduling issues. The complexity of this isn't likely to worth it because even with multiple users they'd just see a difference in the ordering of ID allocations rather than hitting any limits (unless the hardware does have too many ETMs connected to one sink). Signed-off-by: James Clark <james.clark@arm.com> Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20240722101202.26915-15-james.clark@linaro.org
2024-08-20coresight: Make CPU id map a property of a trace ID mapJames Clark
The global CPU ID mappings won't work for per-sink ID maps so move it to the ID map struct. coresight_trace_id_release_all_pending() is hard coded to operate on the default map, but once Perf sessions use their own maps the pending release mechanism will be deleted. So it doesn't need to be extended to accept a trace ID map argument at this point. Signed-off-by: James Clark <james.clark@arm.com> Reviewed-by: Mike Leach <mike.leach@linaro.org> Tested-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20240722101202.26915-14-james.clark@linaro.org
2024-08-20coresight: Move struct coresight_trace_id_map to common headerJames Clark
The trace ID maps will need to be created and stored by the core and Perf code so move the definition up to the common header. Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Tested-by: Leo Yan <leo.yan@arm.com> Tested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20240722101202.26915-12-james.clark@linaro.org
2024-08-20x86/kaslr: Expose and use the end of the physical memory address spaceThomas Gleixner
iounmap() on x86 occasionally fails to unmap because the provided valid ioremap address is not below high_memory. It turned out that this happens due to KASLR. KASLR uses the full address space between PAGE_OFFSET and vaddr_end to randomize the starting points of the direct map, vmalloc and vmemmap regions. It thereby limits the size of the direct map by using the installed memory size plus an extra configurable margin for hot-plug memory. This limitation is done to gain more randomization space because otherwise only the holes between the direct map, vmalloc, vmemmap and vaddr_end would be usable for randomizing. The limited direct map size is not exposed to the rest of the kernel, so the memory hot-plug and resource management related code paths still operate under the assumption that the available address space can be determined with MAX_PHYSMEM_BITS. request_free_mem_region() allocates from (1 << MAX_PHYSMEM_BITS) - 1 downwards. That means the first allocation happens past the end of the direct map and if unlucky this address is in the vmalloc space, which causes high_memory to become greater than VMALLOC_START and consequently causes iounmap() to fail for valid ioremap addresses. MAX_PHYSMEM_BITS cannot be changed for that because the randomization does not align with address bit boundaries and there are other places which actually require to know the maximum number of address bits. All remaining usage sites of MAX_PHYSMEM_BITS have been analyzed and found to be correct. Cure this by exposing the end of the direct map via PHYSMEM_END and use that for the memory hot-plug and resource management related places instead of relying on MAX_PHYSMEM_BITS. In the KASLR case PHYSMEM_END maps to a variable which is initialized by the KASLR initialization and otherwise it is based on MAX_PHYSMEM_BITS as before. To prevent future hickups add a check into add_pages() to catch callers trying to add memory above PHYSMEM_END. Fixes: 0483e1fa6e09 ("x86/mm: Implement ASLR for kernel memory regions") Reported-by: Max Ramanouski <max8rr8@gmail.com> Reported-by: Alistair Popple <apopple@nvidia.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-By: Max Ramanouski <max8rr8@gmail.com> Tested-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Kees Cook <kees@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/87ed6soy3z.ffs@tglx
2024-08-20PM: domains: add device managed version of dev_pm_domain_attach|detach_list()Dikshita Agarwal
Add the devres-enabled version of dev_pm_domain_attach|detach_list. If client drivers use devm_pm_domain_attach_list() to attach the PM domains, devm_pm_domain_detach_list() will be invoked implicitly during remove phase. Signed-off-by: Dikshita Agarwal <quic_dikshita@quicinc.com> Link: https://lore.kernel.org/r/1724063350-11993-2-git-send-email-quic_dikshita@quicinc.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2024-08-20net: add copy from skb_seq_state to buffer functionChristian Hopps
Add an skb helper function to copy a range of bytes from within an existing skb_seq_state. Signed-off-by: Christian Hopps <chopps@labn.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-08-19bpf: Allow bpf_current_task_under_cgroup() with BPF_CGROUP_*Matteo Croce
The helper bpf_current_task_under_cgroup() currently is only allowed for tracing programs, allow its usage also in the BPF_CGROUP_* program types. Move the code from kernel/trace/bpf_trace.c to kernel/bpf/helpers.c, so it compiles also without CONFIG_BPF_EVENTS. This will be used in systemd-networkd to monitor the sysctl writes, and filter it's own writes from others: https://github.com/systemd/systemd/pull/32212 Signed-off-by: Matteo Croce <teknoraver@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240819162805.78235-3-technoboy85@gmail.com
2024-08-19workqueue: Fix htmldocs build warningTejun Heo
Fix htmldocs build warning introduced by ec0a7d44b358 ("workqueue: Add interface for user-defined workqueue lockdep map"). Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Matthew Brost <matthew.brost@intel.com>
2024-08-19ALSA/ASoC/SoundWire: Intel: update maximum numberMark Brown
Merge series from Bard Liao <yung-chuan.liao@linux.intel.com>: Intel new platforms can have up to 5 SoundWire links. This series does not apply to SoundWire tree due to recent changes in machine driver. Can we go via ASoC tree with Vinod's Acked-by tag?
2024-08-19x86: support user address masking instead of non-speculative conditionalLinus Torvalds
The Spectre-v1 mitigations made "access_ok()" much more expensive, since it has to serialize execution with the test for a valid user address. All the normal user copy routines avoid this by just masking the user address with a data-dependent mask instead, but the fast "unsafe_user_read()" kind of patterms that were supposed to be a fast case got slowed down. This introduces a notion of using src = masked_user_access_begin(src); to do the user address sanity using a data-dependent mask instead of the more traditional conditional if (user_read_access_begin(src, len)) { model. This model only works for dense accesses that start at 'src' and on architectures that have a guard region that is guaranteed to fault in between the user space and the kernel space area. With this, the user access doesn't need to be manually checked, because a bad address is guaranteed to fault (by some architecture masking trick: on x86-64 this involves just turning an invalid user address into all ones, since we don't map the top of address space). This only converts a couple of examples for now. Example x86-64 code generation for loading two words from user space: stac mov %rax,%rcx sar $0x3f,%rcx or %rax,%rcx mov (%rcx),%r13 mov 0x8(%rcx),%r14 clac where all the error handling and -EFAULT is now purely handled out of line by the exception path. Of course, if the micro-architecture does badly at 'clac' and 'stac', the above is still pitifully slow. But at least we did as well as we could. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-08-19Merge tag 'printk-for-6.11-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk fix from Petr Mladek: - Do not block printk on non-panic CPUs when they are dumping backtraces * tag 'printk-for-6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk/panic: Allow cpu backtraces to be written into ringbuffer during panic
2024-08-19block: Drop NULL check in bdev_write_zeroes_sectors()John Garry
Function bdev_get_queue() must not return NULL, so drop the check in bdev_write_zeroes_sectors(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com> Link: https://lore.kernel.org/r/20240815163228.216051-3-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-08-19Merge back thermal core material for 6.12.Rafael J. Wysocki
2024-08-19percpu-rwsem: remove the unused parameter 'read'Wang Long
In the function percpu_rwsem_release, the parameter `read` is unused, so remove it. Signed-off-by: Wang Long <w@laoqinren.net> Link: https://lore.kernel.org/r/20240802091901.2546797-1-w@laoqinren.net Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-19fs: don't flush in-flight wb switches for superblocks without cgroup writebackHaifeng Xu
When deactivating any type of superblock, it had to wait for the in-flight wb switches to be completed. wb switches are executed in inode_switch_wbs_work_fn() which needs to acquire the wb_switch_rwsem and races against sync_inodes_sb(). If there are too much dirty data in the superblock, the waiting time may increase significantly. For superblocks without cgroup writeback such as tmpfs, they have nothing to do with the wb swithes, so the flushing can be avoided. Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com> Link: https://lore.kernel.org/r/20240726030525.180330-1-haifeng.xu@shopee.com Reviewed-by: Jan Kara <jack@suse.cz> Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-19soundwire: intel: increase maximum number of linksPierre-Louis Bossart
Intel platforms have enabled 4 links since the beginning, newer platforms now have 5 links. Update the definition accordingly. This patch will have no effect on older platforms where the number of links was hard-coded. A follow-up patch will add a dynamic check that the ACPI-reported information is aligned with hardware capabilities on newer platforms. Acked-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Acked-by: Mark Brown <broonie@kernel.org> Reviewed-by: Takashi Iwai <tiwai@suse.de> Link: https://patch.msgid.link/20240819005548.5867-4-yung-chuan.liao@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-08-19soundwire: intel: add probe-time check on link idPierre-Louis Bossart
In older platforms, the number of links was constant and hard-coded to 4. Newer platforms can have varying number of links, so we need to add a probe-time check to make sure the ACPI-reported information with _DSD properties is aligned with hardware capabilities reported in the SoundWire LCAP register. Acked-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Acked-by: Mark Brown <broonie@kernel.org> Reviewed-by: Takashi Iwai <tiwai@suse.de> Link: https://patch.msgid.link/20240819005548.5867-3-yung-chuan.liao@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>