summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2025-03-06net: replace dev_addr_sem with netdev instance lockStanislav Fomichev
Lockdep reports possible circular dependency in [0]. Instead of fixing the ordering, replace global dev_addr_sem with netdev instance lock. Most of the paths that set/get mac are RTNL protected. Two places where it's not, convert to explicit locking: - sysfs address_show - dev_get_mac_address via dev_ioctl 0: https://netdev-3.bots.linux.dev/vmksft-forwarding-dbg/results/993321/24-router-bridge-1d-lag-sh/stderr Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-12-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06net: hold netdev instance lock during ndo_bpfStanislav Fomichev
Cover the paths that come via bpf system call and XSK bind. Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-10-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06net: hold netdev instance lock during sysfs operationsStanislav Fomichev
Most of them are already covered by the converted dev_xxx APIs. Add the locking wrappers for the remaining ones. Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-9-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06net: hold netdev instance lock during ioctl operationsStanislav Fomichev
Convert all ndo_eth_ioctl invocations to dev_eth_ioctl which does the locking. Reflow some of the dev_siocxxx to drop else clause. Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-8-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06net: hold netdev instance lock during rtnetlink operationsStanislav Fomichev
To preserve the atomicity, hold the lock while applying multiple attributes. The major issue with a full conversion to the instance lock are software nesting devices (bonding/team/vrf/etc). Those devices call into the core stack for their lower (potentially real hw) devices. To avoid explicitly wrapping all those places into instance lock/unlock, introduce new API boundaries: - (some) existing dev_xxx calls are now considered "external" (to drivers) APIs and they transparently grab the instance lock if needed (dev_api.c) - new netif_xxx calls are internal core stack API (naming is sketchy, I've tried netdev_xxx_locked per Jakub's suggestion, but it feels a bit verbose; but happy to get back to this naming scheme if this is the preference) This avoids touching most of the existing ioctl/sysfs/drivers paths. Note the special handling of ndo_xxx_slave operations: I exploit the fact that none of the drivers that call these functions need/use instance lock. At the same time, they use dev_xxx APIs, so the lower device has to be unlocked. Changes in unregister_netdevice_many_notify (to protect dev->state with instance lock) trigger lockdep - the loop over close_list (mostly from cleanup_net) introduces spurious ordering issues. netdev_lock_cmp_fn has a justification on why it's ok to suppress for now. Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-7-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06net: hold netdev instance lock during queue operationsStanislav Fomichev
For the drivers that use queue management API, switch to the mode where core stack holds the netdev instance lock. This affects the following drivers: - bnxt - gve - netdevsim Originally I locked only start/stop, but switched to holding the lock over all iterations to make them look atomic to the device (feels like it should be easier to reason about). Reviewed-by: Eric Dumazet <edumazet@google.com> Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-6-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06net: hold netdev instance lock during nft ndo_setup_tcStanislav Fomichev
Introduce new dev_setup_tc for nft ndo_setup_tc paths. Reviewed-by: Eric Dumazet <edumazet@google.com> Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-3-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06net: hold netdev instance lock during ndo_open/ndo_stopStanislav Fomichev
For the drivers that use shaper API, switch to the mode where core stack holds the netdev lock. This affects two drivers: * iavf - already grabs netdev lock in ndo_open/ndo_stop, so mostly remove these * netdevsim - switch to _locked APIs to avoid deadlock iavf_close diff is a bit confusing, the existing call looks like this: iavf_close() { netdev_lock() .. netdev_unlock() wait_event_timeout(down_waitqueue) } I change it to the following: netdev_lock() iavf_close() { .. netdev_unlock() wait_event_timeout(down_waitqueue) netdev_lock() // reusing this lock call } netdev_unlock() Since I'm reusing existing netdev_lock call, so it looks like I only add netdev_unlock. Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-2-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06Merge tag 'amd-pstate-v6.15-2025-03-06' of ↵Rafael J. Wysocki
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux Merge amd-pstate updates for 6.15 (3/6/25) from Mario Limonciello: "A lot of code optimization to avoid cases where call paths will end up calling the same writes multiple times and needlessly caching variables. To accomplish this some of the writes are now made into an atomically written "perf" variable. Locking has been overhauled to ensure it only applies to the necessary functions. Tracing has been adjusted to ensure trace events only are used right before writing out to the hardware." NOTE: This is a redo of amd-pstate-v6.15-2025-03-03 with a fixed Fixes tag. * tag 'amd-pstate-v6.15-2025-03-06' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux: (29 commits) cpufreq/amd-pstate: Drop actions in amd_pstate_epp_cpu_offline() cpufreq/amd-pstate: Stop caching EPP cpufreq/amd-pstate: Rework CPPC enabling cpufreq/amd-pstate: Drop debug statements for policy setting cpufreq/amd-pstate: Update cppc_req_cached for shared mem EPP writes cpufreq/amd-pstate: Move all EPP tracing into *_update_perf and *_set_epp functions cpufreq/amd-pstate: Cache CPPC request in shared mem case too cpufreq/amd-pstate: Replace all AMD_CPPC_* macros with masks cpufreq/amd-pstate-ut: Adjust variable scope cpufreq/amd-pstate-ut: Run on all of the correct CPUs cpufreq/amd-pstate-ut: Drop SUCCESS and FAIL enums cpufreq/amd-pstate-ut: Allow lowest nonlinear and lowest to be the same cpufreq/amd-pstate-ut: Use _free macro to free put policy cpufreq/amd-pstate: Drop `cppc_cap1_cached` cpufreq/amd-pstate: Overhaul locking cpufreq/amd-pstate: Move perf values into a union cpufreq/amd-pstate: Drop min and max cached frequencies cpufreq/amd-pstate: Show a warning when a CPU fails to setup cpufreq/amd-pstate: Invalidate cppc_req_cached during suspend cpufreq/amd-pstate: Fix the clamping of perf values ...
2025-03-06PM: EM: Consify two parameters of em_dev_register_perf_domain()Rafael J. Wysocki
Notice that em_dev_register_perf_domain() and the functions called by it do not update objects pointed to by its cb and cpus parameters, so the const modifier can be added to them. This allows the return value of cpumask_of() or a pointer to a struct em_data_callback declared as const to be passed to em_dev_register_perf_domain() directly without explicit type casting which is rather handy. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://patch.msgid.link/4648962.LvFx2qVVIh@rjwysocki.net
2025-03-06drm/xe/uapi: Add DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRRORMatthew Brost
Add the DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR device query flag, which indicates whether the device supports CPU address mirroring. The intent is for UMDs to use this query to determine if a VM can be set up with CPU address mirroring. This flag is implemented by checking if the device supports GPU faults. v7: - Only report enabled if CONFIG_DRM_GPUSVM is selected (CI) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-20-matthew.brost@intel.com
2025-03-06drm/gpuvm: Add DRM_GPUVA_OP_DRIVERMatthew Brost
Add DRM_GPUVA_OP_DRIVER which allows driver to define their own gpuvm ops. Useful for driver created ops which can be passed into the bind software pipeline. v3: - s/DRM_GPUVA_OP_USER/DRM_GPUVA_OP_DRIVER (Thomas) - Better commit message (Thomas) Cc: Danilo Krummrich <dakr@redhat.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-14-matthew.brost@intel.com
2025-03-06drm/xe/uapi: Add DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRRORMatthew Brost
Add the DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag, which is used to create unpopulated virtual memory areas (VMAs) without memory backing or GPU page tables. These VMAs are referred to as CPU address mirror VMAs. The idea is that upon a page fault or prefetch, the memory backing and GPU page tables will be populated. CPU address mirror VMAs only update GPUVM state; they do not have an internal page table (PT) state, nor do they have GPU mappings. It is expected that CPU address mirror VMAs will be mixed with buffer object (BO) VMAs within a single VM. In other words, system allocations and runtime allocations can be mixed within a single user-mode driver (UMD) program. Expected usage: - Bind the entire virtual address (VA) space upon program load using the DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag. - If a buffer object (BO) requires GPU mapping (runtime allocation), allocate a CPU address using mmap(PROT_NONE), bind the BO to the mmapped address using existing bind IOCTLs. If a CPU map of the BO is needed, mmap it again to the same CPU address using mmap(MAP_FIXED) - If a BO no longer requires GPU mapping, munmap it from the CPU address space and them bind the mapping address with the DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag. - Any malloc'd or mmapped CPU address accessed by the GPU will be faulted in via the SVM implementation (system allocation). - Upon freeing any mmapped or malloc'd data, the SVM implementation will remove GPU mappings. Only supporting 1 to 1 mapping between user address space and GPU address space at the moment as that is the expected use case. uAPI defines interface for non 1 to 1 but enforces 1 to 1, this restriction can be lifted if use cases arrise for non 1 to 1 mappings. This patch essentially short-circuits the code in the existing VM bind paths to avoid populating page tables when the DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag is set. v3: - Call vm_bind_ioctl_ops_fini on -ENODATA - Don't allow DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR on non-faulting VMs - s/DRM_XE_VM_BIND_FLAG_SYSTEM_ALLOCATOR/DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR (Thomas) - Rework commit message for expected usage (Thomas) - Describe state of code after patch in commit message (Thomas) v4: - Fix alignment (Checkpatch) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-9-matthew.brost@intel.com
2025-03-06drm/gpusvm: Add support for GPU Shared Virtual MemoryMatthew Brost
This patch introduces support for GPU Shared Virtual Memory (SVM) in the Direct Rendering Manager (DRM) subsystem. SVM allows for seamless sharing of memory between the CPU and GPU, enhancing performance and flexibility in GPU computing tasks. The patch adds the necessary infrastructure for SVM, including data structures and functions for managing SVM ranges and notifiers. It also provides mechanisms for allocating, deallocating, and migrating memory regions between system RAM and GPU VRAM. This is largely inspired by GPUVM. v2: - Take order into account in check pages - Clear range->pages in get pages error - Drop setting dirty or accessed bit in get pages (Vetter) - Remove mmap assert for cpu faults - Drop mmap write lock abuse (Vetter, Christian) - Decouple zdd from range (Vetter, Oak) - Add drm_gpusvm_range_evict, make it work with coherent pages - Export drm_gpusvm_evict_to_sram, only use in BO evict path (Vetter) - mmget/put in drm_gpusvm_evict_to_sram - Drop range->vram_alloation variable - Don't return in drm_gpusvm_evict_to_sram until all pages detached - Don't warn on mixing sram and device pages - Update kernel doc - Add coherent page support to get pages - Use DMA_FROM_DEVICE rather than DMA_BIDIRECTIONAL - Add struct drm_gpusvm_vram and ops (Thomas) - Update the range's seqno if the range is valid (Thomas) - Remove the is_unmapped check before hmm_range_fault (Thomas) - Use drm_pagemap (Thomas) - Drop kfree_mapping (Thomas) - dma mapp pages under notifier lock (Thomas) - Remove ctx.prefault - Remove ctx.mmap_locked - Add ctx.check_pages - s/vram/devmem (Thomas) v3: - Fix memory leak drm_gpusvm_range_get_pages - Only migrate pages with same zdd on CPU fault - Loop over al VMAs in drm_gpusvm_range_evict - Make GPUSVM a drm level module - GPL or MIT license - Update main kernel doc (Thomas) - Prefer foo() vs foo for functions in kernel doc (Thomas) - Prefer functions over macros (Thomas) - Use unsigned long vs u64 for addresses (Thomas) - Use standard interval_tree (Thomas) - s/drm_gpusvm_migration_put_page/drm_gpusvm_migration_unlock_put_page (Thomas) - Drop err_out label in drm_gpusvm_range_find_or_insert (Thomas) - Fix kernel doc in drm_gpusvm_range_free_pages (Thomas) - Newlines between functions defs in header file (Thomas) - Drop shall language in driver vfunc kernel doc (Thomas) - Move some static inlines from head to C file (Thomas) - Don't allocate pages under page lock in drm_gpusvm_migrate_populate_ram_pfn (Thomas) - Change check_pages to a threshold v4: - Fix NULL ptr deref in drm_gpusvm_migrate_populate_ram_pfn (Thomas, Himal) - Fix check pages threshold - Check for range being unmapped under notifier lock in get pages (Testing) - Fix characters per line - Drop WRITE_ONCE for zdd->devmem_allocation assignment (Thomas) - Use completion for devmem_allocation->detached (Thomas) - Make GPU SVM depend on ZONE_DEVICE (CI) - Use hmm_range_fault for eviction (Thomas) - Drop zdd worker (Thomas) v5: - Select Kconfig deps (CI) - Set device to NULL in __drm_gpusvm_migrate_to_ram (Matt Auld, G.G.) - Drop Thomas's SoB (Thomas) - Add drm_gpusvm_range_start/end/size helpers (Thomas) - Add drm_gpusvm_notifier_start/end/size helpers (Thomas) - Absorb drm_pagemap name changes (Thomas) - Fix driver lockdep assert (Thomas) - Move driver lockdep assert to static function (Thomas) - Assert mmap lock held in drm_gpusvm_migrate_to_devmem (Thomas) - Do not retry forever on eviction (Thomas) v6: - Fix drm_gpusvm_get_devmem_page alignment (Checkpatch) - Modify Kconfig (CI) - Compile out lockdep asserts (CI) v7: - Add kernel doc for flags fields (CI, Auld) Cc: Simona Vetter <simona.vetter@ffwll.ch> Cc: Dave Airlie <airlied@redhat.com> Cc: Christian König <christian.koenig@amd.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: <dri-devel@lists.freedesktop.org> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-7-matthew.brost@intel.com
2025-03-06drm/pagemap: Add DRM pagemapThomas Hellström
Introduce drm_pagemap ops to map and unmap dma to VRAM resources. In the local memory case it's a matter of merely providing an offset into the device's physical address. For future p2p the map and unmap functions may encode as needed. Similar to how dma-buf works, let the memory provider (drm_pagemap) provide the mapping functionality. v3: - Move to drm level include v4: - Fix kernel doc (G.G.) v5: - s/map_dma/device_map (Thomas) - s/unmap_dma/device_unmap (Thomas) v7: - Fix kernel doc (CI, Auld) - Drop P2P define (Thomas) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-5-matthew.brost@intel.com
2025-03-06mm/migrate: Add migrate_device_pfnsMatthew Brost
Add migrate_device_pfns which prepares an array of pre-populated device pages for migration. This is needed for eviction of known set of non-contiguous devices pages to cpu pages which is a common case for SVM in DRM drivers using TTM. v2: - s/migrate_device_vma_range/migrate_device_prepopulated_range - Drop extra mmu invalidation (Vetter) v3: - s/migrate_device_prepopulated_range/migrate_device_pfns (Alistar) - Use helper to lock device pages (Alistar) - Update commit message with why this is required (Alistar) Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-3-matthew.brost@intel.com
2025-03-06Merge branch 'for-6.15/features' into fwctlJason Gunthorpe
Add CXL mailbox Features commands enabling. This is also preparation for CXL fwctl enabling. The same code will also be utilized by the CXL EDAC enabling. The commands 'Get Supported Features', 'Get Feature', and 'Set Feature' are enabled for kernel usages. Required for the CXL fwctl driver. * branch 'for-6.15/features' cxl: Setup exclusive CXL features that are reserved for the kernel cxl/mbox: Add SET_FEATURE mailbox command cxl/mbox: Add GET_FEATURE mailbox command cxl/test: Add Get Supported Features mailbox command support cxl: Add Get Supported Features command for kernel usage cxl: Enumerate feature commands cxl: Refactor user ioctl command path from mds to mailbox Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-06fwctl/mlx5: Support for communicating with mlx5 fwSaeed Mahameed
mlx5 FW has a built in security context called UID. Each UID has a set of permissions controlled by the kernel when it is created and every command is tagged by the kernel with a particular UID. In general commands cannot reach objects outside of their UID and commands cannot exceed their UID's permissions. These restrictions are enforced by FW. This mechanism has long been used in RDMA for the devx interface where RDMA will sent commands directly to the FW and the UID limitations restrict those commands to a ib_device/verbs security domain. For instance commands that would effect other VFs, or global device resources. The model is suitable for unprivileged userspace to operate the RDMA functionality. The UID has been extended with a "tools resources" permission which allows additional commands and sub-commands that are intended to match with the scope limitations set in FWCTL. This is an alternative design to the "command intent log" where the FW does the enforcement rather than having the FW report the enforcement the kernel should do. Consistent with the fwctl definitions the "tools resources" security context is limited to the FWCTL_RPC_CONFIGURATION, FWCTL_RPC_DEBUG_READ_ONLY, FWCTL_RPC_DEBUG_WRITE, and FWCTL_RPC_DEBUG_WRITE_FULL security scopes. Like RDMA devx, each opened fwctl file descriptor will get a unique UID associated with each file descriptor. The fwctl driver is kept simple and we reject commands that can create objects as the UID mechanism relies on the kernel to track and destroy objects prior to detroying the UID. Filtering into fwctl sub scopes is done inside the driver with a switch statement. This substantially limits what is possible to primarily query functions ad a few limited set operations. mlx5 already has a robust infrastructure for delivering RPC messages to fw. Trivially connect fwctl's RPC mechanism to mlx5_cmd_do(). Enforce the User Context ID in every RPC header accepted from the FD so the FW knows the security context of the issuing ID. Link: https://patch.msgid.link/r/7-v5-642aa0c94070+4447f-fwctl_jgg@nvidia.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-06fwctl: FWCTL_RPC to execute a Remote Procedure Call to device firmwareJason Gunthorpe
Add the FWCTL_RPC ioctl which allows a request/response RPC call to device firmware. Drivers implementing this call must follow the security guidelines under Documentation/userspace-api/fwctl.rst The core code provides some memory management helpers to get the messages copied from and back to userspace. The driver is responsible for allocating the output message memory and delivering the message to the device. Link: https://patch.msgid.link/r/5-v5-642aa0c94070+4447f-fwctl_jgg@nvidia.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Tested-by: Dave Jiang <dave.jiang@intel.com> Tested-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-06taint: Add TAINT_FWCTLJason Gunthorpe
Requesting a fwctl scope of access that includes mutating device debug data will cause the kernel to be tainted. Changing the device operation through things in the debug scope may cause the device to malfunction in undefined ways. This should be reflected in the TAINT flags to help any debuggers understand that something has been done. Link: https://patch.msgid.link/r/4-v5-642aa0c94070+4447f-fwctl_jgg@nvidia.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Tested-by: Dave Jiang <dave.jiang@intel.com> Tested-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-06fwctl: FWCTL_INFO to return basic information about the deviceJason Gunthorpe
Userspace will need to know some details about the fwctl interface being used to locate the correct userspace code to communicate with the kernel. Provide a simple device_type enum indicating what the kernel driver is. Allow the device to provide a device specific info struct that contains any additional information that the driver may need to provide to userspace. Link: https://patch.msgid.link/r/3-v5-642aa0c94070+4447f-fwctl_jgg@nvidia.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Tested-by: Dave Jiang <dave.jiang@intel.com> Tested-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-06fwctl: Basic ioctl dispatch for the character deviceJason Gunthorpe
Each file descriptor gets a chunk of per-FD driver specific context that allows the driver to attach a device specific struct to. The core code takes care of the memory lifetime for this structure. The ioctl dispatch and design is based on what was built for iommufd. The ioctls have a struct which has a combined in/out behavior with a typical 'zero pad' scheme for future extension and backwards compatibility. Like iommufd some shared logic does most of the ioctl marshaling and compatibility work and table dispatches to some function pointers for each unique ioctl. This approach has proven to work quite well in the iommufd and rdma subsystems. Allocate an ioctl number space for the subsystem. Link: https://patch.msgid.link/r/2-v5-642aa0c94070+4447f-fwctl_jgg@nvidia.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Tested-by: Dave Jiang <dave.jiang@intel.com> Tested-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-06fwctl: Add basic structure for a class subsystem with a cdevJason Gunthorpe
Create the class, character device and functions for a fwctl driver to un/register to the subsystem. A typical fwctl driver has a sysfs presence like: $ ls -l /dev/fwctl/fwctl0 crw------- 1 root root 250, 0 Apr 25 19:16 /dev/fwctl/fwctl0 $ ls /sys/class/fwctl/fwctl0 dev device power subsystem uevent $ ls /sys/class/fwctl/fwctl0/device/infiniband/ ibp0s10f0 $ ls /sys/class/infiniband/ibp0s10f0/device/fwctl/ fwctl0/ $ ls /sys/devices/pci0000:00/0000:00:0a.0/fwctl/fwctl0 dev device power subsystem uevent Which allows userspace to link all the multi-subsystem driver components together and learn the subsystem specific names for the device's components. Link: https://patch.msgid.link/r/1-v5-642aa0c94070+4447f-fwctl_jgg@nvidia.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Tested-by: Dave Jiang <dave.jiang@intel.com> Tested-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-06firmware: tegra: bpmp: Fix typo in bpmp-abi.hZhu Jun
The word 'traget' is wrong, so fix it. Signed-off-by: Zhu Jun <zhujun2@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20241118022928.11305-1-zhujun2@cmss.chinamobile.com Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-03-06PCI/DOE: Rename Discovery Response Data Object Contents to typeAlistair Francis
PCIe r6.1, sec 6.30.1.1, describes a "Vendor ID", a "Data Object Type" and "Next Index" as the fields in the DOE Discovery Response Data Object. The DOE driver currently uses both the terms 'type' and 'prot' for the second element. Rename all uses of the DOE Discovery Response Data Object to use 'type' as the second element of the object header, instead of type/prot as it currently is. Link: https://lore.kernel.org/r/20250306075211.1855177-2-alistair@alistair23.me Signed-off-by: Alistair Francis <alistair@alistair23.me> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-03-06tracing: gfp: Remove duplication of recording GFP flagsSteven Rostedt
The gfp_flags when recorded in the trace require being converted from their numbers to values. Various macros are used to help facilitate this, but there's two sets of macros that need to keep track of the same GFP flags to stay in sync. Commit 60295b944ff68 ("tracing: gfp: Fix the GFP enum values shown for user space tracing tools") added a TRACE_GFP_FLAGS macro that holds the enum ___GFP_*_BIT defined bits, and creates the TRACE_DEFINE_ENUM() wrapper around them. The __def_gfpflag_names() macro creates the mapping of various flags or multiple flags to give them human readable names via the __print_flags() tracing helper macro. As the TRACE_GFP_FLAGS is a subset of the __def_gfpflags_names(), it can be used to cover the individual bit names, by redefining the internal macro TRACE_GFP_EM(): #undef TRACE_GFP_EM #define TRACE_GFP_EM(a) gfpflag_string(__GFP_##a), This will remove the bits that are duplicate between the two macros. If a new bit is created, only the TRACE_GFP_FLAGS needs to be updated and that will also update the __def_gfpflags_names() macro. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/20250225135611.1942b65c@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-06tracing: Remove orphaned event_trace_printkHengqi Chen
The event_trace_printk macro has no callers since commit b8e65554d80b ("tracing: remove deprecated TRACE_FORMAT"). So drop it. Link: https://lore.kernel.org/20250213113951.813258-1-hengqi.chen@gmail.com Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-06Merge tag 'vfs-6.14-rc6.fixes' of ↵Linus Torvalds
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix spelling mistakes in idmappings.rst - Fix RCU warnings in override_creds()/revert_creds() - Create new pid namespaces with default limit now that pid_max is namespaced * tag 'vfs-6.14-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: pid: Do not set pid_max in new pid namespaces doc: correcting two prefix errors in idmappings.rst cred: Fix RCU warnings in override/revert_creds
2025-03-06fs/pipe: express 'pipe_empty()' in terms of 'pipe_occupancy()'Linus Torvalds
That's what 'pipe_full()' does, so it's more consistent. But more importantly it gets the type limits right when the pipe head and tail are no longer necessarily 'unsigned int'. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-03-06Merge tag 'ffa-updates-6.15' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into soc/drivers Arm FF-A updates for v6.15 This update primarily focuses on FF-A framework notification support along with other improvements, including UUID handling enhancements and various fixes. 1. FF-A framework notification upport - Adds support for multiple UUIDs per partition to register individual SRI callbacks. - Handles Rx buffer full framework notifications and provides a general interface for future extensions. 2. Improved multiple UUID/services per-partition handling - Adds support for UUID passing in FFA_MSG_SEND2, improving multiple UUID/service support in the driver. - Introduces a helper function to check whether a partition can receive REQUEST2 messages. 3. Partition handling generic improvements - Implements device unregistration for better partition cleanup. - Improves handling of the host partition presence in partition info. 4. FF-A version updates - Upgrades the driver version to FF-A v1.2. - Rejects major versions higher than the driver version as incompatible. 5. Big-Endian support fixes - Fixes big-endian issues in: __ffa_partition_info_regs_get() __ffa_partition_info_get() - Big-endian support is still incomplete, and only these changes can be verified without additional application/testing updates at the moment. We can discover all the partitions correctly with big-endian kernel now. 6. Miscellaneous fixes - Fixes function prototype misalignments in: sync_send_receive{,2} - Adds explicit type casting for return values from: FFA_VERSION and NOTIFICATION_INFO_GET - Corrects vCPU list parsing in ffa_notification_info_get(). 7. UUID management in the driver and DMA mask updates - Replaces UUID buffer with the standard UUID format in ffa_partition_info structure. - Fixes a typo in some FF-A bus macros. - Sets dma_mask for FF-A devices. In short, this update enhances notification handling, UUID support, and overall robustness of the FF-A driver while addressing multiple fixes and cleanups. * tag 'ffa-updates-6.15' of https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux: (23 commits) firmware: arm_ffa: Set dma_mask for ffa devices firmware: arm_ffa: Skip the first/partition ID when parsing vCPU list firmware: arm_ffa: Explicitly cast return value from NOTIFICATION_INFO_GET firmware: arm_ffa: Explicitly cast return value from FFA_VERSION before comparison firmware: arm_ffa: Handle ffa_notification_get correctly at virtual FF-A instance firmware: arm_ffa: Allow multiple UUIDs per partition to register SRI callback firmware: arm_ffa: Add support for handling framework notifications firmware: arm_ffa: Add support for {un,}registration of framework notifications firmware: arm_ffa: Stash ffa_device instead of notify_type in notifier_cb_info firmware: arm_ffa: Refactoring to prepare for framework notification support firmware: arm_ffa: Remove unnecessary declaration of ffa_partitions_cleanup() firmware: arm_ffa: Reject higher major version as incompatible firmware: arm_ffa: Upgrade FF-A version to v1.2 in the driver firmware: arm_ffa: Add support for passing UUID in FFA_MSG_SEND2 firmware: arm_ffa: Helper to check if a partition can receive REQUEST2 messages firmware: arm_ffa: Unregister the FF-A devices when cleaning up the partitions firmware: arm_ffa: Handle the presence of host partition in the partition info firmware: arm_ffa: Refactor addition of partition information into XArray firmware: arm_ffa: Fix big-endian support in __ffa_partition_info_regs_get() firmware: arm_ffa: Fix big-endian support in __ffa_partition_info_get() ... Link: https://lore.kernel.org/r/20250304105928.432997-1-sudeep.holla@arm.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-03-06Merge tag 'smccc-update-6.15' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into soc/drivers Arm SMCCC update for v6.15 Just a single update introducing the support for the optional SOC_ID name string from the Arm SMCCC v1.6 specification. If the SOC_ID name string is implemented, the machine field of the SoC Device Attributes will reflect it. The original intent of SOC_ID was to provide a JEP-106 code for the SiP and the SoC revision to uniquely identify the SoC. However, there has been a request to add this optional name so that SoC firmware can directly provide the SoC name to the OS. This change avoids the need for frequent updates to various tools that would otherwise require maintaining hardcoded model/machine name tables for new SoCs. * tag 'smccc-update-6.15' of https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux: firmware: smccc: Support optional Arm SMCCC SOC_ID name Link: https://lore.kernel.org/r/20250304105845.432813-1-sudeep.holla@arm.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-03-06Merge tag 'asahi-soc-rtkit-6.15' of https://github.com/AsahiLinux/linux into ↵Arnd Bergmann
soc/drivers Apple SoC RTKit IPC library updates for 6.15: - Additional logging for errors - A few minor improvements and bugfixes required for drivers that are yet to be upstreamed * tag 'asahi-soc-rtkit-6.15' of https://github.com/AsahiLinux/linux: soc: apple: rtkit: Cut syslog messages after the first '\0' soc: apple: rtkit: Use high prio work queue soc: apple: rtkit: Implement OSLog buffers properly soc: apple: rtkit: Add and use PWR_STATE_INIT instead of _ON soc: apple: rtkit: Fix use-after-free in apple_rtkit_crashlog_rx() soc: apple: rtkit: Pass the crashlog to the crashed() callback soc: apple: rtkit: Check & log more failures Link: https://lore.kernel.org/r/20250302113842.58092-1-sven@svenpeter.dev Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-03-06ASoC: soc-dai: add snd_soc_dai_mute_is_ctrled_at_trigger()Kuninori Morimoto
Adds snd_soc_dai_mute_is_ctrled_at_trigger() to judge dai->driver->ops->mute_unmute_on_trigger flags Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Link: https://patch.msgid.link/871pva6hs2.wl-kuninori.morimoto.gx@renesas.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-03-06badblocks: use sector_t instead of int to avoid truncation of badblocks lengthZheng Qixing
There is a truncation of badblocks length issue when set badblocks as follow: echo "2055 4294967299" > bad_blocks cat bad_blocks 2055 3 Change 'sectors' argument type from 'int' to 'sector_t'. This change avoids truncation of badblocks length for large sectors by replacing 'int' with 'sector_t' (u64), enabling proper handling of larger disk sizes and ensuring compatibility with 64-bit sector addressing. Fixes: 9e0e252a048b ("badblocks: Add core badblock management code") Signed-off-by: Zheng Qixing <zhengqixing@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Coly Li <colyli@kernel.org> Link: https://lore.kernel.org/r/20250227075507.151331-13-zhengqixing@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-06badblocks: return boolean from badblocks_set() and badblocks_clear()Zheng Qixing
Change the return type of badblocks_set() and badblocks_clear() from int to bool, indicating success or failure. Specifically: - _badblocks_set() and _badblocks_clear() functions now return true for success and false for failure. - All calls to these functions are updated to handle the new boolean return type. - This change improves code clarity and ensures a more consistent handling of success and failure states. Signed-off-by: Zheng Qixing <zhengqixing@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Coly Li <colyli@kernel.org> Acked-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20250227075507.151331-11-zhengqixing@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-06netfilter: nf_tables: make destruction work queue pernetFlorian Westphal
The call to flush_work before tearing down a table from the netlink notifier was supposed to make sure that all earlier updates (e.g. rule add) that might reference that table have been processed. Unfortunately, flush_work() waits for the last queued instance. This could be an instance that is different from the one that we must wait for. This is because transactions are protected with a pernet mutex, but the work item is global, so holding the transaction mutex doesn't prevent another netns from queueing more work. Make the work item pernet so that flush_work() will wait for all transactions queued from this netns. A welcome side effect is that we no longer need to wait for transaction objects from foreign netns. The gc work queue is still global. This seems to be ok because nft_set structures are reference counted and each container structure owns a reference on the net namespace. The destroy_list is still protected by a global spinlock rather than pernet one but the hold time is very short anyway. v2: call cancel_work_sync before reaping the remaining tables (Pablo). Fixes: 9f6958ba2e90 ("netfilter: nf_tables: unconditionally flush pending work before notifier") Reported-by: syzbot+5d8c5789c8cb076b2c25@syzkaller.appspotmail.com Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-03-06uprobes/x86: Harden uretprobe syscall trampoline checkJiri Olsa
Jann reported a possible issue when trampoline_check_ip returns address near the bottom of the address space that is allowed to call into the syscall if uretprobes are not set up: https://lore.kernel.org/bpf/202502081235.5A6F352985@keescook/T/#m9d416df341b8fbc11737dacbcd29f0054413cbbf Though the mmap minimum address restrictions will typically prevent creating mappings there, let's make sure uretprobe syscall checks for that. Fixes: ff474a78cef5 ("uprobe: Add uretprobe syscall to speed up return probe") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Kees Cook <kees@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250212220433.3624297-1-jolsa@kernel.org
2025-03-06watchdog/hardlockup/perf: Fix perf_event memory leakLi Huafei
During stress-testing, we found a kmemleak report for perf_event: unreferenced object 0xff110001410a33e0 (size 1328): comm "kworker/4:11", pid 288, jiffies 4294916004 hex dump (first 32 bytes): b8 be c2 3b 02 00 11 ff 22 01 00 00 00 00 ad de ...;...."....... f0 33 0a 41 01 00 11 ff f0 33 0a 41 01 00 11 ff .3.A.....3.A.... backtrace (crc 24eb7b3a): [<00000000e211b653>] kmem_cache_alloc_node_noprof+0x269/0x2e0 [<000000009d0985fa>] perf_event_alloc+0x5f/0xcf0 [<00000000084ad4a2>] perf_event_create_kernel_counter+0x38/0x1b0 [<00000000fde96401>] hardlockup_detector_event_create+0x50/0xe0 [<0000000051183158>] watchdog_hardlockup_enable+0x17/0x70 [<00000000ac89727f>] softlockup_start_fn+0x15/0x40 ... Our stress test includes CPU online and offline cycles, and updating the watchdog configuration. After reading the code, I found that there may be a race between cleaning up perf_event after updating watchdog and disabling event when the CPU goes offline: CPU0 CPU1 CPU2 (update watchdog) (hotplug offline CPU1) ... _cpu_down(CPU1) cpus_read_lock() // waiting for cpu lock softlockup_start_all smp_call_on_cpu(CPU1) softlockup_start_fn ... watchdog_hardlockup_enable(CPU1) perf create E1 watchdog_ev[CPU1] = E1 cpus_read_unlock() cpus_write_lock() cpuhp_kick_ap_work(CPU1) cpuhp_thread_fun ... watchdog_hardlockup_disable(CPU1) watchdog_ev[CPU1] = NULL dead_event[CPU1] = E1 __lockup_detector_cleanup for each dead_events_mask release each dead_event /* * CPU1 has not been added to * dead_events_mask, then E1 * will not be released */ CPU1 -> dead_events_mask cpumask_clear(&dead_events_mask) // dead_events_mask is cleared, E1 is leaked In this case, the leaked perf_event E1 matches the perf_event leak reported by kmemleak. Due to the low probability of problem recurrence (only reported once), I added some hack delays in the code: static void __lockup_detector_reconfigure(void) { ... watchdog_hardlockup_start(); cpus_read_unlock(); + mdelay(100); /* * Must be called outside the cpus locked section to prevent * recursive locking in the perf code. ... } void watchdog_hardlockup_disable(unsigned int cpu) { ... perf_event_disable(event); this_cpu_write(watchdog_ev, NULL); this_cpu_write(dead_event, event); + mdelay(100); cpumask_set_cpu(smp_processor_id(), &dead_events_mask); atomic_dec(&watchdog_cpus); ... } void hardlockup_detector_perf_cleanup(void) { ... perf_event_release_kernel(event); per_cpu(dead_event, cpu) = NULL; } + mdelay(100); cpumask_clear(&dead_events_mask); } Then, simultaneously performing CPU on/off and switching watchdog, it is almost certain to reproduce this leak. The problem here is that releasing perf_event is not within the CPU hotplug read-write lock. Commit: 941154bd6937 ("watchdog/hardlockup/perf: Prevent CPU hotplug deadlock") introduced deferred release to solve the deadlock caused by calling get_online_cpus() when releasing perf_event. Later, commit: efe951d3de91 ("perf/x86: Fix perf,x86,cpuhp deadlock") removed the get_online_cpus() call on the perf_event release path to solve another deadlock problem. Therefore, it is now possible to move the release of perf_event back into the CPU hotplug read-write lock, and release the event immediately after disabling it. Fixes: 941154bd6937 ("watchdog/hardlockup/perf: Prevent CPU hotplug deadlock") Signed-off-by: Li Huafei <lihuafei1@huawei.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241021193004.308303-1-lihuafei1@huawei.com
2025-03-06iomap: Support SW-based atomic writesJohn Garry
Currently atomic write support requires dedicated HW support. This imposes a restriction on the filesystem that disk blocks need to be aligned and contiguously mapped to FS blocks to issue atomic writes. XFS has no method to guarantee FS block alignment for regular, non-RT files. As such, atomic writes are currently limited to 1x FS block there. To deal with the scenario that we are issuing an atomic write over misaligned or discontiguous data blocks - and raise the atomic write size limit - support a SW-based software emulated atomic write mode. For XFS, this SW-based atomic writes would use CoW support to issue emulated untorn writes. It is the responsibility of the FS to detect discontiguous atomic writes and switch to IOMAP_DIO_ATOMIC_SW mode and retry the write. Indeed, SW-based atomic writes could be used always when the mounted bdev does not support HW offload, but this strategy is not initially expected to be used. Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20250303171120.2837067-6-john.g.garry@oracle.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-06iomap: Rename IOMAP_ATOMIC -> IOMAP_ATOMIC_HWJohn Garry
In future xfs will support a SW-based atomic write, so rename IOMAP_ATOMIC -> IOMAP_ATOMIC_HW to be clear which mode is being used. Also relocate setting of IOMAP_ATOMIC_HW to the write path in __iomap_dio_rw(), to be clear that this flag is only relevant to writes. Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20250303171120.2837067-3-john.g.garry@oracle.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-06Merge branch 'vfs-6.15.shared.iomap' of ↵Christian Brauner
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Bring in iomap changes that xfs relies on. Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-06PCI: dwc: Add Rockchip to the RAS DES allowed vendor listNiklas Cassel
Add PCI_VENDOR_ID_ROCKCHIP to the list of RAS DES vendor specific IDs. Signed-off-by: Niklas Cassel <cassel@kernel.org> Link: https://lore.kernel.org/r/20250225145657.944925-2-cassel@kernel.org [kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-06PCI: Add Rockchip Vendor IDShawn Lin
Move PCI_VENDOR_ID_ROCKCHIP from pci_endpoint_test.c to pci_ids.h and reuse it in pcie-rockchip-host.c. Link: https://lore.kernel.org/r/20250218092120.2322784-2-cassel@kernel.org Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-06PCI: dwc: Add debugfs based Silicon Debug support for DWCShradha Todi
Add support to provide Silicon Debug interface to userspace. This set of debug registers are part of the RAS DES feature present in DesignWare PCIe controllers. Co-developed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Shradha Todi <shradha.t@samsung.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Fan Ni <fan.ni@samsung.com> Tested-by: Hrishikesh Deleep <hrishikesh.d@samsung.com> Link: https://lore.kernel.org/r/20250221131548.59616-4-shradha.t@samsung.com [kwilczynski: commit log, tidy up Kconfig and drop "default y", tidy up code comments, squashed patch that fixes a NULL pointer dereference when debugfs is already unavailable during clean-up from https://lore.kernel.org/linux-pci/20250225171239.19574-2-manivannan.sadhasivam@linaro.org, refactor dwc_pcie_debugfs_init() to not return errors, squashed patch that changes how lack of the RAS DES capability is handled from https://lore.kernel.org/linux-pci/20250304151814.6xu7cbpwpqrvcad5@thinkpad] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-06gpiolib: fix kerneldocBartosz Golaszewski
Add missing '@' to the kernel doc for the new of_node_instance_match field of struct gpio_chip. Fixes: bd3ce71078bd ("gpiolib: of: Handle threecell GPIO chips") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/all/20250305203929.70283b9b@canb.auug.org.au/ Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20250305094939.40011-1-brgl@bgdev.pl Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-03-06drm/gem-shmem: Test for imported buffers with drm_gem_is_imported()Thomas Zimmermann
Instead of testing import_attach for imported GEM buffers, invoke drm_gem_is_imported() to do the test. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Anusha Srivatsa <asrivats@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250226172457.217725-5-tzimmermann@suse.de
2025-03-06drm/gem: Test for imported GEM buffers with helperThomas Zimmermann
Add drm_gem_is_imported() that tests if a GEM object's buffer has been imported. Update the GEM code accordingly. GEM code usually tests for imports if import_attach has been set in struct drm_gem_object. But attaching a dma-buf on import requires a DMA-capable importer device, which is not the case for many serial busses like USB or I2C. The new helper tests if a GEM object's dma-buf has been created from the GEM object. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Anusha Srivatsa <asrivats@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250226172457.217725-2-tzimmermann@suse.de
2025-03-05include/linux/log2.h: mark is_power_of_2() with __always_inlineSu Hui
When building kernel with randconfig, there is an error: In function `kvm_is_cr4_bit_set',inlined from `kvm_update_cpuid_runtime' at arch/x86/kvm/cpuid.c:310:9: include/linux/compiler_types.h:542:38: error: call to `__compiletime_assert_380' declared with attribute error: BUILD_BUG_ON failed: !is_power_of_2(cr4_bit). '!is_power_of_2(X86_CR4_OSXSAVE)' is False, but gcc treats is_power_of_2() as non-inline function and a compilation error happens. Fix this by marking is_power_of_2() with __always_inline. Link: https://lkml.kernel.org/r/20250221071624.1356899-1-suhui@nfschina.com Signed-off-by: Su Hui <suhui@nfschina.com> Cc: Binbin Wu <binbin.wu@linux.intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-05NFS: fix nfs_release_folio() to not deadlock via kcompactd writebackMike Snitzer
Add PF_KCOMPACTD flag and current_is_kcompactd() helper to check for it so nfs_release_folio() can skip calling nfs_wb_folio() from kcompactd. Otherwise NFS can deadlock waiting for kcompactd enduced writeback which recurses back to NFS (which triggers writeback to NFSD via NFS loopback mount on the same host, NFSD blocks waiting for XFS's call to __filemap_get_folio): 6070.550357] INFO: task kcompactd0:58 blocked for more than 4435 seconds. {--- [58] "kcompactd0" [<0>] folio_wait_bit+0xe8/0x200 [<0>] folio_wait_writeback+0x2b/0x80 [<0>] nfs_wb_folio+0x80/0x1b0 [nfs] [<0>] nfs_release_folio+0x68/0x130 [nfs] [<0>] split_huge_page_to_list_to_order+0x362/0x840 [<0>] migrate_pages_batch+0x43d/0xb90 [<0>] migrate_pages_sync+0x9a/0x240 [<0>] migrate_pages+0x93c/0x9f0 [<0>] compact_zone+0x8e2/0x1030 [<0>] compact_node+0xdb/0x120 [<0>] kcompactd+0x121/0x2e0 [<0>] kthread+0xcf/0x100 [<0>] ret_from_fork+0x31/0x40 [<0>] ret_from_fork_asm+0x1a/0x30 ---} [akpm@linux-foundation.org: fix build] Link: https://lkml.kernel.org/r/20250225022002.26141-1-snitzer@kernel.org Fixes: 96780ca55e3c ("NFS: fix up nfs_release_folio() to try to release the page") Signed-off-by: Mike Snitzer <snitzer@kernel.org> Cc: Anna Schumaker <anna.schumaker@oracle.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-05mm/hugetlb: wait for hugetlb folios to be freedGe Yang
Since the introduction of commit c77c0a8ac4c52 ("mm/hugetlb: defer freeing of huge pages if in non-task context"), which supports deferring the freeing of hugetlb pages, the allocation of contiguous memory through cma_alloc() may fail probabilistically. In the CMA allocation process, if it is found that the CMA area is occupied by in-use hugetlb folios, these in-use hugetlb folios need to be migrated to another location. When there are no available hugetlb folios in the free hugetlb pool during the migration of in-use hugetlb folios, new folios are allocated from the buddy system. A temporary state is set on the newly allocated folio. Upon completion of the hugetlb folio migration, the temporary state is transferred from the new folios to the old folios. Normally, when the old folios with the temporary state are freed, it is directly released back to the buddy system. However, due to the deferred freeing of hugetlb pages, the PageBuddy() check fails, ultimately leading to the failure of cma_alloc(). Here is a simplified call trace illustrating the process: cma_alloc() ->__alloc_contig_migrate_range() // Migrate in-use hugetlb folios ->unmap_and_move_huge_page() ->folio_putback_hugetlb() // Free old folios ->test_pages_isolated() ->__test_page_isolated_in_pageblock() ->PageBuddy(page) // Check if the page is in buddy To resolve this issue, we have implemented a function named wait_for_freed_hugetlb_folios(). This function ensures that the hugetlb folios are properly released back to the buddy system after their migration is completed. By invoking wait_for_freed_hugetlb_folios() before calling PageBuddy(), we ensure that PageBuddy() will succeed. Link: https://lkml.kernel.org/r/1739936804-18199-1-git-send-email-yangge1116@126.com Fixes: c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in non-task context") Signed-off-by: Ge Yang <yangge1116@126.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Acked-by: David Hildenbrand <david@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <21cnbao@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>