summaryrefslogtreecommitdiff
path: root/include/linux/workqueue.h
AgeCommit message (Collapse)Author
2024-01-10Merge tag 'header_cleanup-2024-01-10' of https://evilpiepirate.org/git/bcachefsLinus Torvalds
Pull header cleanups from Kent Overstreet: "The goal is to get sched.h down to a type only header, so the main thing happening in this patchset is splitting out various _types.h headers and dependency fixups, as well as moving some things out of sched.h to better locations. This is prep work for the memory allocation profiling patchset which adds new sched.h interdepencencies" * tag 'header_cleanup-2024-01-10' of https://evilpiepirate.org/git/bcachefs: (51 commits) Kill sched.h dependency on rcupdate.h kill unnecessary thread_info.h include Kill unnecessary kernel.h include preempt.h: Kill dependency on list.h rseq: Split out rseq.h from sched.h LoongArch: signal.c: add header file to fix build error restart_block: Trim includes lockdep: move held_lock to lockdep_types.h sem: Split out sem_types.h uidgid: Split out uidgid_types.h seccomp: Split out seccomp_types.h refcount: Split out refcount_types.h uapi/linux/resource.h: fix include x86/signal: kill dependency on time.h syscall_user_dispatch.h: split out *_types.h mm_types_task.h: Trim dependencies Split out irqflags_types.h ipc: Kill bogus dependency on spinlock.h shm: Slim down dependencies workqueue: Split out workqueue_types.h ...
2023-12-20workqueue: Split out workqueue_types.hKent Overstreet
More sched.h dependency culling - this lets us kill a rhashtable-types.h dependency on workqueue.h. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-12workqueue: Add workqueue_unbound_exclude_cpumask() to exclude CPUs from ↵Waiman Long
wq_unbound_cpumask When the "isolcpus" boot command line option is used to add a set of isolated CPUs, those CPUs will be excluded automatically from wq_unbound_cpumask to avoid running work functions from unbound workqueues. Recently cpuset has been extended to allow the creation of partitions of isolated CPUs dynamically. To make it closer to the "isolcpus" in functionality, the CPUs in those isolated cpuset partitions should be excluded from wq_unbound_cpumask as well. This can be done currently by explicitly writing to the workqueue's cpumask sysfs file after creating the isolated partitions. However, this process can be error prone. Ideally, the cpuset code should be allowed to request the workqueue code to exclude those isolated CPUs from wq_unbound_cpumask so that this operation can be done automatically and the isolated CPUs will be returned back to wq_unbound_cpumask after the destructions of the isolated cpuset partitions. This patch adds a new workqueue_unbound_exclude_cpumask() function to enable that. This new function will exclude the specified isolated CPUs from wq_unbound_cpumask. To be able to restore those isolated CPUs back after the destruction of isolated cpuset partitions, a new wq_requested_unbound_cpumask is added to store the user provided unbound cpumask either from the boot command line options or from writing to the cpumask sysfs file. This new cpumask provides the basis for CPU exclusion. To enable users to understand how the wq_unbound_cpumask is being modified internally, this patch also exposes the newly introduced wq_requested_unbound_cpumask as well as a wq_isolated_cpumask to store the cpumask to be excluded from wq_unbound_cpumask as read-only sysfs files. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2023-10-17workqueue: Provide one lock class key per work_on_cpu() callsiteFrederic Weisbecker
All callers of work_on_cpu() share the same lock class key for all the functions queued. As a result the workqueue related locking scenario for a function A may be spuriously accounted as an inversion against the locking scenario of function B such as in the following model: long A(void *arg) { mutex_lock(&mutex); mutex_unlock(&mutex); } long B(void *arg) { } void launchA(void) { work_on_cpu(0, A, NULL); } void launchB(void) { mutex_lock(&mutex); work_on_cpu(1, B, NULL); mutex_unlock(&mutex); } launchA and launchB running concurrently have no chance to deadlock. However the above can be reported by lockdep as a possible locking inversion because the works containing A() and B() are treated as belonging to the same locking class. The following shows an existing example of such a spurious lockdep splat: ====================================================== WARNING: possible circular locking dependency detected 6.6.0-rc1-00065-g934ebd6e5359 #35409 Not tainted ------------------------------------------------------ kworker/0:1/9 is trying to acquire lock: ffffffff9bc72f30 (cpu_hotplug_lock){++++}-{0:0}, at: _cpu_down+0x57/0x2b0 but task is already holding lock: ffff9e3bc0057e60 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_scheduled_works+0x216/0x500 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 ((work_completion)(&wfc.work)){+.+.}-{0:0}: __flush_work+0x83/0x4e0 work_on_cpu+0x97/0xc0 rcu_nocb_cpu_offload+0x62/0xb0 rcu_nocb_toggle+0xd0/0x1d0 kthread+0xe6/0x120 ret_from_fork+0x2f/0x40 ret_from_fork_asm+0x1b/0x30 -> #1 (rcu_state.barrier_mutex){+.+.}-{3:3}: __mutex_lock+0x81/0xc80 rcu_nocb_cpu_deoffload+0x38/0xb0 rcu_nocb_toggle+0x144/0x1d0 kthread+0xe6/0x120 ret_from_fork+0x2f/0x40 ret_from_fork_asm+0x1b/0x30 -> #0 (cpu_hotplug_lock){++++}-{0:0}: __lock_acquire+0x1538/0x2500 lock_acquire+0xbf/0x2a0 percpu_down_write+0x31/0x200 _cpu_down+0x57/0x2b0 __cpu_down_maps_locked+0x10/0x20 work_for_cpu_fn+0x15/0x20 process_scheduled_works+0x2a7/0x500 worker_thread+0x173/0x330 kthread+0xe6/0x120 ret_from_fork+0x2f/0x40 ret_from_fork_asm+0x1b/0x30 other info that might help us debug this: Chain exists of: cpu_hotplug_lock --> rcu_state.barrier_mutex --> (work_completion)(&wfc.work) Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((work_completion)(&wfc.work)); lock(rcu_state.barrier_mutex); lock((work_completion)(&wfc.work)); lock(cpu_hotplug_lock); *** DEADLOCK *** 2 locks held by kworker/0:1/9: #0: ffff900481068b38 ((wq_completion)events){+.+.}-{0:0}, at: process_scheduled_works+0x212/0x500 #1: ffff9e3bc0057e60 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_scheduled_works+0x216/0x500 stack backtrace: CPU: 0 PID: 9 Comm: kworker/0:1 Not tainted 6.6.0-rc1-00065-g934ebd6e5359 #35409 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Workqueue: events work_for_cpu_fn Call Trace: rcu-torture: rcu_torture_read_exit: Start of episode <TASK> dump_stack_lvl+0x4a/0x80 check_noncircular+0x132/0x150 __lock_acquire+0x1538/0x2500 lock_acquire+0xbf/0x2a0 ? _cpu_down+0x57/0x2b0 percpu_down_write+0x31/0x200 ? _cpu_down+0x57/0x2b0 _cpu_down+0x57/0x2b0 __cpu_down_maps_locked+0x10/0x20 work_for_cpu_fn+0x15/0x20 process_scheduled_works+0x2a7/0x500 worker_thread+0x173/0x330 ? __pfx_worker_thread+0x10/0x10 kthread+0xe6/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2f/0x40 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK Fix this with providing one lock class key per work_on_cpu() caller. Reported-and-tested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2023-08-07workqueue: Make default affinity_scope dynamically updatableTejun Heo
While workqueue.default_affinity_scope is writable, it only affects workqueues which are created afterwards and isn't very useful. Instead, let's introduce explicit "default" scope and update the effective scope dynamically when workqueue.default_affinity_scope is changed. Signed-off-by: Tejun Heo <tj@kernel.org>
2023-08-07workqueue: Implement non-strict affinity scope for unbound workqueuesTejun Heo
An unbound workqueue can be served by multiple worker_pools to improve locality. The segmentation is achieved by grouping CPUs into pods. By default, the cache boundaries according to cpus_share_cache() define the CPUs are grouped. Let's a workqueue is allowed to run on all CPUs and the system has two L3 caches. The workqueue would be mapped to two worker_pools each serving one L3 cache domains. While this improves locality, because the pod boundaries are strict, it limits the total bandwidth a given issuer can consume. For example, let's say there is a thread pinned to a CPU issuing enough work items to saturate the whole machine. With the machine segmented into two pods, no matter how many work items it issues, it can only use half of the CPUs on the system. While this limitation has existed for a very long time, it wasn't very pronounced because the affinity grouping used to be always by NUMA nodes. With cache boundaries as the default and support for even finer grained scopes (smt and cpu), it is now an a lot more pressing problem. This patch implements non-strict affinity scope where the pod boundaries aren't enforced strictly. Going back to the previous example, the workqueue would still be mapped to two worker_pools; however, the affinity enforcement would be soft. The workers in both pools would have their cpus_allowed set to the whole machine thus allowing the scheduler to migrate them anywhere on the machine. However, whenever an idle worker is woken up, the workqueue code asks the scheduler to bring back the task within the pod if the worker is outside. ie. work items start executing within its affinity scope but can be migrated outside as the scheduler sees fit. This removes the hard cap on utilization while maintaining the benefits of affinity scopes. After the earlier ->__pod_cpumask changes, the implementation is pretty simple. When non-strict which is the new default: * pool_allowed_cpus() returns @pool->attrs->cpumask instead of ->__pod_cpumask so that the workers are allowed to run on any CPU that the associated workqueues allow. * If the idle worker task's ->wake_cpu is outside the pod, kick_pool() sets the field to a CPU within the pod. This would be the first use of task_struct->wake_cpu outside scheduler proper, so it isn't clear whether this would be acceptable. However, other methods of migrating tasks are significantly more expensive and are likely prohibitively so if we want to do this on every work item. This needs discussion with scheduler folks. There is also a race window where setting ->wake_cpu wouldn't be effective as the target task is still on CPU. However, the window is pretty small and this being a best-effort optimization, it doesn't seem to warrant more complexity at the moment. While the non-strict cache affinity scopes seem to be the best option, the performance picture interacts with the affinity scope and is a bit complicated to fully discuss in this patch, so the behavior is made easily selectable through wqattrs and sysfs and the next patch will add documentation to discuss performance implications. v2: pool->attrs->affn_strict is set to true for per-cpu worker_pools. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org>
2023-08-07workqueue: Add workqueue_attrs->__pod_cpumaskTejun Heo
workqueue_attrs has two uses: * to specify the required unouned workqueue properties by users * to match worker_pool's properties to workqueues by core code For example, if the user wants to restrict a workqueue to run only CPUs 0 and 2, and the two CPUs are on different affinity scopes, the workqueue's attrs->cpumask would contains CPUs 0 and 2, and the workqueue would be associated with two worker_pools, one with attrs->cpumask containing just CPU 0 and the other CPU 2. Workqueue wants to support non-strict affinity scopes where work items are started in their matching affinity scopes but the scheduler is free to migrate them outside the starting scopes, which can enable utilizing the whole machine while maintaining most of the locality benefits from affinity scopes. To enable that, worker_pools need to distinguish the strict affinity that it has to follow (because that's the restriction coming from the user) and the soft affinity that it wants to apply when dispatching work items. Note that two worker_pools with different soft dispatching requirements have to be separate; otherwise, for example, we'd be ping-ponging worker threads across NUMA boundaries constantly. This patch adds workqueue_attrs->__pod_cpumask. The new field is double underscored as it's only used internally to distinguish worker_pools. A worker_pool's ->cpumask is now always the same as the online subset of allowed CPUs of the associated workqueues, and ->__pod_cpumask is the pod's subset of that ->cpumask. Going back to the example above, both worker_pools would have ->cpumask containing both CPUs 0 and 2 but one's ->__pod_cpumask would contain 0 while the other's 2. * pool_allowed_cpus() is added. It returns the worker_pool's strict cpumask that the pool's workers must stay within. This is currently always ->__pod_cpumask as all boundaries are still strict. * As a workqueue_attrs can now track both the associated workqueues' cpumask and its per-pod subset, wq_calc_pod_cpumask() no longer needs an external out-argument. Drop @cpumask and instead store the result in ->__pod_cpumask. * The above also simplifies apply_wqattrs_prepare() as the same workqueue_attrs can be used to create all pods associated with a workqueue. tmp_attrs is dropped. * wq_update_pod() is updated to use wqattrs_equal() to test whether a pwq update is needed instead of only comparing ->cpumask so that ->__pod_cpumask is compared too. It can directly compare ->__pod_cpumaks but the code is easier to understand and more robust this way. The only user-visible behavior change is that two workqueues with different cpumasks no longer can share worker_pools even when their pod subsets coincide. Going back to the example, let's say there's another workqueue with cpumask 0, 2, 3, where 2 and 3 are in the same pod. It would be mapped to two worker_pools - one with CPU 0, the other with 2 and 3. The former has the same cpumask as the first pod of the earlier example and would have shared the same worker_pool but that's no longer the case after this patch. The worker_pools would have the same ->__pod_cpumask but their ->cpumask's wouldn't match. While this is necessary to support non-strict affinity scopes, there can be further optimizations to maintain sharing among strict affinity scopes. However, non-strict affinity scopes are going to be preferable for most use cases and we don't see very diverse mixture of unbound workqueue cpumasks anyway, so the additional overhead doesn't seem to justify the extra complexity. v2: - wq_update_pod() was incorrectly comparing target_attrs->__pod_cpumask to pool->attrs->cpumask instead of its ->__pod_cpumask. Fix it by using wqattrs_equal() for comparison instead. - Per-cpu worker pools weren't initializing ->__pod_cpumask which caused a subtle problem later on. Set it to cpumask_of(cpu) like ->cpumask. Signed-off-by: Tejun Heo <tj@kernel.org>
2023-08-07workqueue: Add multiple affinity scopes and interface to select themTejun Heo
Add three more affinity scopes - WQ_AFFN_CPU, SMT and CACHE - and make CACHE the default. The code changes to actually add the additional scopes are trivial. Also add module parameter "workqueue.default_affinity_scope" to override the default scope and "affinity_scope" sysfs file to configure it per workqueue. wq_dump.py and documentations are updated accordingly. This enables significant flexibility in configuring how unbound workqueues behave. If affinity scope is set to "cpu", it'll behave close to a per-cpu workqueue. On the other hand, "system" removes all locality boundaries. Many modern machines have multiple L3 caches often while being mostly uniform in terms of memory access. Thus, workqueue's previous behavior of spreading work items in each NUMA node had negative performance implications from unncessarily crossing L3 boundaries between issue and execution. However, picking a finer grained affinity scope also has a downside in that an issuer in one group can't utilize CPUs in other groups. While dependent on the specifics of workload, there's usually a noticeable penalty in crossing L3 boundaries, so let's default to CACHE. This issue will be further addressed and documented with examples in future patches. Signed-off-by: Tejun Heo <tj@kernel.org>
2023-08-07workqueue: Generalize unbound CPU podsTejun Heo
While renamed to pod, the code still assumes that the pods are defined by NUMA boundaries. Let's generalize it: * workqueue_attrs->affn_scope is added. Each enum represents the type of boundaries that define the pods. There are currently two scopes - WQ_AFFN_NUMA and WQ_AFFN_SYSTEM. The former is the same behavior as before - one pod per NUMA node. The latter defines one global pod across the whole system. * struct wq_pod_type is added which describes how pods are configured for each affnity scope. For each pod, it lists the member CPUs and the preferred NUMA node for memory allocations. The reverse mapping from CPU to pod is also available. * wq_pod_enabled is dropped. Pod is now always enabled. The previously disabled behavior is now implemented through WQ_AFFN_SYSTEM. * get_unbound_pool() wants to determine the NUMA node to allocate memory from for the new pool. The variables are renamed from node to pod but the logic still assumes they're one and the same. Clearly distinguish them - walk the WQ_AFFN_NUMA pods to find the matching pod and then use the pod's NUMA node. * wq_calc_pod_cpumask() was taking @pod but assumed that it was the NUMA node. Take @cpu instead and determine the cpumask to use from the pod_type matching @attrs. * apply_wqattrs_prepare() is update to return ERR_PTR() on error instead of NULL so that it can indicate -EINVAL on invalid affinity scopes. This patch allows CPUs to be grouped into pods however desired per type. While this patch causes some internal behavior changes, nothing material should change for workqueue users. v2: Trigger WARN_ON_ONCE() in wqattrs_pod_type() if affn_scope is WQ_AFFN_NR_TYPES which indicates that the function is called with a worker_pool's attrs instead of a workqueue's. Signed-off-by: Tejun Heo <tj@kernel.org>
2023-08-07workqueue: Initialize unbound CPU pods later in the bootTejun Heo
During boot, to initialize unbound CPU pods, wq_pod_init() was called from workqueue_init(). This is early enough for NUMA nodes to be set up but before SMP is brought up and CPU topology information is populated. Workqueue is in the process of improving CPU locality for unbound workqueues and will need access to topology information during pod init. This adds a new init function workqueue_init_topology() which is called after CPU topology information is available and replaces wq_pod_init(). As unbound CPU pods are now initialized after workqueues are activated, we need to revisit the workqueues to apply the pod configuration. Workqueues which are created before workqueue_init_topology() are set up so that they always use the default worker pool. After pods are set up in workqueue_init_topology(), wq_update_pod() is called on all existing workqueues to update the pool associations accordingly. Note that wq_update_pod_attrs_buf allocation is moved to workqueue_init_early(). This isn't necessary right now but enables further generalization of pod handling in the future. This patch changes the initialization sequence but the end result should be the same. Signed-off-by: Tejun Heo <tj@kernel.org>
2023-08-07workqueue: Rename workqueue_attrs->no_numa to ->orderedTejun Heo
With the recent removal of NUMA related module param and sysfs knob, workqueue_attrs->no_numa is now only used to implement ordered workqueues. Let's rename the field so that it's less confusing especially with the planned CPU affinity awareness improvements. Just a rename. No functional changes. Signed-off-by: Tejun Heo <tj@kernel.org>
2023-08-07workqueue: Make unbound workqueues to use per-cpu pool_workqueuesTejun Heo
A pwq (pool_workqueue) represents an association between a workqueue and a worker_pool. When a work item is queued, the workqueue selects the pwq to use, which in turn determines the pool, and queues the work item to the pool through the pwq. pwq is also what implements the maximum concurrency limit - @max_active. As a per-cpu workqueue should be assocaited with a different worker_pool on each CPU, it always had per-cpu pwq's that are accessed through wq->cpu_pwq. However, unbound workqueues were sharing a pwq within each NUMA node by default. The sharing has several downsides: * Because @max_active is per-pwq, the meaning of @max_active changes depending on the machine configuration and whether workqueue NUMA locality support is enabled. * Makes per-cpu and unbound code deviate. * Gets in the way of making workqueue CPU locality awareness more flexible. This patch makes unbound workqueues use per-cpu pwq's the same way per-cpu workqueues do by making the following changes: * wq->numa_pwq_tbl[] is removed and unbound workqueues now use wq->cpu_pwq just like per-cpu workqueues. wq->cpu_pwq is now RCU protected for unbound workqueues. * numa_pwq_tbl_install() is renamed to install_unbound_pwq() and installs the specified pwq to the target CPU's wq->cpu_pwq. * apply_wqattrs_prepare() now always allocates a separate pwq for each CPU unless the workqueue is ordered. If ordered, all CPUs use wq->dfl_pwq. This makes the return value of wq_calc_node_cpumask() unnecessary. It now returns void. * @max_active now means the same thing for both per-cpu and unbound workqueues. WQ_UNBOUND_MAX_ACTIVE now equals WQ_MAX_ACTIVE and documentation is updated accordingly. WQ_UNBOUND_MAX_ACTIVE is no longer used in workqueue implementation and will be removed later. * All unbound pwq operations which used to be per-numa-node are now per-cpu. For most unbound workqueue users, this shouldn't cause noticeable changes. Work item issue and completion will be a small bit faster, flush_workqueue() would become a bit more expensive, and the total concurrency limit would likely become higher. All @max_active==1 use cases are currently being audited for conversion into alloc_ordered_workqueue() and they shouldn't be affected once the audit and conversion is complete. One area where the behavior change may be more noticeable is workqueue_congested() as the reported congestion state is now per CPU instead of NUMA node. There are only two users of this interface - drivers/infiniband/hw/hfi1 and net/smc. Maintainers of both subsystems are cc'd. Inputs on the behavior change would be very much appreciated. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: Karsten Graul <kgraul@linux.ibm.com> Cc: Wenjia Zhang <wenjia@linux.ibm.com> Cc: Jan Karcher <jaka@linux.ibm.com>
2023-07-10workqueue: Warn attempt to flush system-wide workqueues.Tetsuo Handa
Based on commit c4f135d643823a86 ("workqueue: Wrap flush_workqueue() using a macro"), all in-tree users stopped flushing system-wide workqueues. Therefore, start emitting runtime message so that all out-of-tree users will understand that they need to update their code. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Tejun Heo <tj@kernel.org>
2023-06-23workqueue: clean up WORK_* constant types, clarify maskingLinus Torvalds
Dave Airlie reports that gcc-13.1.1 has started complaining about some of the workqueue code in 32-bit arm builds: kernel/workqueue.c: In function ‘get_work_pwq’: kernel/workqueue.c:713:24: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] 713 | return (void *)(data & WORK_STRUCT_WQ_DATA_MASK); | ^ [ ... a couple of other cases ... ] and while it's not immediately clear exactly why gcc started complaining about it now, I suspect it's some C23-induced enum type handlign fixup in gcc-13 is the cause. Whatever the reason for starting to complain, the code and data types are indeed disgusting enough that the complaint is warranted. The wq code ends up creating various "helper constants" (like that WORK_STRUCT_WQ_DATA_MASK) using an enum type, which is all kinds of confused. The mask needs to be 'unsigned long', not some unspecified enum type. To make matters worse, the actual "mask and cast to a pointer" is repeated a couple of times, and the cast isn't even always done to the right pointer, but - as the error case above - to a 'void *' with then the compiler finishing the job. That's now how we roll in the kernel. So create the masks using the proper types rather than some ambiguous enumeration, and use a nice helper that actually does the type conversion in one well-defined place. Incidentally, this magically makes clang generate better code. That, admittedly, is really just a sign of clang having been seriously confused before, and cleaning up the typing unconfuses the compiler too. Reported-by: Dave Airlie <airlied@gmail.com> Link: https://lore.kernel.org/lkml/CAPM=9twNnV4zMCvrPkw3H-ajZOH-01JVh_kDrxdPYQErz8ZTdA@mail.gmail.com/ Cc: Arnd Bergmann <arnd@arndb.de> Cc: Tejun Heo <tj@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-23workqueue: Introduce show_freezable_workqueuesJungseung Lee
Currently show_all_workqueue is called if freeze fails at the time of freeze the workqueues, which shows the status of all workqueues and of all worker pools. In this cases we may only need to dump state of only workqueues that are freezable and busy. This patch defines show_freezable_workqueues, which uses show_one_workqueue, a granular function that shows the state of individual workqueues, so that dump only the state of freezable workqueues at that time. tj: Minor message adjustment. Signed-off-by: Jungseung Lee <js07.lee@samsung.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2023-01-04workqueue: Add a new flag to spot the potential UAF errorRichard Clark
Currently if the user queues a new work item unintentionally into a wq after the destroy_workqueue(wq), the work still can be queued and scheduled without any noticeable kernel message before the end of a RCU grace period. As a debug-aid facility, this commit adds a new flag __WQ_DESTROYING to spot that issue by triggering a kernel WARN message. Signed-off-by: Richard Clark <richard.xnu.clark@gmail.com> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2022-07-12Merge tag 'amd-drm-next-5.20-2022-07-05' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-5.20-2022-07-05: amdgpu: - Various spelling and grammer fixes - Various eDP fixes - Various DMCUB fixes - VCN fixes - GMC 11 fixes - RAS fixes - TMZ support for GC 10.3.7 - GPUVM TLB flush fixes - SMU 13.0.x updates - DCN 3.2 Support - DCN 3.2.1 Support - MES updates - GFX11 modifiers support - USB-C fixes - MMHUB 3.0.1 support - SDMA 6.0 doorbell fixes - Initial devcoredump support - Enable high priority gfx queue on asics which support it - Enable GPU reset for SMU 13.0.4 - OLED display fixes - MPO fixes - DC frame size fixes - ASPM support for PCIE 7.4/7.6 - GPU reset support for SMU 13.0.0 - GFX11 updates - VCN JPEG fix - BACO support for SMU 13.0.7 - VCN instance handling fix - GFX8 GPUVM TLB flush fix - GPU reset rework - VCN 4.0.2 support - GTT size fixes - DP link training fixes - LSDMA 6.0.1 support - Various backlight fixes - Color encoding fixes - Backlight config cleanup - VCN 4.x unified queue cleanup amdkfd: - MMU notifier fixes - Updates for GC 10.3.6 and 10.3.7 - P2P DMA support using dma-buf - Add available memory IOCTL - SDMA 6.0.1 fix - MES fixes - HMM profiler support radeon: - License fix - Backlight config cleanup UAPI: - Add available memory IOCTL to amdkfd Proposed userspace: https://www.mail-archive.com/amd-gfx@lists.freedesktop.org/msg75743.html - HMM profiler support for amdkfd Proposed userspace: https://lists.freedesktop.org/archives/amd-gfx/2022-June/080805.html Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220705212633.6037-1-alexander.deucher@amd.com
2022-06-11workqueue: Switch to new kerneldoc syntax for named variable macro argumentJonathan Neuschäfer
The syntax without dots is available since commit 43756e347f21 ("scripts/kernel-doc: Add support for named variable macro arguments"). The same HTML output is produced with and without this patch. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2022-06-10Revert "workqueue: remove unused cancel_work()"Andrey Grodzovsky
This reverts commit 6417250d3f894e66a68ba1cd93676143f2376a6f. amdpgu need this function in order to prematurly stop pending reset works when another reset work already in progress. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Lai Jiangshan<jiangshanlai@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-07workqueue: Wrap flush_workqueue() using a macroTetsuo Handa
Since flush operation synchronously waits for completion, flushing system-wide WQs (e.g. system_wq) might introduce possibility of deadlock due to unexpected locking dependency. Tejun Heo commented at [1] that it makes no sense at all to call flush_workqueue() on the shared WQs as the caller has no idea what it's gonna end up waiting for. Although there is flush_scheduled_work() which flushes system_wq WQ with "Think twice before calling this function! It's very easy to get into trouble if you don't take great care." warning message, syzbot found a circular locking dependency caused by flushing system_wq WQ [2]. Therefore, let's change the direction to that developers had better use their local WQs if flush_scheduled_work()/flush_workqueue(system_*_wq) is inevitable. Steps for converting system-wide WQs into local WQs are explained at [3], and a conversion to stop flushing system-wide WQs is in progress. Now we want some mechanism for preventing developers who are not aware of this conversion from again start flushing system-wide WQs. Since I found that WARN_ON() is complete but awkward approach for teaching developers about this problem, let's use __compiletime_warning() for incomplete but handy approach. For completeness, we will also insert WARN_ON() into __flush_workqueue() after all in-tree users stopped calling flush_scheduled_work(). Link: https://lore.kernel.org/all/YgnQGZWT%2Fn3VAITX@slm.duckdns.org/ [1] Link: https://syzkaller.appspot.com/bug?extid=bde0f89deacca7c765b8 [2] Link: https://lkml.kernel.org/r/49925af7-78a8-a3dd-bce6-cfc02e1a9236@I-love.SAKURA.ne.jp [3] Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Tejun Heo <tj@kernel.org>
2021-10-20workqueue: Introduce show_one_worker_pool and show_one_workqueue.Imran Khan
Currently show_workqueue_state shows the state of all workqueues and of all worker pools. In certain cases we may need to dump state of only a specific workqueue or worker pool. For example in destroy_workqueue we only need to show state of the workqueue which is getting destroyed. So rename show_workqueue_state to show_all_workqueues(to signify it dumps state of all busy workqueues) and divide it into more granular functions (show_one_workqueue and show_one_worker_pool), that would show states of individual workqueues and worker pools and can be used in cases such as the one mentioned above. Also, as mentioned earlier, make destroy_workqueue dump data pertaining to only the workqueue that is being destroyed and make user(s) of earlier interface(show_workqueue_state), use new interface (show_all_workqueues). Signed-off-by: Imran Khan <imran.f.khan@oracle.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2021-09-13workqueue: annotate alloc_workqueue() as printfRolf Eike Beer
This also enables checking of allows alloc_ordered_workqueue(). Signed-off-by: Rolf Eike Beer <eb@emlix.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2021-08-17workqueue: Remove unused WORK_NO_COLORLai Jiangshan
WORK_NO_COLOR has no user now, just remove it. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2021-08-17workqueue: Rename "delayed" (delayed by active management) to "inactive"Lai Jiangshan
There are two kinds of "delayed" work items in workqueue subsystem. One is for timer-delayed work items which are visible to workqueue users. The other kind is for work items delayed by active management which can not be directly visible to workqueue users. We mixed the word "delayed" for both kinds and caused somewhat ambiguity. This patch renames the later one (delayed by active management) to "inactive", because it is used for workqueue active management and most of its related symbols are named with "active" or "activate". All "delayed" and "DELAYED" are carefully checked and renamed one by one to avoid accidentally changing the name of the other kind for timer-delayed. No functional change intended. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2021-08-09workqueue: Fix typo in commentsCai Huoqing
Fix typo: *assing ==> assign *alloced ==> allocated *Retun ==> Return *excute ==> execute v1->v2: *reverse 'iff' *update changelog Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2021-01-19workqueue: fix annotation for WQ_SYSFSMenglong Dong
'wq_sysfs_register()' in annotation for 'WQ_SYSFS' is unavailable, change it to 'workqueue_sysfs_register()'. Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2020-06-01workqueue: fix a piece of comment about reserved bits for work flagsLai Jiangshan
8a2e8e5dec7e("workqueue: fix cwq->nr_active underflow") allocated one more bit from the work flags, and it updated partial of the comments (128 bytes -> 256 bytes), but it failed to update the info about the number of reserved bits. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2020-04-03Merge branch 'for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds
Pull workqueue updates from Tejun Heo: "Nothing too interesting. Just two trivial patches" * 'for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Mark up unlocked access to wq->first_flusher workqueue: Make workqueue_init*() return void
2020-03-04workqueue: Make workqueue_init*() return voidYu Chen
The return values of workqueue_init() and workqueue_early_int() are always 0, and there is no usage of their return value. So just make them return void. Signed-off-by: Yu Chen <chen.yu@easystack.cn> Signed-off-by: Tejun Heo <tj@kernel.org>
2020-02-12workqueue: Document (some) memory-ordering properties of {queue,schedule}_work()Andrea Parri
It's desirable to be able to rely on the following property: All stores preceding (in program order) a call to a successful queue_work() will be visible from the CPU which will execute the queued work by the time such work executes, e.g., { x is initially 0 } CPU0 CPU1 WRITE_ONCE(x, 1); [ "work" is being executed ] r0 = queue_work(wq, work); r1 = READ_ONCE(x); Forbids: r0 == true && r1 == 0 The current implementation of queue_work() provides such memory-ordering property: - In __queue_work(), the ->lock spinlock is acquired. - On the other side, in worker_thread(), this same ->lock is held when dequeueing work. So the locking ordering makes things work out. Add this property to the DocBook headers of {queue,schedule}_work(). Suggested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Andrea Parri <parri.andrea@gmail.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2019-09-13workqueue: unconfine alloc/apply/free_workqueue_attrs()Daniel Jordan
padata will use these these interfaces in a later patch, so unconfine them. Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-06-27workqueue: Make alloc/apply/free_workqueue_attrs() staticThomas Gleixner
None of those functions have any users outside of workqueue.c. Confine them. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Tejun Heo <tj@kernel.org>
2019-03-06Merge tag 'driver-core-5.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big driver core patchset for 5.1-rc1 More patches than "normal" here this merge window, due to some work in the driver core by Alexander Duyck to rework the async probe functionality to work better for a number of devices, and independant work from Rafael for the device link functionality to make it work "correctly". Also in here is: - lots of BUS_ATTR() removals, the macro is about to go away - firmware test fixups - ihex fixups and simplification - component additions (also includes i915 patches) - lots of minor coding style fixups and cleanups. All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (65 commits) driver core: platform: remove misleading err_alloc label platform: set of_node in platform_device_register_full() firmware: hardcode the debug message for -ENOENT driver core: Add missing description of new struct device_link field driver core: Fix PM-runtime for links added during consumer probe drivers/component: kerneldoc polish async: Add cmdline option to specify drivers to be async probed driver core: Fix possible supplier PM-usage counter imbalance PM-runtime: Fix __pm_runtime_set_status() race with runtime resume driver: platform: Support parsing GpioInt 0 in platform_get_irq() selftests: firmware: fix verify_reqs() return value Revert "selftests: firmware: remove use of non-standard diff -Z option" Revert "selftests: firmware: add CONFIG_FW_LOADER_USER_HELPER_FALLBACK to config" device: Fix comment for driver_data in struct device kernfs: Allocating memory for kernfs_iattrs with kmem_cache. sysfs: remove unused include of kernfs-internal.h driver core: Postpone DMA tear-down until after devres release driver core: Document limitation related to DL_FLAG_RPM_ACTIVE PM-runtime: Take suppliers into account in __pm_runtime_set_status() device.h: Add __cold to dev_<level> logging functions ...
2019-02-28kernel/workqueue: Use dynamic lockdep keys for workqueuesBart Van Assche
The following commit: 87915adc3f0a ("workqueue: re-add lockdep dependencies for flushing") improved deadlock checking in the workqueue implementation. Unfortunately that patch also introduced a few false positive lockdep complaints. This patch suppresses these false positives by allocating the workqueue mutex lockdep key dynamically. An example of a false positive lockdep complaint suppressed by this patch can be found below. The root cause of the lockdep complaint shown below is that the direct I/O code can call alloc_workqueue() from inside a work item created by another alloc_workqueue() call and that both workqueues share the same lockdep key. This patch avoids that that lockdep complaint is triggered by allocating the work queue lockdep keys dynamically. In other words, this patch guarantees that a unique lockdep key is associated with each work queue mutex. ====================================================== WARNING: possible circular locking dependency detected 4.19.0-dbg+ #1 Not tainted fio/4129 is trying to acquire lock: 00000000a01cfe1a ((wq_completion)"dio/%s"sb->s_id){+.+.}, at: flush_workqueue+0xd0/0x970 but task is already holding lock: 00000000a0acecf9 (&sb->s_type->i_mutex_key#14){+.+.}, at: ext4_file_write_iter+0x154/0x710 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&sb->s_type->i_mutex_key#14){+.+.}: down_write+0x3d/0x80 __generic_file_fsync+0x77/0xf0 ext4_sync_file+0x3c9/0x780 vfs_fsync_range+0x66/0x100 dio_complete+0x2f5/0x360 dio_aio_complete_work+0x1c/0x20 process_one_work+0x481/0x9f0 worker_thread+0x63/0x5a0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 -> #1 ((work_completion)(&dio->complete_work)){+.+.}: process_one_work+0x447/0x9f0 worker_thread+0x63/0x5a0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 -> #0 ((wq_completion)"dio/%s"sb->s_id){+.+.}: lock_acquire+0xc5/0x200 flush_workqueue+0xf3/0x970 drain_workqueue+0xec/0x220 destroy_workqueue+0x23/0x350 sb_init_dio_done_wq+0x6a/0x80 do_blockdev_direct_IO+0x1f33/0x4be0 __blockdev_direct_IO+0x79/0x86 ext4_direct_IO+0x5df/0xbb0 generic_file_direct_write+0x119/0x220 __generic_file_write_iter+0x131/0x2d0 ext4_file_write_iter+0x3fa/0x710 aio_write+0x235/0x330 io_submit_one+0x510/0xeb0 __x64_sys_io_submit+0x122/0x340 do_syscall_64+0x71/0x220 entry_SYSCALL_64_after_hwframe+0x49/0xbe other info that might help us debug this: Chain exists of: (wq_completion)"dio/%s"sb->s_id --> (work_completion)(&dio->complete_work) --> &sb->s_type->i_mutex_key#14 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&sb->s_type->i_mutex_key#14); lock((work_completion)(&dio->complete_work)); lock(&sb->s_type->i_mutex_key#14); lock((wq_completion)"dio/%s"sb->s_id); *** DEADLOCK *** 1 lock held by fio/4129: #0: 00000000a0acecf9 (&sb->s_type->i_mutex_key#14){+.+.}, at: ext4_file_write_iter+0x154/0x710 stack backtrace: CPU: 3 PID: 4129 Comm: fio Not tainted 4.19.0-dbg+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: dump_stack+0x86/0xc5 print_circular_bug.isra.32+0x20a/0x218 __lock_acquire+0x1c68/0x1cf0 lock_acquire+0xc5/0x200 flush_workqueue+0xf3/0x970 drain_workqueue+0xec/0x220 destroy_workqueue+0x23/0x350 sb_init_dio_done_wq+0x6a/0x80 do_blockdev_direct_IO+0x1f33/0x4be0 __blockdev_direct_IO+0x79/0x86 ext4_direct_IO+0x5df/0xbb0 generic_file_direct_write+0x119/0x220 __generic_file_write_iter+0x131/0x2d0 ext4_file_write_iter+0x3fa/0x710 aio_write+0x235/0x330 io_submit_one+0x510/0xeb0 __x64_sys_io_submit+0x122/0x340 do_syscall_64+0x71/0x220 entry_SYSCALL_64_after_hwframe+0x49/0xbe Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Johannes Berg <johannes.berg@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Link: https://lkml.kernel.org/r/20190214230058.196511-20-bvanassche@acm.org [ Reworked the changelog a bit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-01-31workqueue: Provide queue_work_node to queue work near a given NUMA nodeAlexander Duyck
Provide a new function, queue_work_node, which is meant to schedule work on a "random" CPU of the requested NUMA node. The main motivation for this is to help assist asynchronous init to better improve boot times for devices that are local to a specific node. For now we just default to the first CPU that is in the intersection of the cpumask of the node and the online cpumask. The only exception is if the CPU is local to the node we will just use the current CPU. This should work for our purposes as we are currently only using this for unbound work so the CPU will be translated to a node anyway instead of being directly used. As we are only using the first CPU to represent the NUMA node for now I am limiting the scope of the function so that it can only be used with unbound workqueues. Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-18workqueue: Show the latest workqueue name in /proc/PID/{comm,stat,status}Tejun Heo
There can be a lot of workqueue workers and they all show up with the cryptic kworker/* names making it difficult to understand which is doing what and how they came to be. # ps -ef | grep kworker root 4 2 0 Feb25 ? 00:00:00 [kworker/0:0H] root 6 2 0 Feb25 ? 00:00:00 [kworker/u112:0] root 19 2 0 Feb25 ? 00:00:00 [kworker/1:0H] root 25 2 0 Feb25 ? 00:00:00 [kworker/2:0H] root 31 2 0 Feb25 ? 00:00:00 [kworker/3:0H] ... This patch makes workqueue workers report the latest workqueue it was executing for through /proc/PID/{comm,stat,status}. The extra information is appended to the kthread name with intervening '+' if currently executing, otherwise '-'. # cat /proc/25/comm kworker/2:0-events_power_efficient # cat /proc/25/stat 25 (kworker/2:0-events_power_efficient) I 2 0 0 0 -1 69238880 0 0... # grep Name /proc/25/status Name: kworker/2:0-events_power_efficient Unfortunately, ps(1) truncates comm to 15 characters, # ps 25 PID TTY STAT TIME COMMAND 25 ? I 0:00 [kworker/2:0-eve] making it a lot less useful; however, this should be an easy fix from ps(1) side. Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Craig Small <csmall@enc.com.au>
2018-04-03Merge branch 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds
Pull workqueue updates from Tejun Heo: "rcu_work addition and a couple trivial changes" * 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: remove the comment about the old manager_arb mutex workqueue: fix the comments of nr_idle fs/aio: Use rcu_work instead of explicit rcu and work item cgroup: Use rcu_work instead of explicit rcu and work item RCU, workqueue: Implement rcu_work
2018-03-19RCU, workqueue: Implement rcu_workTejun Heo
There are cases where RCU callback needs to be bounced to a sleepable context. This is currently done by the RCU callback queueing a work item, which can be cumbersome to write and confusing to read. This patch introduces rcu_work, a workqueue work variant which gets executed after a RCU grace period, and converts the open coded bouncing in fs/aio and kernel/cgroup. v3: Dropped queue_rcu_work_on(). Documented rcu grace period behavior after queue_rcu_work(). v2: Use rcu_barrier() instead of synchronize_rcu() to wait for completion of previously queued rcu callback as per Paul. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
2018-03-13workqueue: remove unused cancel_work()Stephen Hemminger
Found this by accident. There are no usages of bare cancel_work() in current kernel source. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2018-02-16workqueue: Allow retrieval of current task's work structLukas Wunner
Introduce a helper to retrieve the current task's work struct if it is a workqueue worker. This allows us to fix a long-standing deadlock in several DRM drivers wherein the ->runtime_suspend callback waits for a specific worker to finish and that worker in turn calls a function which waits for runtime suspend to finish. That function is invoked from multiple call sites and waiting for runtime suspend to finish is the correct thing to do except if it's executing in the context of the worker. Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Link: https://patchwork.freedesktop.org/patch/msgid/2d8f603074131eb87e588d2b803a71765bd3a2fd.1518338788.git.lukas@wunner.de
2017-11-21treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE castsKees Cook
With all callbacks converted, and the timer callback prototype switched over, the TIMER_FUNC_TYPE cast is no longer needed, so remove it. Conversion was done with the following scripts: perl -pi -e 's|\(TIMER_FUNC_TYPE\)||g' \ $(git grep TIMER_FUNC_TYPE | cut -d: -f1 | sort -u) perl -pi -e 's|\(TIMER_DATA_TYPE\)||g' \ $(git grep TIMER_DATA_TYPE | cut -d: -f1 | sort -u) The now unused macros are also dropped from include/linux/timer.h. Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21timer: Remove redundant __setup_timer*() macrosKees Cook
With __init_timer*() now matching __setup_timer*(), remove the redundant internal interface, clean up the resulting definitions and add more documentation. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Shaohua Li <shli@fb.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21timer: Remove unused data arguments from macrosKees Cook
With the .data field removed, the ignored data arguments in timer macros can be removed. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Shaohua Li <shli@fb.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-13Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Yet another big pile of changes: - More year 2038 work from Arnd slowly reaching the point where we need to think about the syscalls themself. - A new timer function which allows to conditionally (re)arm a timer only when it's either not running or the new expiry time is sooner than the armed expiry time. This allows to use a single timer for multiple timeout requirements w/o caring about the first expiry time at the call site. - A new NMI safe accessor to clock real time for the printk timestamp work. Can be used by tracing, perf as well if required. - A large number of timer setup conversions from Kees which got collected here because either maintainers requested so or they simply got ignored. As Kees pointed out already there are a few trivial merge conflicts and some redundant commits which was unavoidable due to the size of this conversion effort. - Avoid a redundant iteration in the timer wheel softirq processing. - Provide a mechanism to treat RTC implementations depending on their hardware properties, i.e. don't inflict the write at the 0.5 seconds boundary which originates from the PC CMOS RTC to all RTCs. No functional change as drivers need to be updated separately. - The usual small updates to core code clocksource drivers. Nothing really exciting" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (111 commits) timers: Add a function to start/reduce a timer pstore: Use ktime_get_real_fast_ns() instead of __getnstimeofday() timer: Prepare to change all DEFINE_TIMER() callbacks netfilter: ipvs: Convert timers to use timer_setup() scsi: qla2xxx: Convert timers to use timer_setup() block/aoe: discover_timer: Convert timers to use timer_setup() ide: Convert timers to use timer_setup() drbd: Convert timers to use timer_setup() mailbox: Convert timers to use timer_setup() crypto: Convert timers to use timer_setup() drivers/pcmcia: omap1: Fix error in automated timer conversion ARM: footbridge: Fix typo in timer conversion drivers/sgi-xp: Convert timers to use timer_setup() drivers/pcmcia: Convert timers to use timer_setup() drivers/memstick: Convert timers to use timer_setup() drivers/macintosh: Convert timers to use timer_setup() hwrng/xgene-rng: Convert timers to use timer_setup() auxdisplay: Convert timers to use timer_setup() sparc/led: Convert timers to use timer_setup() mips: ip22/32: Convert timers to use timer_setup() ...
2017-11-07Merge branch 'linus' into locking/core, to resolve conflictsIngo Molnar
Conflicts: include/linux/compiler-clang.h include/linux/compiler-gcc.h include/linux/compiler-intel.h include/uapi/linux/stddef.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-25workqueue: Remove now redundant lock acquisitions wrt. workqueue flushesByungchul Park
The workqueue code added manual lock acquisition annotations to catch deadlocks. After lockdepcrossrelease was introduced, some of those became redundant, since wait_for_completion() already does the acquisition and tracking. Remove the duplicate annotations. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: amir73il@gmail.com Cc: axboe@kernel.dk Cc: darrick.wong@oracle.com Cc: david@fromorbit.com Cc: hch@infradead.org Cc: idryomov@gmail.com Cc: johan@kernel.org Cc: johannes.berg@intel.com Cc: kernel-team@lge.com Cc: linux-block@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-xfs@vger.kernel.org Cc: oleg@redhat.com Cc: tj@kernel.org Link: http://lkml.kernel.org/r/1508921765-15396-9-git-send-email-byungchul.park@lge.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-05workqueue: Convert callback to use from_timer()Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch workqueue to use from_timer() and pass the timer pointer explicitly. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mips@linux-mips.org Cc: Petr Mladek <pmladek@suse.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Sebastian Reichel <sre@kernel.org> Cc: Kalle Valo <kvalo@qca.qualcomm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: linux1394-devel@lists.sourceforge.net Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: linux-s390@vger.kernel.org Cc: linux-wireless@vger.kernel.org Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com> Cc: Wim Van Sebroeck <wim@iguana.be> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Ursula Braun <ubraun@linux.vnet.ibm.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Harish Patil <harish.patil@cavium.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Manish Chopra <manish.chopra@cavium.com> Cc: Len Brown <len.brown@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: linux-pm@vger.kernel.org Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Julian Wiedmann <jwi@linux.vnet.ibm.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Mark Gross <mark.gross@intel.com> Cc: linux-watchdog@vger.kernel.org Cc: linux-scsi@vger.kernel.org Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de> Cc: Michael Reed <mdr@sgi.com> Cc: netdev@vger.kernel.org Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Link: https://lkml.kernel.org/r/1507159627-127660-14-git-send-email-keescook@chromium.org
2017-10-05timer: Remove expires argument from __TIMER_INITIALIZER()Kees Cook
The expires field is normally initialized during the first mod_timer() call. It was unused by all callers, so remove it from the macro. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: linux-mips@linux-mips.org Cc: Petr Mladek <pmladek@suse.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Sebastian Reichel <sre@kernel.org> Cc: Kalle Valo <kvalo@qca.qualcomm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: linux1394-devel@lists.sourceforge.net Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: linux-s390@vger.kernel.org Cc: linux-wireless@vger.kernel.org Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com> Cc: Wim Van Sebroeck <wim@iguana.be> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Ursula Braun <ubraun@linux.vnet.ibm.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Harish Patil <harish.patil@cavium.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Michael Reed <mdr@sgi.com> Cc: Manish Chopra <manish.chopra@cavium.com> Cc: Len Brown <len.brown@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: linux-pm@vger.kernel.org Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Julian Wiedmann <jwi@linux.vnet.ibm.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Mark Gross <mark.gross@intel.com> Cc: linux-watchdog@vger.kernel.org Cc: linux-scsi@vger.kernel.org Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de> Cc: Guenter Roeck <linux@roeck-us.net> Cc: netdev@vger.kernel.org Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Link: https://lkml.kernel.org/r/1507159627-127660-12-git-send-email-keescook@chromium.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-09-05workqueue: Fix flag collisionBen Hutchings
Commit 0a94efb5acbb ("workqueue: implicit ordered attribute should be overridable") introduced a __WQ_ORDERED_EXPLICIT flag but gave it the same value as __WQ_LEGACY. I don't believe these were intended to mean the same thing, so renumber __WQ_ORDERED_EXPLICIT. Fixes: 0a94efb5acbb ("workqueue: implicit ordered attribute should be ...") Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: stable@vger.kernel.org # v4.13 Signed-off-by: Tejun Heo <tj@kernel.org>