summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-08-17Merge tag 'xfs-6.11-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Chandan Babu: - Check for presence of only 'attr' feature before scrubbing an inode's attribute fork. - Restore the behaviour of setting AIL thread to TASK_INTERRUPTIBLE for long (i.e. 50ms) sleep durations to prevent high load averages. - Do not allow users to change the realtime flag of a file unless the datadev and rtdev both support fsdax access modes. * tag 'xfs-6.11-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: conditionally allow FS_XFLAG_REALTIME changes if S_DAX is set xfs: revert AIL TASK_KILLABLE threshold xfs: attr forks require attr, not attr2
2024-08-17Merge tag 'bcachefs-2024-08-16' of git://evilpiepirate.org/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent OverstreetL - New on disk format version, bcachefs_metadata_version_disk_accounting_inum This adds one more disk accounting counter, which counts disk usage and number of extents per inode number. This lets us track fragmentation, for implementing defragmentation later, and it also counts disk usage per inode in all snapshots, which will be a useful thing to expose to users. - One performance issue we've observed is threads spinning when they should be waiting for dirty keys in the key cache to be flushed by journal reclaim, so we now have hysteresis for the waiting thread, as well as improving the tracepoint and a new time_stat, for tracking time blocked waiting on key cache flushing. ... and various assorted smaller fixes. * tag 'bcachefs-2024-08-16' of git://evilpiepirate.org/bcachefs: bcachefs: Fix locking in __bch2_trans_mark_dev_sb() bcachefs: fix incorrect i_state usage bcachefs: avoid overflowing LRU_TIME_BITS for cached data lru bcachefs: Fix forgetting to pass trans to fsck_err() bcachefs: Increase size of cuckoo hash table on too many rehashes bcachefs: bcachefs_metadata_version_disk_accounting_inum bcachefs: Kill __bch2_accounting_mem_mod() bcachefs: Make bkey_fsck_err() a wrapper around fsck_err() bcachefs: Fix warning in __bch2_fsck_err() for trans not passed in bcachefs: Add a time_stat for blocked on key cache flush bcachefs: Improve trans_blocked_journal_reclaim tracepoint bcachefs: Add hysteresis to waiting on btree key cache flush lib/generic-radix-tree.c: Fix rare race in __genradix_ptr_alloc() bcachefs: Convert for_each_btree_node() to lockrestart_do() bcachefs: Add missing downgrade table entry bcachefs: disk accounting: ignore unknown types bcachefs: bch2_accounting_invalid() fixup bcachefs: Fix bch2_trigger_alloc when upgrading from old versions bcachefs: delete faulty fastpath in bch2_btree_path_traverse_cached()
2024-08-17sched/eevdf: Propagate min_slice up the cgroup hierarchyPeter Zijlstra
In the absence of an explicit cgroup slice configureation, make mixed slice length work with cgroups by propagating the min_slice up the hierarchy. This ensures the cgroup entity gets timely service to service its entities that have this timing constraint set on them. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Valentin Schneider <vschneid@redhat.com> Link: https://lkml.kernel.org/r/20240727105030.948188417@infradead.org
2024-08-17sched/eevdf: Use sched_attr::sched_runtime to set request/slice suggestionPeter Zijlstra
Allow applications to directly set a suggested request/slice length using sched_attr::sched_runtime. The implementation clamps the value to: 0.1[ms] <= slice <= 100[ms] which is 1/10 the size of HZ=1000 and 10 times the size of HZ=100. Applications should strive to use their periodic runtime at a high confidence interval (95%+) as the target slice. Using a smaller slice will introduce undue preemptions, while using a larger value will increase latency. For all the following examples assume a scheduling quantum of 8, and for consistency all examples have W=4: {A,B,C,D}(w=1,r=8): ABCD... +---+---+---+--- t=0, V=1.5 t=1, V=3.5 A |------< A |------< B |------< B |------< C |------< C |------< D |------< D |------< ---+*------+-------+--- ---+--*----+-------+--- t=2, V=5.5 t=3, V=7.5 A |------< A |------< B |------< B |------< C |------< C |------< D |------< D |------< ---+----*--+-------+--- ---+------*+-------+--- Note: 4 identical tasks in FIFO order ~~~ {A,B}(w=1,r=16) C(w=2,r=16) AACCBBCC... +---+---+---+--- t=0, V=1.25 t=2, V=5.25 A |--------------< A |--------------< B |--------------< B |--------------< C |------< C |------< ---+*------+-------+--- ---+----*--+-------+--- t=4, V=8.25 t=6, V=12.25 A |--------------< A |--------------< B |--------------< B |--------------< C |------< C |------< ---+-------*-------+--- ---+-------+---*---+--- Note: 1 heavy task -- because q=8, double r such that the deadline of the w=2 task doesn't go below q. Note: observe the full schedule becomes: W*max(r_i/w_i) = 4*2q = 8q in length. Note: the period of the heavy task is half the full period at: W*(r_i/w_i) = 4*(2q/2) = 4q ~~~ {A,C,D}(w=1,r=16) B(w=1,r=8): BAACCBDD... +---+---+---+--- t=0, V=1.5 t=1, V=3.5 A |--------------< A |---------------< B |------< B |------< C |--------------< C |--------------< D |--------------< D |--------------< ---+*------+-------+--- ---+--*----+-------+--- t=3, V=7.5 t=5, V=11.5 A |---------------< A |---------------< B |------< B |------< C |--------------< C |--------------< D |--------------< D |--------------< ---+------*+-------+--- ---+-------+--*----+--- t=6, V=13.5 A |---------------< B |------< C |--------------< D |--------------< ---+-------+----*--+--- Note: 1 short task -- again double r so that the deadline of the short task won't be below q. Made B short because its not the leftmost task, but is eligible with the 0,1,2,3 spread. Note: like with the heavy task, the period of the short task observes: W*(r_i/w_i) = 4*(1q/1) = 4q ~~~ A(w=1,r=16) B(w=1,r=8) C(w=2,r=16) BCCAABCC... +---+---+---+--- t=0, V=1.25 t=1, V=3.25 A |--------------< A |--------------< B |------< B |------< C |------< C |------< ---+*------+-------+--- ---+--*----+-------+--- t=3, V=7.25 t=5, V=11.25 A |--------------< A |--------------< B |------< B |------< C |------< C |------< ---+------*+-------+--- ---+-------+--*----+--- t=6, V=13.25 A |--------------< B |------< C |------< ---+-------+----*--+--- Note: 1 heavy and 1 short task -- combine them all. Note: both the short and heavy task end up with a period of 4q ~~~ A(w=1,r=16) B(w=2,r=16) C(w=1,r=8) BBCAABBC... +---+---+---+--- t=0, V=1 t=2, V=5 A |--------------< A |--------------< B |------< B |------< C |------< C |------< ---+*------+-------+--- ---+----*--+-------+--- t=3, V=7 t=5, V=11 A |--------------< A |--------------< B |------< B |------< C |------< C |------< ---+------*+-------+--- ---+-------+--*----+--- t=7, V=15 A |--------------< B |------< C |------< ---+-------+------*+--- Note: as before but permuted ~~~ From all this it can be deduced that, for the steady state: - the total period (P) of a schedule is: W*max(r_i/w_i) - the average period of a task is: W*(r_i/w_i) - each task obtains the fair share: w_i/W of each full period P Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Valentin Schneider <vschneid@redhat.com> Link: https://lkml.kernel.org/r/20240727105030.842834421@infradead.org
2024-08-17sched/eevdf: Allow shorter slices to wakeup-preemptPeter Zijlstra
Part of the reason to have shorter slices is to improve responsiveness. Allow shorter slices to preempt longer slices on wakeup. Task | Runtime ms | Switches | Avg delay ms | Max delay ms | Sum delay ms | 100ms massive_intr 500us cyclictest NO_PREEMPT_SHORT 1 massive_intr:(5) | 846018.956 ms | 779188 | avg: 0.273 ms | max: 58.337 ms | sum:212545.245 ms | 2 massive_intr:(5) | 853450.693 ms | 792269 | avg: 0.275 ms | max: 71.193 ms | sum:218263.588 ms | 3 massive_intr:(5) | 843888.920 ms | 771456 | avg: 0.277 ms | max: 92.405 ms | sum:213353.221 ms | 1 chromium-browse:(8) | 53015.889 ms | 131766 | avg: 0.463 ms | max: 36.341 ms | sum:60959.230 ms | 2 chromium-browse:(8) | 53864.088 ms | 136962 | avg: 0.480 ms | max: 27.091 ms | sum:65687.681 ms | 3 chromium-browse:(9) | 53637.904 ms | 132637 | avg: 0.481 ms | max: 24.756 ms | sum:63781.673 ms | 1 cyclictest:(5) | 12615.604 ms | 639689 | avg: 0.471 ms | max: 32.272 ms | sum:301351.094 ms | 2 cyclictest:(5) | 12511.583 ms | 642578 | avg: 0.448 ms | max: 44.243 ms | sum:287632.830 ms | 3 cyclictest:(5) | 12545.867 ms | 635953 | avg: 0.475 ms | max: 25.530 ms | sum:302374.658 ms | 100ms massive_intr 500us cyclictest PREEMPT_SHORT 1 massive_intr:(5) | 839843.919 ms | 837384 | avg: 0.264 ms | max: 74.366 ms | sum:221476.885 ms | 2 massive_intr:(5) | 852449.913 ms | 845086 | avg: 0.252 ms | max: 68.162 ms | sum:212595.968 ms | 3 massive_intr:(5) | 839180.725 ms | 836883 | avg: 0.266 ms | max: 69.742 ms | sum:222812.038 ms | 1 chromium-browse:(11) | 54591.481 ms | 138388 | avg: 0.458 ms | max: 35.427 ms | sum:63401.508 ms | 2 chromium-browse:(8) | 52034.541 ms | 132276 | avg: 0.436 ms | max: 31.826 ms | sum:57732.958 ms | 3 chromium-browse:(8) | 55231.771 ms | 141892 | avg: 0.469 ms | max: 27.607 ms | sum:66538.697 ms | 1 cyclictest:(5) | 13156.391 ms | 667412 | avg: 0.373 ms | max: 38.247 ms | sum:249174.502 ms | 2 cyclictest:(5) | 12688.939 ms | 665144 | avg: 0.374 ms | max: 33.548 ms | sum:248509.392 ms | 3 cyclictest:(5) | 13475.623 ms | 669110 | avg: 0.370 ms | max: 37.819 ms | sum:247673.390 ms | As per the numbers the, this makes cyclictest (short slice) it's max-delay more consistent and consistency drops the sum-delay. The trade-off is that the massive_intr (long slice) gets more context switches and a slight increase in sum-delay. Chunxin contributed did_preempt_short() where a task that lost slice protection from PREEMPT_SHORT gets rescheduled once it becomes in-eligible. [mike: numbers] Co-Developed-by: Chunxin Zang <zangchunxin@lixiang.com> Signed-off-by: Chunxin Zang <zangchunxin@lixiang.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Valentin Schneider <vschneid@redhat.com> Tested-by: Mike Galbraith <umgwanakikbuti@gmail.com> Link: https://lkml.kernel.org/r/20240727105030.735459544@infradead.org
2024-08-17sched/fair: Avoid re-setting virtual deadline on 'migrations'Peter Zijlstra
During OSPM24 Youssef noted that migrations are re-setting the virtual deadline. Notably everything that does a dequeue-enqueue, like setting nice, changing preferred numa-node, and a myriad of other random crap, will cause this to happen. This shouldn't be. Preserve the relative virtual deadline across such dequeue/enqueue cycles. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Tested-by: Valentin Schneider <vschneid@redhat.com> Link: https://lkml.kernel.org/r/20240727105030.625119246@infradead.org
2024-08-17sched/eevdf: Fixup PELT vs DELAYED_DEQUEUEPeter Zijlstra
Note that tasks that are kept on the runqueue to burn off negative lag, are not in fact runnable anymore, they'll get dequeued the moment they get picked. As such, don't count this time towards runnable. Thanks to Valentin for spotting I had this backwards initially. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Tested-by: Valentin Schneider <vschneid@redhat.com> Link: https://lkml.kernel.org/r/20240727105030.514088302@infradead.org
2024-08-17sched/fair: Implement DELAY_ZEROPeter Zijlstra
'Extend' DELAY_DEQUEUE by noting that since we wanted to dequeued them at the 0-lag point, truncate lag (eg. don't let them earn positive lag). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Tested-by: Valentin Schneider <vschneid@redhat.com> Link: https://lkml.kernel.org/r/20240727105030.403750550@infradead.org
2024-08-17sched/fair: Implement delayed dequeuePeter Zijlstra
Extend / fix 86bfbb7ce4f6 ("sched/fair: Add lag based placement") by noting that lag is fundamentally a temporal measure. It should not be carried around indefinitely. OTOH it should also not be instantly discarded, doing so will allow a task to game the system by purposefully (micro) sleeping at the end of its time quantum. Since lag is intimately tied to the virtual time base, a wall-time based decay is also insufficient, notably competition is required for any of this to make sense. Instead, delay the dequeue and keep the 'tasks' on the runqueue, competing until they are eligible. Strictly speaking, we only care about keeping them until the 0-lag point, but that is a difficult proposition, instead carry them around until they get picked again, and dequeue them at that point. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Tested-by: Valentin Schneider <vschneid@redhat.com> Link: https://lkml.kernel.org/r/20240727105030.226163742@infradead.org
2024-08-17sched: Teach dequeue_task() about special task statesPeter Zijlstra
Since special task states must not suffer spurious wakeups, and the proposed delayed dequeue can cause exactly these (under some boundary conditions), propagate this knowledge into dequeue_task() such that it can do the right thing. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Tested-by: Valentin Schneider <vschneid@redhat.com> Link: https://lkml.kernel.org/r/20240727105030.110439521@infradead.org
2024-08-17sched,freezer: Mark TASK_FROZEN specialPeter Zijlstra
The special task states are those that do not suffer spurious wakeups, TASK_FROZEN is very much one of those, mark it as such. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Tested-by: Valentin Schneider <vschneid@redhat.com> Link: https://lkml.kernel.org/r/20240727105029.998329901@infradead.org
2024-08-17sched/fair: Implement ENQUEUE_DELAYEDPeter Zijlstra
Doing a wakeup on a delayed dequeue task is about as simple as it sounds -- remove the delayed mark and enjoy the fact it was actually still on the runqueue. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Tested-by: Valentin Schneider <vschneid@redhat.com> Link: https://lkml.kernel.org/r/20240727105029.888107381@infradead.org
2024-08-17sched/fair: Prepare pick_next_task() for delayed dequeuePeter Zijlstra
Delayed dequeue's natural end is when it gets picked again. Ensure pick_next_task() knows what to do with delayed tasks. Note, this relies on the earlier patch that made pick_next_task() state invariant -- it will restart the pick on dequeue, because obviously the just dequeued task is no longer eligible. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Tested-by: Valentin Schneider <vschneid@redhat.com> Link: https://lkml.kernel.org/r/20240727105029.747330118@infradead.org
2024-08-17sched/fair: Prepare exit/cleanup paths for delayed_dequeuePeter Zijlstra
When dequeue_task() is delayed it becomes possible to exit a task (or cgroup) that is still enqueued. Ensure things are dequeued before freeing. Thanks to Valentin for asking the obvious questions and making switched_from_fair() less weird. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Tested-by: Valentin Schneider <vschneid@redhat.com> Link: https://lkml.kernel.org/r/20240727105029.631948434@infradead.org
2024-08-17sched/fair: Assert {set_next,put_prev}_entity() are properly balancedPeter Zijlstra
Just a little sanity test.. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Tested-by: Valentin Schneider <vschneid@redhat.com> Link: https://lkml.kernel.org/r/20240727105029.486423066@infradead.org
2024-08-17sched/uclamg: Handle delayed dequeuePeter Zijlstra
Delayed dequeue has tasks sit around on the runqueue that are not actually runnable -- specifically, they will be dequeued the moment they get picked. One side-effect is that such a task can get migrated, which leads to a 'nested' dequeue_task() scenario that messes up uclamp if we don't take care. Notably, dequeue_task(DEQUEUE_SLEEP) can 'fail' and keep the task on the runqueue. This however will have removed the task from uclamp -- per uclamp_rq_dec() in dequeue_task(). So far so good. However, if at that point the task gets migrated -- or nice adjusted or any of a myriad of operations that does a dequeue-enqueue cycle -- we'll pass through dequeue_task()/enqueue_task() again. Without modification this will lead to a double decrement for uclamp, which is wrong. Reported-by: Luis Machado <luis.machado@arm.com> Reported-by: Hongyan Xia <hongyan.xia2@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Tested-by: Valentin Schneider <vschneid@redhat.com> Link: https://lkml.kernel.org/r/20240727105029.315205425@infradead.org
2024-08-17sched: Prepare generic code for delayed dequeuePeter Zijlstra
While most of the delayed dequeue code can be done inside the sched_class itself, there is one location where we do not have an appropriate hook, namely ttwu_runnable(). Add an ENQUEUE_DELAYED call to the on_rq path to deal with waking delayed dequeue tasks. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Tested-by: Valentin Schneider <vschneid@redhat.com> Link: https://lkml.kernel.org/r/20240727105029.200000445@infradead.org
2024-08-17sched: Split DEQUEUE_SLEEP from deactivate_task()Peter Zijlstra
As a preparation for dequeue_task() failing, and a second code-path needing to take care of the 'success' path, split out the DEQEUE_SLEEP path from deactivate_task(). Much thanks to Libo for spotting and fixing a TASK_ON_RQ_MIGRATING ordering fail. Fixed-by: Libo Chen <libo.chen@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Tested-by: Valentin Schneider <vschneid@redhat.com> Link: https://lkml.kernel.org/r/20240727105029.086192709@infradead.org
2024-08-17sched/fair: Re-organize dequeue_task_fair()Peter Zijlstra
Working towards delaying dequeue, notably also inside the hierachy, rework dequeue_task_fair() such that it can 'resume' an interrupted hierarchy walk. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Tested-by: Valentin Schneider <vschneid@redhat.com> Link: https://lkml.kernel.org/r/20240727105028.977256873@infradead.org
2024-08-17sched: Allow sched_class::dequeue_task() to failPeter Zijlstra
Change the function signature of sched_class::dequeue_task() to return a boolean, allowing future patches to 'fail' dequeue. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Tested-by: Valentin Schneider <vschneid@redhat.com> Link: https://lkml.kernel.org/r/20240727105028.864630153@infradead.org
2024-08-17sched/fair: Unify pick_{,next_}_task_fair()Peter Zijlstra
Implement pick_next_task_fair() in terms of pick_task_fair() to de-duplicate the pick loop. More importantly, this makes all the pick loops use the state-invariant form, which is useful to introduce further re-try conditions in later patches. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Tested-by: Valentin Schneider <vschneid@redhat.com> Link: https://lkml.kernel.org/r/20240727105028.725062368@infradead.org
2024-08-17sched/fair: Cleanup pick_task_fair()'s currPeter Zijlstra
With 4c456c9ad334 ("sched/fair: Remove unused 'curr' argument from pick_next_entity()") curr is no longer being used, so no point in clearing it. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Tested-by: Valentin Schneider <vschneid@redhat.com> Link: https://lkml.kernel.org/r/20240727105028.614707623@infradead.org
2024-08-17sched/fair: Cleanup pick_task_fair() vs throttlePeter Zijlstra
Per 54d27365cae8 ("sched/fair: Prevent throttling in early pick_next_task_fair()") the reason check_cfs_rq_runtime() is under the 'if (curr)' check is to ensure the (downward) traversal does not result in an empty cfs_rq. But then the pick_task_fair() 'copy' of all this made it restart the traversal anyway, so that seems to solve the issue too. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ben Segall <bsegall@google.com> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Tested-by: Valentin Schneider <vschneid@redhat.com> Link: https://lkml.kernel.org/r/20240727105028.501679876@infradead.org
2024-08-17sched/eevdf: Remove min_vruntime_copyPeter Zijlstra
Since commit e8f331bcc270 ("sched/smp: Use lag to simplify cross-runqueue placement") the min_vruntime_copy is no longer used. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Tested-by: Valentin Schneider <vschneid@redhat.com> Link: https://lkml.kernel.org/r/20240727105028.395297941@infradead.org
2024-08-17sched/eevdf: Add feature commentsPeter Zijlstra
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Tested-by: Valentin Schneider <vschneid@redhat.com> Link: https://lkml.kernel.org/r/20240727105028.287790895@infradead.org
2024-08-16bcachefs: Fix locking in __bch2_trans_mark_dev_sb()Kent Overstreet
We run this in full RW mode now, so we have to guard against the superblock buffer being reallocated. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-16Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull memcg-v1 fix from Al Viro: "memcg_write_event_control() oops fix" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: memcg_write_event_control(): fix a user-triggerable oops
2024-08-16Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - Fix the arm64 __get_mem_asm() to use the _ASM_EXTABLE_##type##ACCESS() macro instead of the *_ERR() one in order to avoid writing -EFAULT to the value register in case of a fault - Initialise all elements of the acpi_early_node_map[] to NUMA_NO_NODE. Prior to this fix, only the first element was initialised - Move the KASAN random tag seed initialisation after the per-CPU areas have been initialised (prng_state is __percpu) * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Fix KASAN random tag seed initialization arm64: ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE arm64: uaccess: correct thinko in __get_mem_asm()
2024-08-16Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fix from Stephen Boyd: "One fix for the new T-Head TH1520 clk driver that marks a bus clk critical so that it isn't turned off during late init which breaks emmc-sdio" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: thead: fix dependency on clk_ignore_unused
2024-08-16Merge tag 'block-6.11-20240824' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - Fix corruption issues with s390/dasd (Eric, Stefan) - Fix a misuse of non irq locking grab of a lock (Li) - MD pull request with a single data corruption fix for raid1 (Yu) * tag 'block-6.11-20240824' of git://git.kernel.dk/linux: block: Fix lockdep warning in blk_mq_mark_tag_wait md/raid1: Fix data corruption for degraded array with slow disk s390/dasd: fix error recovery leading to data corruption on ESE devices s390/dasd: Remove DMA alignment
2024-08-16Merge tag 'io_uring-6.11-20240824' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring fixes from Jens Axboe: - Fix a comment in the uapi header using the wrong member name (Caleb) - Fix KCSAN warning for a debug check in sqpoll (me) - Two more NAPI tweaks (Olivier) * tag 'io_uring-6.11-20240824' of git://git.kernel.dk/linux: io_uring: fix user_data field name in comment io_uring/sqpoll: annotate debug task == current with data_race() io_uring/napi: remove duplicate io_napi_entry timeout assignation io_uring/napi: check napi_enabled in io_napi_add() before proceeding
2024-08-16Merge tag 'devicetree-fixes-for-6.11-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree fixes from Rob Herring: - Fix a possible (but unlikely) out-of-bounds read in interrupts parsing code - Add AT25 EEPROM "fujitsu,mb85rs256" compatible - Update Konrad Dybcio's email * tag 'devicetree-fixes-for-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: of/irq: Prevent device address out-of-bounds read in interrupt map walk dt-bindings: eeprom: at25: add fujitsu,mb85rs256 compatible dt-bindings: Batch-update Konrad Dybcio's email
2024-08-16btrfs: only enable extent map shrinker for DEBUG buildsQu Wenruo
Although there are several patches improving the extent map shrinker, there are still reports of too frequent shrinker behavior, taking too much CPU for the kswapd process. So let's only enable extent shrinker for now, until we got more comprehensive understanding and a better solution. Link: https://lore.kernel.org/linux-btrfs/3df4acd616a07ef4d2dc6bad668701504b412ffc.camel@intelfx.name/ Link: https://lore.kernel.org/linux-btrfs/c30fd6b3-ca7a-4759-8a53-d42878bf84f7@gmail.com/ Fixes: 956a17d9d050 ("btrfs: add a shrinker for extent maps") CC: stable@vger.kernel.org # 6.10+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-08-16Merge tag 'thermal-6.11-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control fix from Rafael Wysocki: "Fix a Bang-bang thermal governor issue causing it to fail to reset the state of cooling devices if they are 'on' to start with, but the thermal zone temperature is always below the corresponding trip point (Rafael Wysocki)" * tag 'thermal-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: gov_bang_bang: Use governor_data to reduce overhead thermal: gov_bang_bang: Add .manage() callback thermal: gov_bang_bang: Split bang_bang_control() thermal: gov_bang_bang: Call __thermal_cdev_update() directly
2024-08-16Merge tag 'acpi-6.11-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fix from Rafael Wysocki: "Fix an issue related to the ACPI EC device handling that causes the _REG control method to be evaluated for EC operation regions that are not expected to be used. This confuses the platform firmware and provokes various types of misbehavior on some systems (Rafael Wysocki)" * tag 'acpi-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: EC: Evaluate _REG outside the EC scope more carefully ACPICA: Add a depth argument to acpi_execute_reg_methods() Revert "ACPI: EC: Evaluate orphan _REG under EC device"
2024-08-16Merge tag 'libnvdimm-fixes-6.11-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fix from Ira Weiny: "Commit f467fee48da4 ("block: move the dax flag to queue_limits") broke the DAX tests by skipping over the legacy pmem mapping pages case. Set the DAX flag in this case as well" * tag 'libnvdimm-fixes-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: nvdimm/pmem: Set dax flag for all 'PFN_MAP' cases
2024-08-16io_uring: fix user_data field name in commentCaleb Sander Mateos
io_uring_cqe's user_data field refers to `sqe->data`, but io_uring_sqe does not have a data field. Fix the comment to say `sqe->user_data`. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Link: https://github.com/axboe/liburing/pull/1206 Link: https://lore.kernel.org/r/20240816181526.3642732-1-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-08-16Merge tag 'rust-fixes-6.11' of https://github.com/Rust-for-Linux/linuxLinus Torvalds
Pull rust fixes from Miguel Ojeda: - Fix '-Os' Rust 1.80.0+ builds adding more intrinsics (also tweaked in upstream Rust for the upcoming 1.82.0). - Fix support for the latest version of rust-analyzer due to a change on rust-analyzer config file semantics (considered a fix since most developers use the latest version of the tool, which is the only one actually supported by upstream). I am discussing stability of the config file with upstream -- they may be able to start versioning it. - Fix GCC 14 builds due to '-fmin-function-alignment' not skipped for libclang (bindgen). - A couple Kconfig fixes around '{RUSTC,BINDGEN}_VERSION_TEXT' to suppress error messages in a foreign architecture chroot and to use a proper default format. - Clean 'rust-analyzer' target warning due to missing recursive make invocation mark. - Clean Clippy warning due to missing indentation in docs. - Clean LLVM 19 build warning due to removed 3dnow feature upstream. * tag 'rust-fixes-6.11' of https://github.com/Rust-for-Linux/linux: rust: x86: remove `-3dnow{,a}` from target features kbuild: rust-analyzer: mark `rust_is_available.sh` invocation as recursive rust: add intrinsics to fix `-Os` builds kbuild: rust: skip -fmin-function-alignment in bindgen flags rust: Support latest version of `rust-analyzer` rust: macros: indent list item in `module!`'s docs rust: fix the default format for CONFIG_{RUSTC,BINDGEN}_VERSION_TEXT rust: suppress error messages from CONFIG_{RUSTC,BINDGEN}_VERSION_TEXT
2024-08-16Merge tag 'riscv-for-linus-6.11-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - reintroduce the text patching global icache flush - fix syscall entry code to correctly initialize a0, which manifested as a strace bug - XIP kernels now map the entire kernel, which fixes boot under at least DEBUG_VIRTUAL=y - initialize all nodes in the acpi_early_node_map initializer - fix OOB access in the Andes vendor extension probing code - A new key for scalar misaligned access performance in hwprobe, which correctly treat the values as an enum (as opposed to a bitmap) * tag 'riscv-for-linus-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Fix out-of-bounds when accessing Andes per hart vendor extension array RISC-V: hwprobe: Add SCALAR to misaligned perf defines RISC-V: hwprobe: Add MISALIGNED_PERF key RISC-V: ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE riscv: change XIP's kernel_map.size to be size of the entire kernel riscv: entry: always initialize regs->a0 to -ENOSYS riscv: Re-introduce global icache flush in patch_text_XXX()
2024-08-16Merge tag 'trace-v6.11-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: "A couple of fixes for tracing: - Prevent a NULL pointer dereference in the error path of RTLA tool - Fix an infinite loop bug when reading from the ring buffer when closed. If there's a thread trying to read the ring buffer and it gets closed by another thread, the one reading will go into an infinite loop when the buffer is empty instead of exiting back to user space" * tag 'trace-v6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: rtla/osnoise: Prevent NULL dereference in error handling tracing: Return from tracing_buffers_read() if the file has been closed
2024-08-16Merge tag 'keys-trusted-next-6.11-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd Pull key fixes from Jarkko Sakkinen: "Two bug fixes for a memory corruption bug and a memory leak bug in the DCP trusted keys type. Just as a reminder DCP was a crypto coprocessor in i.MX SoCs" * tag 'keys-trusted-next-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: KEYS: trusted: dcp: fix leak of blob encryption key KEYS: trusted: fix DCP blob payload length assignment
2024-08-16bcachefs: fix incorrect i_state usageKent Overstreet
Reported-by: syzbot+95e40eae71609e40d851@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-16bcachefs: avoid overflowing LRU_TIME_BITS for cached data lruKent Overstreet
Reported-by: syzbot+510b0b28f8e6de64d307@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-16bcachefs: Fix forgetting to pass trans to fsck_err()Kent Overstreet
Reported-by: syzbot+e3938cd6d761b78750e6@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-16bcachefs: Increase size of cuckoo hash table on too many rehashesKent Overstreet
Also, improve the calculation of the new table size, so that it can shrink when needed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-16Merge tag 'for-6.11/dm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mikulas Patocka: - fix misbehavior if suspend or resume is interrupted by a signal - fix wrong indentation in dm-crypt.rst - fix memory allocation failure in dm-persistent-data * tag 'for-6.11/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm persistent data: fix memory allocation failure Documentation: dm-crypt.rst warning + error fix dm resume: don't return EINVAL when signalled dm suspend: return -ERESTARTSYS instead of -EINTR
2024-08-16Merge tag 'iommu-fixes-v6.11-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu fixes from Joerg Roedel: - Bring back a lost return statement in io-page-fault code - Remove an unused function declaration * tag 'iommu-fixes-v6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: iommu: Remove unused declaration iommu_sva_unbind_gpasid() iommu: Restore lost return in iommu_report_device_fault()
2024-08-16Merge tag 'gpio-fixes-for-v6.11-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fix from Bartosz Golaszewski: - add the shutdown() callback to gpio-mlxbf3 in order to disable interrupts during graceful reboot * tag 'gpio-fixes-for-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpio: mlxbf3: Support shutdown() function
2024-08-16Merge tag 'sound-6.11-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "All small fixes, mostly for usual suspects, HD-audio and USB-audio device-specific fixes / quirks. The Cirrus codec support took the update of SPI header as well. Other than that, there is a regression fix in the sanity check of ALSA timer code" * tag 'sound-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda/tas2781: Use correct endian conversion ALSA: usb-audio: Support Yamaha P-125 quirk entry ALSA: hda: cs35l41: Remove redundant call to hda_cs_dsp_control_remove() ALSA: hda: cs35l56: Remove redundant call to hda_cs_dsp_control_remove() ALSA: hda/tas2781: fix wrong calibrated data order ALSA: usb-audio: Add delay quirk for VIVO USB-C-XE710 HEADSET ALSA: hda/realtek: Add support for new HP G12 laptops ALSA: hda/realtek: Fix noise from speakers on Lenovo IdeaPad 3 15IAU7 ALSA: timer: Relax start tick time check for slave timer elements spi: Add empty versions of ACPI functions
2024-08-16Merge tag 'drm-fixes-2024-08-16' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm fixes from Dave Airlie: "Weekly drm fixes, mostly amdgpu and xe. The larger amdgpu fix is for a new IP block introduced in rc1, so should be fine. The xe fixes contain some missed fixes from the end of the previous round along with some fixes which required precursor changes, but otherwise everything seems fine, mediatek: - fix cursor crash amdgpu: - Fix MES ring buffer overflow - DCN 3.5 fix - DCN 3.2.1 fix - DP MST fix - Cursor fixes - JPEG fixes - Context ops validation - MES 12 fixes - VCN 5.0 fix - HDP fix panel: - dt bindings style fix - orientation quirks rockchip: - inno-hdmi: fix infoframe upload v3d: - fix OOB access in v3d_csd_job_run() xe: - Validate user fence during creation - Fix use after free when client stats are captured - SRIOV fixes - Runtime PM fixes" * tag 'drm-fixes-2024-08-16' of https://gitlab.freedesktop.org/drm/kernel: (37 commits) drm/xe: Hold a PM ref when GT TLB invalidations are inflight drm/xe: Drop xe_gt_tlb_invalidation_wait drm/xe: Add xe_gt_tlb_invalidation_fence_init helper drm/xe/pf: Fix VF config validation on multi-GT platforms drm/xe: Build PM into GuC CT layer drm/xe/vf: Fix register value lookup drm/xe: Fix use after free when client stats are captured drm/xe: Take a ref to xe file when user creates a VM drm/xe: Add ref counting for xe_file drm/xe: Move part of xe_file cleanup to a helper drm/xe: Validate user fence during creation drm/rockchip: inno-hdmi: Fix infoframe upload drm/amd/amdgpu: add HDP_SD support on gc 12.0.0/1 drm/amdgpu: Update kmd_fw_shared for VCN5 drm/amd/amdgpu: command submission parser for JPEG drm/amdgpu/mes12: fix suspend issue drm/amdgpu/mes12: sw/hw fini for unified mes drm/amdgpu/mes12: configure two pipes hardware resources drm/amdgpu/mes12: adjust mes12 sw/hw init for multiple pipes drm/amdgpu/mes12: add mes pipe switch support ...