summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-11-10i40e, xsk: uninitialized variable in i40e_clean_rx_irq_zc()Dan Carpenter
The "failure" variable is used without being initialized. It should be set to false. Fixes: 8cbf74149903 ("i40e, xsk: move buffer allocation out of the Rx processing loop") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-11-10i40e: Fix MAC address setting for a VF via Host/VMSlawomir Laba
Fix MAC setting flow for the PF driver. Update the unicast VF's MAC address in VF structure if it is a new setting in i40e_vc_add_mac_addr_msg. When unicast MAC address gets deleted, record that and set the new unicast MAC address that is already waiting in the filter list. This logic is based on the order of messages arriving to the PF driver. Without this change the MAC address setting was interpreted incorrectly in the following use cases: 1) Print incorrect VF MAC or zero MAC ip link show dev $pf 2) Don't preserve MAC between driver reload rmmod iavf; modprobe iavf 3) Update VF MAC when macvlan was set ip link add link $vf address $mac $vf.1 type macvlan 4) Failed to update mac address when VF was trusted ip link set dev $vf address $mac This includes all other configurations including above commands. Fixes: f657a6e1313b ("i40e: Fix VF driver MAC address configuration") Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-11-10selftest: fix flower terse dump testsVlad Buslov
Iproute2 tc classifier terse dump has been accepted with modified syntax. Update the tests accordingly. Signed-off-by: Vlad Buslov <vlad@buslov.dev> Fixes: e7534fd42a99 ("selftests: implement flower classifier terse dump tests") Link: https://lore.kernel.org/r/20201107111928.453534-1-vlad@buslov.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-10um: Call pgtable_pmd_page_dtor() in __pmd_free_tlb()Richard Weinberger
Commit b2b29d6d0119 ("mm: account PMD tables like PTE tables") uncovered a bug in uml, we forgot to call the destructor. While we are here, give x a sane name. Reported-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Co-developed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Richard Weinberger <richard@nod.at> Tested-by: Christopher Obbard <chris.obbard@collabora.com>
2020-11-10kunit: fix display of failed expectations for stringsDaniel Latypov
Currently the following expectation KUNIT_EXPECT_STREQ(test, "hi", "bye"); will produce: Expected "hi" == "bye", but "hi" == 1625079497 "bye" == 1625079500 After this patch: Expected "hi" == "bye", but "hi" == hi "bye" == bye KUNIT_INIT_BINARY_STR_ASSERT_STRUCT() was written but just mistakenly not actually used by KUNIT_EXPECT_STREQ() and friends. Signed-off-by: Daniel Latypov <dlatypov@google.com> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Tested-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-11-10kunit: tool: fix extra trailing \n in raw + parsed test outputDaniel Latypov
For simplcity, strip all trailing whitespace from parsed output. I imagine no one is printing out meaningful trailing whitespace via KUNIT_FAIL() or similar, and that if they are, they really shouldn't. `isolate_kunit_output()` yielded liens with trailing \n, which results in artifacty output like this: $ ./tools/testing/kunit/kunit.py run [16:16:46] [FAILED] example_simple_test [16:16:46] # example_simple_test: EXPECTATION FAILED at lib/kunit/kunit-example-test.c:29 [16:16:46] Expected 1 + 1 == 3, but [16:16:46] 1 + 1 == 2 [16:16:46] 3 == 3 [16:16:46] not ok 1 - example_simple_test [16:16:46] After this change: [16:16:46] # example_simple_test: EXPECTATION FAILED at lib/kunit/kunit-example-test.c:29 [16:16:46] Expected 1 + 1 == 3, but [16:16:46] 1 + 1 == 2 [16:16:46] 3 == 3 [16:16:46] not ok 1 - example_simple_test [16:16:46] We should *not* be expecting lines to end with \n in kunit_tool_test.py for this reason. Do the same for `raw_output()` as well which suffers from the same issue. This is a followup to [1], but rebased onto kunit-fixes to pick up the other raw_output() fix and fixes for kunit_tool_test.py. [1] https://lore.kernel.org/linux-kselftest/20201020233219.4146059-1-dlatypov@google.com/ Signed-off-by: Daniel Latypov <dlatypov@google.com> Reviewed-by: David Gow <davidgow@google.com> Tested-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-11-10kunit: tool: print out stderr from make (like build warnings)Daniel Latypov
Currently the tool redirects make stdout + stderr, and only shows them if the make command fails. This means build warnings aren't shown to the user. This change prints the contents of stderr even if make succeeds, under the assumption these are only build warnings or other messages the user likely wants to see. We drop stdout from the raised exception since we can no longer easily collate stdout and stderr and just showing the stderr seems fine. Example with a warning: [14:56:35] Building KUnit Kernel ... ../lib/kunit/kunit-test.c: In function ‘kunit_test_successful_try’: ../lib/kunit/kunit-test.c:19:6: warning: unused variable ‘unused’ [-Wunused-variable] 19 | int unused; | ^~~~~~ [14:56:40] Starting KUnit Kernel ... Note the stderr has a trailing \n, and since we use print, we add another, but it helps separate make and kunit.py output. Example with a build error: [15:02:45] Building KUnit Kernel ... ERROR:root:../lib/kunit/kunit-test.c: In function ‘kunit_test_successful_try’: ../lib/kunit/kunit-test.c:19:2: error: unknown type name ‘invalid_type’ 19 | invalid_type *test = data; | ^~~~~~~~~~~~ ... Signed-off-by: Daniel Latypov <dlatypov@google.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-11-10KUnit: Docs: usage: wording fixesRandy Dunlap
Fix minor grammar and punctutation glitches. Hyphenate "architecture-specific" instances. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: David Gow <davidgow@google.com> Cc: linux-kselftest@vger.kernel.org Cc: kunit-dev@googlegroups.com Cc: Shuah Khan <shuah@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Brendan Higgins <brendanhiggins@google.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-11-10KUnit: Docs: style: fix some Kconfig example issuesRandy Dunlap
Fix the Kconfig example to be closer to Kconfig coding style. Also add punctuation and a trailing slash ('/') to a sub-directory name -- this is how the text mostly appears in other Kconfig files. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: David Gow <davidgow@google.com> Cc: linux-kselftest@vger.kernel.org Cc: kunit-dev@googlegroups.com Cc: Shuah Khan <shuah@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Brendan Higgins <brendanhiggins@google.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-11-10KUnit: Docs: fix a wording typoRandy Dunlap
Fix a wording typo (keyboard glitch). Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: David Gow <davidgow@google.com> Cc: linux-kselftest@vger.kernel.org Cc: kunit-dev@googlegroups.com Cc: Shuah Khan <shuah@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Brendan Higgins <brendanhiggins@google.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-11-10kunit: Do not pollute source directory with generated files (test.log)Andy Shevchenko
When --build_dir is provided use it and do not pollute source directory which even can be mounted over network or read-only. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Tested-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-11-10kunit: Do not pollute source directory with generated files (.kunitconfig)Andy Shevchenko
When --build_dir is provided use it and do not pollute source directory which even can be mounted over network or read-only. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Tested-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-11-10kunit: tool: fix pre-existing python type annotation errorsDaniel Latypov
The code uses annotations, but they aren't accurate. Note that type checking in python is a separate process, running `kunit.py run` will not check and complain about invalid types at runtime. Fix pre-existing issues found by running a type checker $ mypy *.py All but one of these were returning `None` without denoting this properly (via `Optional[Type]`). Signed-off-by: Daniel Latypov <dlatypov@google.com> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-11-10kunit: Fix kunit.py parse subcommand (use null build_dir)David Gow
When JSON support was added in [1], the KunitParseRequest tuple was updated to contain a 'build_dir' field, but kunit.py parse doesn't accept --build_dir as an option. The code nevertheless tried to access it, resulting in this error: AttributeError: 'Namespace' object has no attribute 'build_dir' Given that the parser only uses the build_dir variable to set the 'build_environment' json field, we set it to None (which gives the JSON 'null') for now. Ultimately, we probably do want to be able to set this, but since it's new functionality which (for the parse subcommand) never worked, this is the quickest way of getting it back up and running. [1]: https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git/commit/?h=kunit-fixes&id=21a6d1780d5bbfca0ce9b8104ca6233502fcbf86 Fixes: 21a6d1780d5b ("kunit: tool: allow generating test results in JSON") Signed-off-by: David Gow <davidgow@google.com> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Tested-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-11-10kunit: tool: unmark test_data as binary blobsBrendan Higgins
The tools/testing/kunit/test_data/ directory was marked as binary because some of the test_data files cause checkpatch warnings. Fix this by dropping the .gitattributes file. Fixes: afc63da64f1e ("kunit: kunit_parser: make parser more robust") Signed-off-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-11-10Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull core dump fix from Al Viro: "Fix for multithreaded coredump playing fast and loose with getting registers of secondary threads; if a secondary gets caught in the middle of exit(2), the conditition it will be stopped in for dumper to examine might be unusual enough for things to go wrong. Quite a few architectures are fine with that, but some are not." * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: don't dump the threads that had been already exiting when zapped.
2020-11-10efi/x86: Free efi_pgd with free_pages()Arvind Sankar
Commit d9e9a6418065 ("x86/mm/pti: Allocate a separate user PGD") changed the PGD allocation to allocate PGD_ALLOCATION_ORDER pages, so in the error path it should be freed using free_pages() rather than free_page(). Commit 06ace26f4e6f ("x86/efi: Free efi_pgd with free_pages()") fixed one instance of this, but missed another. Move the freeing out-of-line to avoid code duplication and fix this bug. Fixes: d9e9a6418065 ("x86/mm/pti: Allocate a separate user PGD") Link: https://lore.kernel.org/r/20201110163919.1134431-1-nivedita@alum.mit.edu Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-11-10Merge tag 'for-5.10-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A handful of minor fixes and updates: - handle missing device replace item on mount (syzbot report) - fix space reservation calculation when finishing relocation - fix memory leak on error path in ref-verify (debugging feature) - fix potential overflow during defrag on 32bit arches - minor code update to silence smatch warning - minor error message updates" * tag 'for-5.10-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: ref-verify: fix memory leak in btrfs_ref_tree_mod btrfs: dev-replace: fail mount if we don't have replace item with target device btrfs: scrub: update message regarding read-only status btrfs: clean up NULL checks in qgroup_unreserve_range() btrfs: fix min reserved size calculation in merge_reloc_root btrfs: print the block rsv type when we fail our reservation btrfs: fix potential overflow in cluster_pages_for_defrag on 32bit arch
2020-11-10Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscryptLinus Torvalds
Pull fscrypt fix from Eric Biggers: "Fix a regression where a new WARN_ON() was reachable when using FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32 on ext4, causing xfstest generic/602 to sometimes fail on ext4" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: fscrypt: remove reachable WARN in fscrypt_setup_iv_ino_lblk_32_key()
2020-11-10Merge branch 'turbostat' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull turbostat updates from Len Brown: "Update update to version 20.09.30, one kernel side fix" * 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbostat: update version number powercap: restrict energy meter to root access tools/power turbostat: harden against cpu hotplug tools/power turbostat: adjust for temperature offset tools/power turbostat: Build with _FILE_OFFSET_BITS=64 tools/power turbostat: Support AMD Family 19h tools/power turbostat: Remove empty columns for Jacobsville tools/power turbostat: Add a new GFXAMHz column that exposes gt_act_freq_mhz. tools/power x86_energy_perf_policy: Input/output error in a VM tools/power turbostat: Skip pc8, pc9, pc10 columns, if they are disabled tools/power turbostat: Support additional CPU model numbers tools/power turbostat: Fix output formatting for ACPI CST enumeration tools/power turbostat: Replace HTTP links with HTTPS ones: TURBOSTAT UTILITY tools/power turbostat: Use sched_getcpu() instead of hardcoded cpu 0 tools/power turbostat: Enable accumulate RAPL display tools/power turbostat: Introduce functions to accumulate RAPL consumption tools/power turbostat: Make the energy variable to be 64 bit tools/power turbostat: Always print idle in the system configuration header tools/power turbostat: Print /dev/cpu_dma_latency
2020-11-10ACPI: DPTF: Support Alder LakeSrinivas Pandruvada
Add Alder Lake ACPI IDs for DPTF devices. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-11-10Documentation: ACPI: fix spelling mistakesFlavio Suligoi
Signed-off-by: Flavio Suligoi <f.suligoi@asem.it> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-11-10sched/fair: Dissociate wakeup decisions from SD flag valueValentin Schneider
The CFS wakeup code will only ever go through EAS / its fast path on "regular" wakeups (i.e. not on forks or execs). These are currently gated by a check against 'sd_flag', which would be SD_BALANCE_WAKE at wakeup. However, we now have a flag that explicitly tells us whether a wakeup is a "regular" one, so hinge those conditions on that flag instead. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201102184514.2733-4-valentin.schneider@arm.com
2020-11-10sched: Remove select_task_rq()'s sd_flag parameterValentin Schneider
Only select_task_rq_fair() uses that parameter to do an actual domain search, other classes only care about what kind of wakeup is happening (fork, exec, or "regular") and thus just translate the flag into a wakeup type. WF_TTWU and WF_EXEC have just been added, use these along with WF_FORK to encode the wakeup types we care about. For select_task_rq_fair(), we can simply use the shiny new WF_flag : SD_flag mapping. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201102184514.2733-3-valentin.schneider@arm.com
2020-11-10sched: Add WF_TTWU, WF_EXEC wakeup flagsValentin Schneider
To remove the sd_flag parameter of select_task_rq(), we need another way of encoding wakeup types. There already is a WF_FORK flag, add the missing two. With that said, we still need an easy way to turn WF_foo into SD_bar (e.g. WF_TTWU into SD_BALANCE_WAKE). As suggested by Peter, let's make our lives easier and make them match exactly, and throw in some compile-time checks for good measure. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201102184514.2733-2-valentin.schneider@arm.com
2020-11-10sched/fair: Remove superfluous lock section in do_sched_cfs_slack_timer()Hui Su
Since ab93a4bc955b ("sched/fair: Remove distribute_running fromCFS bandwidth"), there is nothing to protect between raw_spin_lock_irqsave/store() in do_sched_cfs_slack_timer(). Signed-off-by: Hui Su <sh_def@163.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Phil Auld <pauld@redhat.com> Reviewed-by: Ben Segall <bsegall@google.com> Link: https://lkml.kernel.org/r/20201030144621.GA96974@rlk
2020-11-10Merge branch 'sched/migrate-disable'Peter Zijlstra
2020-11-10sched: Comment affine_move_task()Valentin Schneider
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201013140116.26651-2-valentin.schneider@arm.com
2020-11-10sched: Deny self-issued __set_cpus_allowed_ptr() when migrate_disable()Valentin Schneider
migrate_disable(); set_cpus_allowed_ptr(current, {something excluding task_cpu(current)}); affine_move_task(); <-- never returns Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201013140116.26651-1-valentin.schneider@arm.com
2020-11-10sched/proc: Print accurate cpumask vs migrate_disable()Peter Zijlstra
Ensure /proc/*/status doesn't print 'random' cpumasks due to migrate_disable(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Link: https://lkml.kernel.org/r/20201023102347.593984734@infradead.org
2020-11-10sched: Fix migrate_disable() vs rt/dl balancingPeter Zijlstra
In order to minimize the interference of migrate_disable() on lower priority tasks, which can be deprived of runtime due to being stuck below a higher priority task. Teach the RT/DL balancers to push away these higher priority tasks when a lower priority task gets selected to run on a freshly demoted CPU (pull). This adds migration interference to the higher priority task, but restores bandwidth to system that would otherwise be irrevocably lost. Without this it would be possible to have all tasks on the system stuck on a single CPU, each task preempted in a migrate_disable() section with a single high priority task running. This way we can still approximate running the M highest priority tasks on the system. Migrating the top task away is (ofcourse) still subject to migrate_disable() too, which means the lower task is subject to an interference equivalent to the worst case migrate_disable() section. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Link: https://lkml.kernel.org/r/20201023102347.499155098@infradead.org
2020-11-10sched, lockdep: Annotate ->pi_lock recursionPeter Zijlstra
There's a valid ->pi_lock recursion issue where the actual PI code tries to wake up the stop task. Make lockdep aware so it doesn't complain about this. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Link: https://lkml.kernel.org/r/20201023102347.406912197@infradead.org
2020-11-10sched,rt: Use the full cpumask for balancingPeter Zijlstra
We want migrate_disable() tasks to get PULLs in order for them to PUSH away the higher priority task. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Link: https://lkml.kernel.org/r/20201023102347.310519774@infradead.org
2020-11-10sched,rt: Use cpumask_any*_distribute()Peter Zijlstra
Replace a bunch of cpumask_any*() instances with cpumask_any*_distribute(), by injecting this little bit of random in cpu selection, we reduce the chance two competing balance operations working off the same lowest_mask pick the same CPU. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Link: https://lkml.kernel.org/r/20201023102347.190759694@infradead.org
2020-11-10sched/core: Make migrate disable and CPU hotplug cooperativeThomas Gleixner
On CPU unplug tasks which are in a migrate disabled region cannot be pushed to a different CPU until they returned to migrateable state. Account the number of tasks on a runqueue which are in a migrate disabled section and make the hotplug wait mechanism respect that. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Link: https://lkml.kernel.org/r/20201023102347.067278757@infradead.org
2020-11-10sched: Fix migrate_disable() vs set_cpus_allowed_ptr()Peter Zijlstra
Concurrent migrate_disable() and set_cpus_allowed_ptr() has interesting features. We rely on set_cpus_allowed_ptr() to not return until the task runs inside the provided mask. This expectation is exported to userspace. This means that any set_cpus_allowed_ptr() caller must wait until migrate_enable() allows migrations. At the same time, we don't want migrate_enable() to schedule, due to patterns like: preempt_disable(); migrate_disable(); ... migrate_enable(); preempt_enable(); And: raw_spin_lock(&B); spin_unlock(&A); this means that when migrate_enable() must restore the affinity mask, it cannot wait for completion thereof. Luck will have it that that is exactly the case where there is a pending set_cpus_allowed_ptr(), so let that provide storage for the async stop machine. Much thanks to Valentin who used TLA+ most effective and found lots of 'interesting' cases. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Link: https://lkml.kernel.org/r/20201023102346.921768277@infradead.org
2020-11-10sched: Add migrate_disable()Peter Zijlstra
Add the base migrate_disable() support (under protest). While migrate_disable() is (currently) required for PREEMPT_RT, it is also one of the biggest flaws in the system. Notably this is just the base implementation, it is broken vs sched_setaffinity() and hotplug, both solved in additional patches for ease of review. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Link: https://lkml.kernel.org/r/20201023102346.818170844@infradead.org
2020-11-10sched: Massage set_cpus_allowed()Peter Zijlstra
Thread a u32 flags word through the *set_cpus_allowed*() callchain. This will allow adding behavioural tweaks for future users. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Link: https://lkml.kernel.org/r/20201023102346.729082820@infradead.org
2020-11-10sched: Fix hotplug vs CPU bandwidth controlPeter Zijlstra
Since we now migrate tasks away before DYING, we should also move bandwidth unthrottle, otherwise we can gain tasks from unthrottle after we expect all tasks to be gone already. Also; it looks like the RT balancers don't respect cpu_active() and instead rely on rq->online in part, complete this. This too requires we do set_rq_offline() earlier to match the cpu_active() semantics. (The bigger patch is to convert RT to cpu_active() entirely) Since set_rq_online() is called from sched_cpu_activate(), place set_rq_offline() in sched_cpu_deactivate(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Link: https://lkml.kernel.org/r/20201023102346.639538965@infradead.org
2020-11-10sched/hotplug: Consolidate task migration on CPU unplugThomas Gleixner
With the new mechanism which kicks tasks off the outgoing CPU at the end of schedule() the situation on an outgoing CPU right before the stopper thread brings it down completely is: - All user tasks and all unbound kernel threads have either been migrated away or are not running and the next wakeup will move them to a online CPU. - All per CPU kernel threads, except cpu hotplug thread and the stopper thread have either been unbound or parked by the responsible CPU hotplug callback. That means that at the last step before the stopper thread is invoked the cpu hotplug thread is the last legitimate running task on the outgoing CPU. Add a final wait step right before the stopper thread is kicked which ensures that any still running tasks on the way to park or on the way to kick themself of the CPU are either sleeping or gone. This allows to remove the migrate_tasks() crutch in sched_cpu_dying(). If sched_cpu_dying() detects that there is still another running task aside of the stopper thread then it will explode with the appropriate fireworks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Link: https://lkml.kernel.org/r/20201023102346.547163969@infradead.org
2020-11-10workqueue: Manually break affinity on hotplugPeter Zijlstra
Don't rely on the scheduler to force break affinity for us -- it will stop doing that for per-cpu-kthreads. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Link: https://lkml.kernel.org/r/20201023102346.464718669@infradead.org
2020-11-10sched/core: Wait for tasks being pushed away on hotplugThomas Gleixner
RT kernels need to ensure that all tasks which are not per CPU kthreads have left the outgoing CPU to guarantee that no tasks are force migrated within a migrate disabled section. There is also some desire to (ab)use fine grained CPU hotplug control to clear a CPU from active state to force migrate tasks which are not per CPU kthreads away for power control purposes. Add a mechanism which waits until all tasks which should leave the CPU after the CPU active flag is cleared have moved to a different online CPU. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Link: https://lkml.kernel.org/r/20201023102346.377836842@infradead.org
2020-11-10sched/hotplug: Ensure only per-cpu kthreads run during hotplugPeter Zijlstra
In preparation for migrate_disable(), make sure only per-cpu kthreads are allowed to run on !active CPUs. This is ran (as one of the very first steps) from the cpu-hotplug task which is a per-cpu kthread and completion of the hotplug operation only requires such tasks. This constraint enables the migrate_disable() implementation to wait for completion of all migrate_disable regions on this CPU at hotplug time without fear of any new ones starting. This replaces the unlikely(rq->balance_callbacks) test at the tail of context_switch with an unlikely(rq->balance_work), the fast path is not affected. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Link: https://lkml.kernel.org/r/20201023102346.292709163@infradead.org
2020-11-10sched: Fix balance_callback()Peter Zijlstra
The intent of balance_callback() has always been to delay executing balancing operations until the end of the current rq->lock section. This is because balance operations must often drop rq->lock, and that isn't safe in general. However, as noted by Scott, there were a few holes in that scheme; balance_callback() was called after rq->lock was dropped, which means another CPU can interleave and touch the callback list. Rework code to call the balance callbacks before dropping rq->lock where possible, and otherwise splice the balance list onto a local stack. This guarantees that the balance list must be empty when we take rq->lock. IOW, we'll only ever run our own balance callbacks. Reported-by: Scott Wood <swood@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Link: https://lkml.kernel.org/r/20201023102346.203901269@infradead.org
2020-11-10stop_machine: Add function and caller debug infoPeter Zijlstra
Crashes in stop-machine are hard to connect to the calling code, add a little something to help with that. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Link: https://lkml.kernel.org/r/20201023102346.116513635@infradead.org
2020-11-10sched/debug: Fix memory corruption caused by multiple small reads of flagsColin Ian King
Reading /proc/sys/kernel/sched_domain/cpu*/domain0/flags mutliple times with small reads causes oopses with slub corruption issues because the kfree is free'ing an offset from a previous allocation. Fix this by adding in a new pointer 'buf' for the allocation and kfree and use the temporary pointer tmp to handle memory copies of the buf offsets. Fixes: 5b9f8ff7b320 ("sched/debug: Output SD flag names rather than their values") Reported-by: Jeff Bastian <jbastian@redhat.com> Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20201029151103.373410-1-colin.king@canonical.com
2020-11-10sched/fair: Prefer prev cpu in asymmetric wakeup pathVincent Guittot
During fast wakeup path, scheduler always check whether local or prev cpus are good candidates for the task before looking for other cpus in the domain. With commit b7a331615d25 ("sched/fair: Add asymmetric CPU capacity wakeup scan") the heterogenous system gains a dedicated path but doesn't try to reuse prev cpu whenever possible. If the previous cpu is idle and belong to the LLC domain, we should check it 1st before looking for another cpu because it stays one of the best candidate and this also stabilizes task placement on the system. This change aligns asymmetric path behavior with symmetric one and reduces cases where the task migrates across all cpus of the sd_asym_cpucapacity domains at wakeup. This change does not impact normal EAS mode but only the overloaded case or when EAS is not used. - On hikey960 with performance governor (EAS disable) ./perf bench sched pipe -T -l 50000 mainline w/ patch # migrations 999364 0 ops/sec 149313(+/-0.28%) 182587(+/- 0.40) +22% - On hikey with performance governor ./perf bench sched pipe -T -l 50000 mainline w/ patch # migrations 0 0 ops/sec 47721(+/-0.76%) 47899(+/- 0.56) +0.4% According to test on hikey, the patch doesn't impact symmetric system compared to current implementation (only tested on arm64) Also read the uclamped value of task's utilization at most twice instead instead each time we compare task's utilization with cpu's capacity. Fixes: b7a331615d25 ("sched/fair: Add asymmetric CPU capacity wakeup scan") Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20201029161824.26389-1-vincent.guittot@linaro.org
2020-11-10sched/fair: Ensure tasks spreading in LLC during LBVincent Guittot
schbench shows latency increase for 95 percentile above since: commit 0b0695f2b34a ("sched/fair: Rework load_balance()") Align the behavior of the load balancer with the wake up path, which tries to select an idle CPU which belongs to the LLC for a waking task. calculate_imbalance() will use nr_running instead of the spare capacity when CPUs share resources (ie cache) at the domain level. This will ensure a better spread of tasks on idle CPUs. Running schbench on a hikey (8cores arm64) shows the problem: tip/sched/core : schbench -m 2 -t 4 -s 10000 -c 1000000 -r 10 Latency percentiles (usec) 50.0th: 33 75.0th: 45 90.0th: 51 95.0th: 4152 *99.0th: 14288 99.5th: 14288 99.9th: 14288 min=0, max=14276 tip/sched/core + patch : schbench -m 2 -t 4 -s 10000 -c 1000000 -r 10 Latency percentiles (usec) 50.0th: 34 75.0th: 47 90.0th: 52 95.0th: 78 *99.0th: 94 99.5th: 94 99.9th: 94 min=0, max=94 Fixes: 0b0695f2b34a ("sched/fair: Rework load_balance()") Reported-by: Chris Mason <clm@fb.com> Suggested-by: Rik van Riel <riel@surriel.com> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Rik van Riel <riel@surriel.com> Tested-by: Rik van Riel <riel@surriel.com> Link: https://lkml.kernel.org/r/20201102102457.28808-1-vincent.guittot@linaro.org
2020-11-10perf/x86/intel/uncore: Fix Add BW copypastaArnd Bergmann
gcc -Wextra points out a duplicate initialization of one array member: arch/x86/events/intel/uncore_snb.c:478:37: warning: initialized field overwritten [-Woverride-init] 478 | [SNB_PCI_UNCORE_IMC_DATA_READS] = { SNB_UNCORE_PCI_IMC_DATA_WRITES_BASE, The only sensible explanation is that a duplicate 'READS' was used instead of the correct 'WRITES', so change it back. Fixes: 24633d901ea4 ("perf/x86/intel/uncore: Add BW counters for GT, IA and IO breakdown") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201026215203.3893972-1-arnd@kernel.org
2020-11-10lockdep: Avoid to modify chain keys in validate_chain()Boqun Feng
Chris Wilson reported a problem spotted by check_chain_key(): a chain key got changed in validate_chain() because we modify the ->read in validate_chain() to skip checks for dependency adding, and ->read is taken into calculation for chain key since commit f611e8cf98ec ("lockdep: Take read/write status in consideration when generate chainkey"). Fix this by avoiding to modify ->read in validate_chain() based on two facts: a) since we now support recursive read lock detection, there is no need to skip checks for dependency adding for recursive readers, b) since we have a), there is only one case left (nest_lock) where we want to skip checks in validate_chain(), we simply remove the modification for ->read and rely on the return value of check_deadlock() to skip the dependency adding. Reported-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201102053743.450459-1-boqun.feng@gmail.com