summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-01-14keys: Fix request_key() cacheDavid Howells
When the key cached by request_key() and co. is cleaned up on exit(), the code looks in the wrong task_struct, and so clears the wrong cache. This leads to anomalies in key refcounting when doing, say, a kernel build on an afs volume, that then trigger kasan to report a use-after-free when the key is viewed in /proc/keys. Fix this by making exit_creds() look in the passed-in task_struct rather than in current (the task_struct cleanup code is deferred by RCU and potentially run in another task). Fixes: 7743c48e54ee ("keys: Cache result of request_key*() temporarily in task_struct") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-14Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "11 mm fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: khugepaged: add trace status description for SCAN_PAGE_HAS_PRIVATE mm: memcg/slab: call flush_memcg_workqueue() only if memcg workqueue is valid mm/page-writeback.c: improve arithmetic divisions mm/page-writeback.c: use div64_ul() for u64-by-unsigned-long divide mm/page-writeback.c: avoid potential division by zero in wb_min_max_ratio() mm, debug_pagealloc: don't rely on static keys too early mm: memcg/slab: fix percpu slab vmstats flushing mm/shmem.c: thp, shmem: fix conflict of above-47bit hint address and PMD alignment mm/huge_memory.c: thp: fix conflict of above-47bit hint address and PMD alignment mm/memory_hotplug: don't free usage map when removing a re-added early section mm, thp: tweak reclaim/compaction effort of local-only and all-node allocations
2020-01-14arm64: nofpsmid: Handle TIF_FOREIGN_FPSTATE flag cleanlySuzuki K Poulose
We detect the absence of FP/SIMD after an incapable CPU is brought up, and by then we have kernel threads running already with TIF_FOREIGN_FPSTATE set which could be set for early userspace applications (e.g, modprobe triggered from initramfs) and init. This could cause the applications to loop forever in do_nofity_resume() as we never clear the TIF flag, once we now know that we don't support FP. Fix this by making sure that we clear the TIF_FOREIGN_FPSTATE flag for tasks which may have them set, as we would have done in the normal case, but avoiding touching the hardware state (since we don't support any). Also to make sure we handle the cases seemlessly we categorise the helper functions to two : 1) Helpers for common core code, which calls into take appropriate actions without knowing the current FPSIMD state of the CPU/task. e.g fpsimd_restore_current_state(), fpsimd_flush_task_state(), fpsimd_save_and_flush_cpu_state(). We bail out early for these functions, taking any appropriate actions (e.g, clearing the TIF flag) where necessary to hide the handling from core code. 2) Helpers used when the presence of FP/SIMD is apparent. i.e, save/restore the FP/SIMD register state, modify the CPU/task FP/SIMD state. e.g, fpsimd_save(), task_fpsimd_load() - save/restore task FP/SIMD registers fpsimd_bind_task_to_cpu() \ - Update the "state" metadata for CPU/task. fpsimd_bind_state_to_cpu() / fpsimd_update_current_state() - Update the fp/simd state for the current task from memory. These must not be called in the absence of FP/SIMD. Put in a WARNING to make sure they are not invoked in the absence of FP/SIMD. KVM also uses the TIF_FOREIGN_FPSTATE flag to manage the FP/SIMD state on the CPU. However, without FP/SIMD support we trap all accesses and inject undefined instruction. Thus we should never "load" guest state. Add a sanity check to make sure this is valid. Fixes: 82e0191a1aa11abf ("arm64: Support systems without FP/ASIMD") Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-01-14arm64: signal: nofpsimd: Handle fp/simd context for signal framesSuzuki K Poulose
Make sure we try to save/restore the vfp/fpsimd context for signal handling only when the fp/simd support is available. Otherwise, skip the frames. Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-01-14arm64: ptrace: nofpsimd: Fail FP/SIMD regset operationsSuzuki K Poulose
When fp/simd is not supported on the system, fail the operations of FP/SIMD regsets. Fixes: 82e0191a1aa11abf ("arm64: Support systems without FP/ASIMD") Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-01-14arm64: cpufeature: Set the FP/SIMD compat HWCAP bits properlySuzuki K Poulose
We set the compat_elf_hwcap bits unconditionally on arm64 to include the VFP and NEON support. However, the FP/SIMD unit is optional on Arm v8 and thus could be missing. We already handle this properly in the kernel, but still advertise to the COMPAT applications that the VFP is available. Fix this to make sure we only advertise when we really have them. Fixes: 82e0191a1aa11abf ("arm64: Support systems without FP/ASIMD") Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-01-14arm64: cpufeature: Fix the type of no FP/SIMD capabilitySuzuki K Poulose
The NO_FPSIMD capability is defined with scope SYSTEM, which implies that the "absence" of FP/SIMD on at least one CPU is detected only after all the SMP CPUs are brought up. However, we use the status of this capability for every context switch. So, let us change the scope to LOCAL_CPU to allow the detection of this capability as and when the first CPU without FP is brought up. Also, the current type allows hotplugged CPU to be brought up without FP/SIMD when all the current CPUs have FP/SIMD and we have the userspace up. Fix both of these issues by changing the capability to BOOT_RESTRICTED_LOCAL_CPU_FEATURE. Fixes: 82e0191a1aa11abf ("arm64: Support systems without FP/ASIMD") Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-01-14arm64: fpsimd: Make sure SVE setup is complete before SIMD is usedSuzuki K Poulose
In-kernel users of NEON rely on may_use_simd() to check if the SIMD can be used. However, we must initialize the SVE before SIMD can be used. Add a sanity check to make sure that we have completed the SVE setup before anyone uses the SIMD. Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-01-14arm64: Introduce system_capabilities_finalized() markerSuzuki K Poulose
We finalize the system wide capabilities after the SMP CPUs are booted by the kernel. This is used as a marker for deciding various checks in the kernel. e.g, sanity check the hotplugged CPUs for missing mandatory features. However there is no explicit helper available for this in the kernel. There is sys_caps_initialised, which is not exposed. The other closest one we have is the jump_label arm64_const_caps_ready which denotes that the capabilities are set and the capability checks could use the individual jump_labels for fast path. This is performed before setting the ELF Hwcaps, which must be checked against the new CPUs. We also perform some of the other initialization e.g, SVE setup, which is important for the use of FP/SIMD where SVE is supported. Normally userspace doesn't get to run before we finish this. However the in-kernel users may potentially start using the neon mode. So, we need to reject uses of neon mode before we are set. Instead of defining a new marker for the completion of SVE setup, we could simply reuse the arm64_const_caps_ready and enable it once we have finished all the setup. Also we could expose this to the various users as "system_capabilities_finalized()" to make it more meaningful than "const_caps_ready". Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-01-14spi: fsl: simplify error path in of_fsl_spi_probe()Christophe Leroy
No need to 'goto err;' for just doing a return. return directly from where the error happens. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Link: https://lore.kernel.org/r/2a4a7e11b37cfa0558d68f0d35e90d6da858b059.1579017697.git.christophe.leroy@c-s.fr Signed-off-by: Mark Brown <broonie@kernel.org>
2020-01-14spi: fsl-lpspi: fix only one cs-gpio workingPhilippe Schenker
Why it does not work at the moment: - num_chipselect sets the number of cs-gpios that are in the DT. This comes from drivers/spi/spi.c - num_chipselect gets set with devm_spi_register_controller, that is called in drivers/spi/spi.c - devm_spi_register_controller got called after num_chipselect has been used. How this commit fixes the issue: - devm_spi_register_controller gets called before num_chipselect is being used. Fixes: c7a402599504 ("spi: lpspi: use the core way to implement cs-gpio function") Signed-off-by: Philippe Schenker <philippe.schenker@toradex.com> Link: https://lore.kernel.org/r/20191204141312.1411251-1-philippe.schenker@toradex.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-01-14regulator: core: Add regulator_is_equal() helperMarek Vasut
Add regulator_is_equal() helper to compare whether two regulators are the same. This is useful for checking whether two separate regulators in a driver are actually the same supply. Signed-off-by: Marek Vasut <marex@denx.de> Cc: Fabio Estevam <festevam@gmail.com> Cc: Igor Opaniuk <igor.opaniuk@toradex.com> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Marcel Ziswiler <marcel.ziswiler@toradex.com> Cc: Mark Brown <broonie@kernel.org> Cc: Oleksandr Suvorov <oleksandr.suvorov@toradex.com> Link: https://lore.kernel.org/r/20191220164450.1395038-1-marex@denx.de Signed-off-by: Mark Brown <broonie@kernel.org>
2020-01-14spi: spi-ti-qspi: optimize byte-transfersJean Pihet
Optimize the 8-bit based transfers, as used by the SPI flash devices, by reading the data registers by 32 and 128 bits when possible and copy the contents to the receive buffer. The speed improvement is 4.9x using quad read. Signed-off-by: Jean Pihet <jean.pihet@newoldbits.com> Cc: Ryan Barnett <ryan.barnett@rockwellcollins.com> Cc: Conrad Ratschan <conrad.ratschan@rockwellcollins.com> Cc: Arnout Vandecappelle <arnout.vandecappelle@essensium.com> Link: https://lore.kernel.org/r/20200114124125.361429-3-jean.pihet@newoldbits.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-01-14spi: spi-ti-qspi: support large flash devicesJean Pihet
The TI QSPI IP has limitations: - the MMIO region is 64MB in size - in non-MMIO mode, the transfer can handle 4096 words max. Add support for bigger devices. Use MMIO and DMA transfers below the 64MB boundary, use software generated transfers above. Signed-off-by: Jean Pihet <jean.pihet@newoldbits.com> Cc: Ryan Barnett <ryan.barnett@rockwellcollins.com> Cc: Conrad Ratschan <conrad.ratschan@rockwellcollins.com> Cc: Arnout Vandecappelle <arnout.vandecappelle@essensium.com> Link: https://lore.kernel.org/r/20200114124125.361429-2-jean.pihet@newoldbits.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-01-14regulator: mpq7920: Convert to use .probe_newAxel Lin
Use the new .probe_new instead. Signed-off-by: Axel Lin <axel.lin@ingics.com> Link: https://lore.kernel.org/r/20200114124449.28408-2-axel.lin@ingics.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-01-14regulator: mpq7920: Remove unneeded fields from struct mpq7920_regulator_infoAxel Lin
Both *dev and *rdev are only used in .probe, so use local variable instead. Also remove mpq7920_regulator_register() because it is so trivial and there is only one caller. Signed-off-by: Axel Lin <axel.lin@ingics.com> Link: https://lore.kernel.org/r/20200114124449.28408-1-axel.lin@ingics.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-01-14Documentation/process: Add Amazon contact for embargoed hardware issuesDavid Woodhouse
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Link: https://lore.kernel.org/r/da6467d2649339b42339124fd19a8a2f91cc00dd.camel@infradead.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-14lkdtm/bugs: fix build error in lkdtm_UNSET_SMEPBrendan Higgins
When building ARCH=um with CONFIG_UML_X86=y and CONFIG_64BIT=y we get the build errors: drivers/misc/lkdtm/bugs.c: In function ‘lkdtm_UNSET_SMEP’: drivers/misc/lkdtm/bugs.c:288:8: error: implicit declaration of function ‘native_read_cr4’ [-Werror=implicit-function-declaration] cr4 = native_read_cr4(); ^~~~~~~~~~~~~~~ drivers/misc/lkdtm/bugs.c:290:13: error: ‘X86_CR4_SMEP’ undeclared (first use in this function); did you mean ‘X86_FEATURE_SMEP’? if ((cr4 & X86_CR4_SMEP) != X86_CR4_SMEP) { ^~~~~~~~~~~~ X86_FEATURE_SMEP drivers/misc/lkdtm/bugs.c:290:13: note: each undeclared identifier is reported only once for each function it appears in drivers/misc/lkdtm/bugs.c:297:2: error: implicit declaration of function ‘native_write_cr4’; did you mean ‘direct_write_cr4’? [-Werror=implicit-function-declaration] native_write_cr4(cr4); ^~~~~~~~~~~~~~~~ direct_write_cr4 So specify that this block of code should only build when CONFIG_X86_64=y *AND* CONFIG_UML is unset. Signed-off-by: Brendan Higgins <brendanhiggins@google.com> Acked-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20191213003522.66450-1-brendanhiggins@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-14lkdtm/bugs: Make double-fault test always availableKees Cook
Adjust the DOUBLE_FAULT test to always be available (so test harnesses don't have to make exceptions more missing tests), and for the arch-specific tests to "XFAIL" so that test harnesses can reason about expected vs unexpected failures. Fixes: b09511c253e5 ("lkdtm: Add a DOUBLE_FAULT crash type on x86") Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/202001021226.751D3F869D@keescook Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-14coresight: etm4x: Fix unused function warningArnd Bergmann
Some of the newly added code in the etm4x driver is inside of an #ifdef, and some other code is outside of it, leading to a harmless warning when CONFIG_CPU_PM is disabled: drivers/hwtracing/coresight/coresight-etm4x.c:68:13: error: 'etm4_os_lock' defined but not used [-Werror=unused-function] static void etm4_os_lock(struct etmv4_drvdata *drvdata) ^~~~~~~~~~~~ To avoid the warning and simplify the the #ifdef checks, use IS_ENABLED() instead, so the compiler can drop the unused functions without complaining. Fixes: f188b5e76aae ("coresight: etm4x: Save/restore state across CPU low power states") Signed-off-by: Arnd Bergmann <arnd@arndb.de> [Fixed capital 'f' in title] Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Link: https://lore.kernel.org/r/20191213223107.1484-2-mathieu.poirier@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-14regulator: vqmmc-ipq4019: Trivial clean upAxel Lin
A few trivial clean up: * Make ipq4019_regulator_voltage_ops and vmmc_regulator const * Make ipq4019_vmmcq_regmap_config static * Use regulator_map_voltage_ascend Signed-off-by: Axel Lin <axel.lin@ingics.com> Link: https://lore.kernel.org/r/20200114065847.31667-2-axel.lin@ingics.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-01-14regulator: vqmmc-ipq4019: Remove ipq4019_regulator_removeAxel Lin
This driver is using devm_regulator_register() so no need to call regulator_unregister() in ipq4019_regulator_remove(). Signed-off-by: Axel Lin <axel.lin@ingics.com> Link: https://lore.kernel.org/r/20200114065847.31667-1-axel.lin@ingics.com Signed-off-by: Mark Brown <broonie@kernel.org>
2020-01-14regulator: bindings: Drop document bindings for mpq7920Mark Brown
This reverts commit f5fa59a61eca "regulator: bindings: add document bindings for mpq7920" as Rob has a number of problems with the use of DT schema. Signed-off-by: Mark Brown <broonie@kernel.org>
2020-01-14dt-bindings: Drop entry for Monolithic Power System, MPSMark Brown
This reverts commit 9399e5dc6b679994 adding the entry, Rob has rescinded his ack. Signed-off-by: Mark Brown <broonie@kernel.org>
2020-01-14selftests/timens: Check for right timens offsets after fork and execAndrei Vagin
Output on success: 1..1 ok 1 exec # Pass 1 Fail 0 Xfail 0 Xpass 0 Skip 0 Error 0 Output on failure: 1..1 not ok 1 36016 16 Bail out! Output with lack of permissions: 1..1 not ok 1 # SKIP need to run as root Output without support of time namespaces: 1..1 not ok 1 # SKIP Time namespaces are not supported Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-35-dima@arista.com
2020-01-14selftests/timens: Add a simple perf test for clock_gettime()Andrei Vagin
Output on success: 1..4 ok 1 host: clock: monotonic cycles: 148323947 ok 2 host: clock: boottime cycles: 148577503 ok 3 ns: clock: monotonic cycles: 137659217 ok 4 ns: clock: boottime cycles: 137959154 # Pass 4 Fail 0 Xfail 0 Xpass 0 Skip 0 Error 0 Output with lack of permissions: 1..4 ok 1 host: clock: monotonic cycles: 145671139 ok 2 host: clock: boottime cycles: 146958357 not ok 3 # SKIP need to run as root Output without support of time namespaces: 1..4 ok 1 host: clock: monotonic cycles: 145671139 ok 2 host: clock: boottime cycles: 146958357 not ok 3 # SKIP Time namespaces are not supported Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-34-dima@arista.com
2020-01-14selftests/timens: Add timer offsets testAndrei Vagin
Check that timer_create() takes into account clock offsets. Output on success: 1..3 ok 1 clockid=7 ok 2 clockid=1 ok 3 clockid=9 # Pass 3 Fail 0 Xfail 0 Xpass 0 Skip 0 Error 0 Output with lack of permissions: 1..3 not ok 1 # SKIP need to run as root Output without support of time namespaces: 1..3 not ok 1 # SKIP Time namespaces are not supported Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-33-dima@arista.com
2020-01-14selftests/timens: Add procfs selftestDmitry Safonov
Check that /proc/uptime is correct inside a new time namespace. Output on success: 1..1 ok 1 Passed for /proc/uptime # Pass 1 Fail 0 Xfail 0 Xpass 0 Skip 0 Error 0 Output with lack of permissions: 1..1 not ok 1 # SKIP need to run as root Output without support of time namespaces: 1..1 not ok 1 # SKIP Time namespaces are not supported Co-developed-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-32-dima@arista.com
2020-01-14selftests/timens: Add a test for clock_nanosleep()Andrei Vagin
Check that clock_nanosleep() takes into account clock offsets. Output on success: 1..4 ok 1 clockid: 1 abs:0 ok 2 clockid: 1 abs:1 ok 3 clockid: 9 abs:0 ok 4 clockid: 9 abs:1 Output with lack of permissions: 1..4 not ok 1 # SKIP need to run as root Output without support of time namespaces: 1..4 not ok 1 # SKIP Time namespaces are not supported Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-31-dima@arista.com
2020-01-14selftests/timens: Add a test for timerfdAndrei Vagin
Check that timerfd_create() takes into account clock offsets. Output on success: 1..3 ok 1 clockid=7 ok 2 clockid=1 ok 3 clockid=9 # Pass 3 Fail 0 Xfail 0 Xpass 0 Skip 0 Error 0 Output on failure: 1..3 not ok 1 clockid: 7 elapsed: 0 not ok 2 clockid: 1 elapsed: 0 not ok 3 clockid: 9 elapsed: 0 Bail out! Output with lack of permissions: 1..3 not ok 1 # SKIP need to run as root Output without support of time namespaces: 1..3 not ok 1 # SKIP Time namespaces are not supported Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-30-dima@arista.com
2020-01-14selftests/timens: Add Time Namespace test for supported clocksDmitry Safonov
A test to check that all supported clocks work on host and inside a new time namespace. Use both ways to get time: through VDSO and by entering the kernel with implicit syscall. Introduce a new timens directory in selftests framework for the next timens tests. Output on success: 1..10 ok 1 Passed for CLOCK_BOOTTIME (syscall) ok 2 Passed for CLOCK_BOOTTIME (vdso) ok 3 Passed for CLOCK_BOOTTIME_ALARM (syscall) ok 4 Passed for CLOCK_BOOTTIME_ALARM (vdso) ok 5 Passed for CLOCK_MONOTONIC (syscall) ok 6 Passed for CLOCK_MONOTONIC (vdso) ok 7 Passed for CLOCK_MONOTONIC_COARSE (syscall) ok 8 Passed for CLOCK_MONOTONIC_COARSE (vdso) ok 9 Passed for CLOCK_MONOTONIC_RAW (syscall) ok 10 Passed for CLOCK_MONOTONIC_RAW (vdso) # Pass 10 Fail 0 Xfail 0 Xpass 0 Skip 0 Error 0 Output with lack of permissions: 1..10 not ok 1 # SKIP need to run as root Output without support of time namespaces: 1..10 not ok 1 # SKIP Time namespaces are not supported Co-developed-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-29-dima@arista.com
2020-01-14fs/proc: Introduce /proc/pid/timens_offsetsAndrei Vagin
API to set time namespace offsets for children processes, i.e.: echo "$clockid $offset_sec $offset_nsec" > /proc/self/timens_offsets Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-28-dima@arista.com
2020-01-14x86/vdso: Zap vvar pages when switching to a time namespaceDmitry Safonov
The VVAR page layout depends on whether a task belongs to the root or non-root time namespace. Whenever a task changes its namespace, the VVAR page tables are cleared and then they will be re-faulted with a corresponding layout. Co-developed-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-27-dima@arista.com
2020-01-14x86/vdso: On timens page fault prefault also VVAR pageDmitry Safonov
As timens page has offsets to data on VVAR page VVAR is going to be accessed shortly. Set it up with timens in one page fault as optimization. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-26-dima@arista.com
2020-01-14x86/vdso: Handle faults on timens pageDmitry Safonov
If a task belongs to a time namespace then the VVAR page which contains the system wide VDSO data is replaced with a namespace specific page which has the same layout as the VVAR page. Co-developed-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-25-dima@arista.com
2020-01-14time: Allocate per-timens vvar pageDmitry Safonov
VDSO support for Time namespace needs to set up a page with the same layout as VVAR. That timens page will be placed on position of VVAR page inside namespace. That page contains time namespace clock offsets and it has vdso_data->seq set to 1 to enforce the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time namespace handling path. Allocate the timens page during namespace creation. Setup the offsets when the first task enters the ns and freeze them to guarantee the pace of monotonic/boottime clocks and to avoid breakage of applications. The design decision is to have a global offset_lock which is used during namespace offsets setup and to freeze offsets when the first task joins the new time namespace. That is better in terms of memory usage compared to having a per namespace mutex that's used only during the setup period. Suggested-by: Andy Lutomirski <luto@kernel.org> Based-on-work-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-24-dima@arista.com
2020-01-14x86/vdso: Add time napespace pageDmitry Safonov
To support time namespaces in the VDSO with a minimal impact on regular non time namespace affected tasks, the namespace handling needs to be hidden in a slow path. The most obvious place is vdso_seq_begin(). If a task belongs to a time namespace then the VVAR page which contains the system wide VDSO data is replaced with a namespace specific page which has the same layout as the VVAR page. That page has vdso_data->seq set to 1 to enforce the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time namespace handling path. The extra check in the case that vdso_data->seq is odd, e.g. a concurrent update of the VDSO data is in progress, is not really affecting regular tasks which are not part of a time namespace as the task is spin waiting for the update to finish and vdso_data->seq to become even again. If a time namespace task hits that code path, it invokes the corresponding time getter function which retrieves the real VVAR page, reads host time and then adds the offset for the requested clock which is stored in the special VVAR page. Allocate the time namespace page among VVAR pages and place vdso_data on it. Provide __arch_get_timens_vdso_data() helper for VDSO code to get the code-relative position of VVARs on that special page. Co-developed-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-23-dima@arista.com
2020-01-14x86/vdso: Provide vdso_data offset on vvar_pageDmitry Safonov
VDSO support for time namespaces needs to set up a page with the same layout as VVAR. That timens page will be placed on position of VVAR page inside namespace. That page has vdso_data->seq set to 1 to enforce the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time namespace handling path. To prepare the time namespace page the kernel needs to know the vdso_data offset. Provide arch_get_vdso_data() helper for locating vdso_data on VVAR page. Co-developed-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-22-dima@arista.com
2020-01-14lib/vdso: Prepare for time namespace supportThomas Gleixner
To support time namespaces in the vdso with a minimal impact on regular non time namespace affected tasks, the namespace handling needs to be hidden in a slow path. The most obvious place is vdso_seq_begin(). If a task belongs to a time namespace then the VVAR page which contains the system wide vdso data is replaced with a namespace specific page which has the same layout as the VVAR page. That page has vdso_data->seq set to 1 to enforce the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time namespace handling path. The extra check in the case that vdso_data->seq is odd, e.g. a concurrent update of the vdso data is in progress, is not really affecting regular tasks which are not part of a time namespace as the task is spin waiting for the update to finish and vdso_data->seq to become even again. If a time namespace task hits that code path, it invokes the corresponding time getter function which retrieves the real VVAR page, reads host time and then adds the offset for the requested clock which is stored in the special VVAR page. If VDSO time namespace support is disabled the whole magic is compiled out. Initial testing shows that the disabled case is almost identical to the host case which does not take the slow timens path. With the special timens page installed the performance hit is constant time and in the range of 5-7%. For the vdso functions which are not using the sequence count an unconditional check for vdso_data->clock_mode is added which switches to the real vdso when the clock_mode is VCLOCK_TIMENS. [avagin: Make do_hres_timens() work with raw clocks too: choose vdso_data pointer by CS_RAW offset.] Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-21-dima@arista.com
2020-01-14x86/vdso: Restrict splitting VVAR VMADmitry Safonov
Forbid splitting VVAR VMA resulting in a stricter ABI and reducing the amount of corner-cases to consider while working further on VDSO time namespace support. As the offset from timens to VVAR page is computed compile-time, the pages in VVAR should stay together and not being partically mremap()'ed. Co-developed-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-20-dima@arista.com
2020-01-14fs/proc: Respect boottime inside time namespace for /proc/uptimeDmitry Safonov
Make sure that /proc/uptime is adjusted to the tasks time namespace. Co-developed-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-19-dima@arista.com
2020-01-14posix-timers: Make clock_nanosleep() time namespace awareAndrei Vagin
clock_nanosleep() accepts absolute values of expiration time, if the TIMER_ABSTIME flag is set. This value is in the tasks time namespace, which has to be converted to the host time namespace. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-18-dima@arista.com
2020-01-14hrtimers: Prepare hrtimer_nanosleep() for time namespacesAndrei Vagin
clock_nanosleep() accepts absolute values of expiration time when TIMER_ABSTIME flag is set. This absolute value is inside the task's time namespace, and has to be converted to the host's time. There is timens_ktime_to_host() helper for converting time, but it accepts ktime argument. As a preparation, make hrtimer_nanosleep() accept a clock value in ktime instead of timespec64. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-17-dima@arista.com
2020-01-14alarmtimer: Make nanosleep() time namespace awareAndrei Vagin
clock_nanosleep() accepts absolute values of expiration time when the TIMER_ABSTIME flag is set. This absolute value is inside the task's time namespace and has to be converted to the host's time. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-16-dima@arista.com
2020-01-14posix-timers: Make timer_settime() time namespace awareAndrei Vagin
Wire timer_settime() syscall into time namespace virtualization. sys_timer_settime() calls the ktime->timer_set() callback. Right now, common_timer_set() is the only implementation for the callback. The user-supplied expiry value is converted from timespec64 to ktime and then timens_ktime_to_host() can be used to convert namespace's time to the host time. Inside a time namespace kernel's time differs by a fixed offset from a user-supplied time, but only absolute values (TIMER_ABSTIME) must be converted. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-15-dima@arista.com
2020-01-14timerfd: Make timerfd_settime() time namespace awareAndrei Vagin
timerfd_settime() accepts an absolute value of the expiration time if TFD_TIMER_ABSTIME is specified. This value is in the task's time namespace and has to be converted to the host's time namespace. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-14-dima@arista.com
2020-01-14time: Add do_timens_ktime_to_host() helperAndrei Vagin
The helper subtracts namespace's clock offset from the given time and ensures that the result is within [0, KTIME_MAX]. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-13-dima@arista.com
2020-01-14posix-clocks: Wire up clock_gettime() with timens offsetsAndrei Vagin
Adjust monotonic and boottime clocks with per-timens offsets. As the result a process inside time namespace will see timers and clocks corrected to offsets that were set when the namespace was created Note that applications usually go through vDSO to get time, which is not yet adjusted. Further changes will complete time namespace virtualisation with vDSO support. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-12-dima@arista.com
2020-01-14posix-timers: Use clock_get_ktime() in common_timer_get()Andrei Vagin
Now, when the clock_get_ktime() callback exists, the suboptimal timespec64-based conversion can be removed from common_timer_get(). Suggested-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-11-dima@arista.com
2020-01-14posix-clocks: Introduce clock_get_ktime() callbackAndrei Vagin
The callsite in common_timer_get() has already a comment: /* * The timespec64 based conversion is suboptimal, but it's not * worth to implement yet another callback. */ kc->clock_get(timr->it_clock, &ts64); now = timespec64_to_ktime(ts64); The upcoming support for time namespaces requires to have access to: - The time in a task's time namespace for sys_clock_gettime() - The time in the root name space for common_timer_get() That adds a valid reason to finally implement a separate callback which returns the time in ktime_t format. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-10-dima@arista.com