Age | Commit message (Collapse) | Author |
|
This patch uses the cpufeatures framework to determine common SVE
capabilities and vector lengths, and configures the runtime SVE
support code appropriately.
ZCR_ELx is not really a feature register, but it is convenient to
use it as a template for recording the maximum vector length
supported by a CPU, using the LEN field. This field is similar to
a feature field in that it is a contiguous bitfield for which we
want to determine the minimum system-wide value. This patch adds
ZCR as a pseudo-register in cpuinfo/cpufeatures, with appropriate
custom code to populate it. Finding the minimum supported value of
the LEN field is left to the cpufeatures framework in the usual
way.
The meaning of ID_AA64ZFR0_EL1 is not architecturally defined yet,
so for now we just require it to be zero.
Note that much of this code is dormant and SVE still won't be used
yet, since system_supports_sve() remains hardwired to false.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
update_cpu_features() currently cannot tell whether it is being
called during early or late secondary boot. This doesn't
desperately matter for anything it currently does.
However, SVE will need to know here whether the set of available
vector lengths is known or still to be determined when booting a
CPU, so that it can be updated appropriately.
This patch simply moves the sys_caps_initialised stuff to the top
of the file so that it can be used more widely. There doesn't seem
to be a more obvious place to put it.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
This patch implements the core logic for changing a task's vector
length on request from userspace. This will be used by the ptrace
and prctl frontends that are implemented in later patches.
The SVE architecture permits, but does not require, implementations
to support vector lengths that are not a power of two. To handle
this, logic is added to check a requested vector length against a
possibly sparse bitmap of available vector lengths at runtime, so
that the best supported value can be chosen.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
This patch implements support for saving and restoring the SVE
registers around signals.
A fixed-size header struct sve_context is always included in the
signal frame encoding the thread's vector length at the time of
signal delivery, optionally followed by a variable-layout structure
encoding the SVE registers.
Because of the need to preserve backwards compatibility, the FPSIMD
view of the SVE registers is always dumped as a struct
fpsimd_context in the usual way, in addition to any sve_context.
The SVE vector registers are dumped in full, including bits 127:0
of each register which alias the corresponding FPSIMD vector
registers in the hardware. To avoid any ambiguity about which
alias to restore during sigreturn, the kernel always restores bits
127:0 of each SVE vector register from the fpsimd_context in the
signal frame (which must be present): userspace needs to take this
into account if it wants to modify the SVE vector register contents
on return from a signal.
FPSR and FPCR, which are used by both FPSIMD and SVE, are not
included in sve_context because they are always present in
fpsimd_context anyway.
For signal delivery, a new helper
fpsimd_signal_preserve_current_state() is added to update _both_
the FPSIMD and SVE views in the task struct, to make it easier to
populate this information into the signal frame. Because of the
redundancy between the two views of the state, only one is updated
otherwise.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
It's desirable to be able to reset the vector length to some sane
default for new processes, since the new binary and its libraries
may or may not be SVE-aware.
This patch tracks the desired post-exec vector length (if any) in a
new thread member sve_vl_onexec, and adds a new thread flag
TIF_SVE_VL_INHERIT to control whether to inherit or reset the
vector length. Currently these are inactive. Subsequent patches
will provide the capability to configure them.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
This patch adds the core support for switching and managing the SVE
architectural state of user tasks.
Calls to the existing FPSIMD low-level save/restore functions are
factored out as new functions task_fpsimd_{save,load}(), since SVE
now dynamically may or may not need to be handled at these points
depending on the kernel configuration, hardware features discovered
at boot, and the runtime state of the task. To make these
decisions as fast as possible, const cpucaps are used where
feasible, via the system_supports_sve() helper.
The SVE registers are only tracked for threads that have explicitly
used SVE, indicated by the new thread flag TIF_SVE. Otherwise, the
FPSIMD view of the architectural state is stored in
thread.fpsimd_state as usual.
When in use, the SVE registers are not stored directly in
thread_struct due to their potentially large and variable size.
Because the task_struct slab allocator must be configured very
early during kernel boot, it is also tricky to configure it
correctly to match the maximum vector length provided by the
hardware, since this depends on examining secondary CPUs as well as
the primary. Instead, a pointer sve_state in thread_struct points
to a dynamically allocated buffer containing the SVE register data,
and code is added to allocate and free this buffer at appropriate
times.
TIF_SVE is set when taking an SVE access trap from userspace, if
suitable hardware support has been detected. This enables SVE for
the thread: a subsequent return to userspace will disable the trap
accordingly. If such a trap is taken without sufficient system-
wide hardware support, SIGILL is sent to the thread instead as if
an undefined instruction had been executed: this may happen if
userspace tries to use SVE in a system where not all CPUs support
it for example.
The kernel will clear TIF_SVE and disable SVE for the thread
whenever an explicit syscall is made by userspace. For backwards
compatibility reasons and conformance with the spirit of the base
AArch64 procedure call standard, the subset of the SVE register
state that aliases the FPSIMD registers is still preserved across a
syscall even if this happens. The remainder of the SVE register
state logically becomes zero at syscall entry, though the actual
zeroing work is currently deferred until the thread next tries to
use SVE, causing another trap to the kernel. This implementation
is suboptimal: in the future, the fastpath case may be optimised
to zero the registers in-place and leave SVE enabled for the task,
where beneficial.
TIF_SVE is also cleared in the following slowpath cases, which are
taken as reasonable hints that the task may no longer use SVE:
* exec
* fork and clone
Code is added to sync data between thread.fpsimd_state and
thread.sve_state whenever enabling/disabling SVE, in a manner
consistent with the SVE architectural programmer's model.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Alex Bennée <alex.bennee@linaro.org>
[will: added #include to fix allnoconfig build]
[will: use enable_daif in do_sve_acc]
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
To enable the kernel to use SVE, SVE traps from EL1 to EL2 must be
disabled. To take maximum advantage of the hardware, the full
available vector length also needs to be enabled for EL1 by
programming ZCR_EL2.LEN. (The kernel will program ZCR_EL1.LEN as
required, but this cannot override the limit set by ZCR_EL2.)
This patch makes the appropriate changes to the EL2 early setup
code.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
This patch defines the representation that will be used for the SVE
register state in the signal frame, and implements support for
saving and restoring the SVE registers around signals.
The same layout will also be used for the in-kernel task state.
Due to the variability of the SVE vector length, it is not possible
to define a fixed C struct to describe all the registers. Instead,
Macros are defined in sigcontext.h to facilitate access to the
parts of the structure.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
This patch adds CONFIG_ARM64_SVE to control building of SVE support
into the kernel, and adds a stub predicate system_supports_sve() to
control conditional compilation and runtime SVE support.
system_supports_sve() just returns false for now: it will be
replaced with a non-trivial implementation in a later patch, once
SVE support is complete enough to be enabled safely.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Manipulating the SVE architectural state, including the vector and
predicate registers, first-fault register and the vector length,
requires the use of dedicated instructions added by SVE.
This patch adds suitable assembly functions for saving and
restoring the SVE registers and querying the vector length.
Setting of the vector length is done as part of register restore.
Since people building kernels may not all get an SVE-enabled
toolchain for a while, this patch uses macros that generate
explicit opcodes in place of assembler mnemonics.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The SVE architecture adds some system registers, ID register fields
and a dedicated ESR exception class.
This patch adds the appropriate definitions that will be needed by
the kernel.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The existing FPSIMD context switch code contains a couple of
instances of {set,clear}_ti_thread(task_thread_info(task)). Since
there are thread flag manipulators that operate directly on
task_struct, this verbosity isn't strictly needed.
For consistency, this patch simplifies the affected calls. This
should have no impact on behaviour.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Currently, armv8_deprected.c takes charge of the "abi" sysctl
directory, which makes life difficult for other code that wants to
register sysctls in the same directory.
There is a "new" [1] sysctl registration interface that removes the
need to define ctl_tables for parent directories explicitly, which
is ideal here.
This patch ports register_insn_emulation_sysctl() over to the
register_sysctl() interface and removes the redundant ctl_table for
"abi".
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
[1] fea478d4101a (sysctl: Add register_sysctl for normal sysctl
users)
The commit message notes an intent to port users of the
pre-existing interfaces over to register_sysctl(), though the
number of users of the new interface currently appears negligible.
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The EFI runtime services ABI permits calls to EFI to clobber
certain FPSIMD/NEON registers, as per the AArch64 procedure call
standard.
Saving/restoring the clobbered registers around such calls needs
KERNEL_MODE_NEON, but the dependency is missing from Kconfig.
This patch adds the missing dependency.
This will aid bisection of the patches implementing support for the
ARM Scalable Vector Extension (SVE).
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Currently, a guest kernel sees the true CPU feature registers
(ID_*_EL1) when it reads them using MRS instructions. This means
that the guest may observe features that are present in the
hardware but the host doesn't understand or doesn't provide support
for. A guest may legimitately try to use such a feature as per the
architecture, but use of the feature may trap instead of working
normally, triggering undef injection into the guest.
This is not a problem for the host, but the guest may go wrong when
running on newer hardware than the host knows about.
This patch hides from guest VMs any AArch64-specific CPU features
that the host doesn't support, by exposing to the guest the
sanitised versions of the registers computed by the cpufeatures
framework, instead of the true hardware registers. To achieve
this, HCR_EL2.TID3 is now set for AArch64 guests, and emulation
code is added to KVM to report the sanitised versions of the
affected registers in response to MRS and register reads from
userspace.
The affected registers are removed from invariant_sys_regs[] (since
the invariant_sys_regs handling is no longer quite correct for
them) and added to sys_reg_desgs[], with appropriate access(),
get_user() and set_user() methods. No runtime vcpu storage is
allocated for the registers: instead, they are read on demand from
the cpufeatures framework. This may need modification in the
future if there is a need for userspace to customise the features
visible to the guest.
Attempts by userspace to write the registers are handled similarly
to the current invariant_sys_regs handling: writes are permitted,
but only if they don't attempt to change the value. This is
sufficient to support VM snapshot/restore from userspace.
Because of the additional registers, restoring a VM on an older
kernel may not work unless userspace knows how to handle the extra
VM registers exposed to the KVM user ABI by this patch.
Under the principle of least damage, this patch makes no attempt to
handle any of the other registers currently in
invariant_sys_regs[], or to emulate registers for AArch32: however,
these could be handled in a similar way in future, as necessary.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Currently sys_rt_sigreturn() verifies that the base sigframe is
readable, but no similar check is performed on the extra data to
which an extra_context record points.
This matters because the extra data will be read with the
unprotected user accessors. However, this is not a problem at
present because the extra data base address is required to be
exactly at the end of the base sigframe. So, there would need to
be a non-user-readable kernel address within about 59K
(SIGFRAME_MAXSZ - sizeof(struct rt_sigframe)) of some address for
which access_ok(VERIFY_READ) returns true, in order for sigreturn
to be able to read kernel memory that should be inaccessible to the
user task. This is currently impossible due to the untranslatable
address hole between the TTBR0 and TTBR1 address ranges.
Disappearance of the hole between the TTBR0 and TTBR1 mapping
ranges would require the VA size for TTBR0 and TTBR1 to grow to at
least 55 bits, and either the disabling of tagged pointers for
userspace or enabling of tagged pointers for kernel space; none of
which is currently envisaged.
Even so, it is wrong to use the unprotected user accessors without
an accompanying access_ok() check.
To avoid the potential for future surprises, this patch does an
explicit access_ok() check on the extra data space when parsing an
extra_context record.
Fixes: 33f082614c34 ("arm64: signal: Allow expansion of the signal frame")
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
A couple of FPSIMD exception handling functions that are called
from entry.S are currently not annotated as such.
This is not a big deal since asmlinkage does nothing on arm/arm64,
but fixing the annotations is more consistent and may help avoid
future surprises.
This patch adds appropriate asmlinkage annotations for
do_fpsimd_acc() and do_fpsimd_exc().
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Currently the regset API doesn't allow for the possibility that
regsets (or at least, the amount of meaningful data in a regset)
may change in size.
In particular, this results in useless padding being added to
coredumps if a regset's current size is smaller than its
theoretical maximum size.
This patch adds a get_size() function to struct user_regset.
Individual regset implementations can implement this function to
return the current size of the regset data. A regset_size()
function is added to provide callers with an abstract interface for
determining the size of a regset without needing to know whether
the regset is dynamically sized or not.
The only affected user of this interface is the ELF coredump code:
This patch ports ELF coredump to dump regsets with their actual
size in the coredump. This has no effect except for new regsets
that are dynamically sized and provide a get_size() implementation.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: H. J. Lu <hjl.tools@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
When the PMU driver is built as a module, the perf expects the
pmu->module to be valid, so that the driver is prevented from
being unloaded while it is in use. Fix the CCN pmu driver to
fill in this field.
Fixes: a33b0daab73a0 ("bus: ARM CCN PMU driver")
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
When the PMU driver is built as a module, the perf expects the
pmu->module to be valid, so that the driver is prevented from
being unloaded while it is in use. Fix the SPE pmu driver to
fill in this field.
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Updated comment. We are keeping track of maximum number of retries per
command via retries/allowed in struct scsi_cmnd. Corrected comment
positioning.
[mkp: applied by hand]
Signed-off-by: Petros Koutoupis <petros@petroskoutoupis.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The needed headroom that we ask the stack to reserve for us in TX
skbs is larger than the headroom available in RX frames, which
leads to skb reallocations in forwarding scenarios involving two
DPNI interfaces.
Configure the hardware to reserve some extra space in the RX
frame headroom to avoid this situation. The value is chosen based
on the Tx frame data offset, the Rx buffer alignment value and the
netdevice required headroom.
The network stack will take care to reserve space for HH_DATA_MOD when
building the skb, so there's no need to account for it in the netdevice
needed headroom.
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@nxp.com>
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The WRIOP hardware block v1.0.0 (found on LS2080A board)
requires data in RX buffers to be aligned to 256B, but
newer revisions (e.g. on LS2088A, LS1088A) only require
64B alignment.
Check WRIOP version and decide at runtime which alignment
requirement to configure for ingress buffers.
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@nxp.com>
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
When configuring the Tx buffer layout, the software annotation size is
mentioned, and MC accounts for it when configuring the frame
tx_data_offset. No need to handle it in the driver as well.
This results in 64B less memory allocated per frame.
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@nxp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Since setup_dpni() became a bit too long, move the buffer layout
configuration to a separate function.
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@nxp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Clean up goto labels in a couple of functions, by
removing/renaming redundant ones.
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@nxp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
A caller should never care about a debugfs error return value, and it
should never abort its normal operation if something "odd" goes on. Fix
up the unisys init code to not care if the root debugfs directory for
the subsystem is created or not, as no place it is used will matter.
Cc: David Kershner <david.kershner@unisys.com>
Cc: Tim Sell <Timothy.Sell@unisys.com>
Cc: Sameer Wadgaonkar <sameer.wadgaonkar@unisys.com>
Cc: David Binder <david.binder@unisys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
One page may store a set of entries of the sis->swap_map
(swap_info_struct->swap_map) in multiple swap clusters.
If some of the entries has sis->swap_map[offset] > SWAP_MAP_MAX,
multiple pages will be used to store the set of entries of the
sis->swap_map. And the pages are linked with page->lru. This is called
swap count continuation. To access the pages which store the set of
entries of the sis->swap_map simultaneously, previously, sis->lock is
used. But to improve the scalability of __swap_duplicate(), swap
cluster lock may be used in swap_count_continued() now. This may race
with add_swap_count_continuation() which operates on a nearby swap
cluster, in which the sis->swap_map entries are stored in the same page.
The race can cause wrong swap count in practice, thus cause unfreeable
swap entries or software lockup, etc.
To fix the race, a new spin lock called cont_lock is added to struct
swap_info_struct to protect the swap count continuation page list. This
is a lock at the swap device level, so the scalability isn't very well.
But it is still much better than the original sis->lock, because it is
only acquired/released when swap count continuation is used. Which is
considered rare in practice. If it turns out that the scalability
becomes an issue for some workloads, we can split the lock into some
more fine grained locks.
Link: http://lkml.kernel.org/r/20171017081320.28133-1-ying.huang@intel.com
Fixes: 235b62176712 ("mm/swap: add cluster lock")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shaohua Li <shli@kernel.org>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org> [4.11+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We need to deposit pre-allocated PTE page table when a PMD migration
entry is copied in copy_huge_pmd(). Otherwise, we will leak the
pre-allocated page and cause a NULL pointer dereference later in
zap_huge_pmd().
The missing counters during PMD migration entry copy process are added
as well.
The bug report is here: https://lkml.org/lkml/2017/10/29/214
Link: http://lkml.kernel.org/r/20171030144636.4836-1-zi.yan@sent.com
Fixes: 84c3fc4e9c563 ("mm: thp: check pmd migration entry in common path")
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This is a follow-up to commit 57ddfdaa9a72 ("initramfs: fix disabling of
initramfs (and its compression)"). This particular commit fixed the use
case where we build the kernel with an initramfs with no compression,
and then we build the kernel with no initramfs.
Now this still left us with the same case as described here:
http://lkml.kernel.org/r/20170521033337.6197-1-f.fainelli@gmail.com
not working with initramfs compression. This can be seen by the
following steps/timestamps:
https://www.spinics.net/lists/kernel/msg2598153.html
.initramfs_data.cpio.gz.cmd is correct:
cmd_usr/initramfs_data.cpio.gz := /bin/bash
./scripts/gen_initramfs_list.sh -o usr/initramfs_data.cpio.gz -u 1000 -g 1000 /home/fainelli/work/uclinux-rootfs/romfs /home/fainelli/work/uclinux-rootfs/misc/initramfs.dev
and was generated the first time we did generate the gzip initramfs, so
the command has not changed, nor its arguments, so we just don't call
it, no initramfs cpio is re-generated as a consequence.
The fix for this problem is just to properly keep track of the
.initramfs_cpio_data.d file by suffixing it with the compression
extension. This takes care of properly tracking dependencies such that
the initramfs get (re)generated any time files are added/deleted etc.
Link: http://lkml.kernel.org/r/20170930033936.6722-1-f.fainelli@gmail.com
Fixes: db2aa7fd15e8 ("initramfs: allow again choice of the embedded initramfs compression algorithm")
Fixes: 9e3596b0c653 ("kbuild: initramfs cleanup, set target from Kconfig")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Cc: "Francisco Blas Izquierdo Riera (klondike)" <klondike@xiscosoft.net>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Calling madvise(MADV_HWPOISON) on a hugetlbfs page will result in bad
(negative) reserved huge page counts. This may not happen immediately,
but may happen later when the underlying file is removed or filesystem
unmounted. For example:
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
HugePages_Total: 1
HugePages_Free: 0
HugePages_Rsvd: 18446744073709551615
HugePages_Surp: 0
Hugepagesize: 2048 kB
In routine hugetlbfs_error_remove_page(), hugetlb_fix_reserve_counts is
called after remove_huge_page. hugetlb_fix_reserve_counts is designed
to only be called/used only if a failure is returned from
hugetlb_unreserve_pages. Therefore, call hugetlb_unreserve_pages as
required and only call hugetlb_fix_reserve_counts in the unlikely event
that hugetlb_unreserve_pages returns an error.
Link: http://lkml.kernel.org/r/20171019230007.17043-2-mike.kravetz@oracle.com
Fixes: 78bb920344b8 ("mm: hwpoison: dissolve in-use hugepage in unrecoverable memory error")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The first cluster group descriptor is not stored at the start of the
group but at an offset from the start. We need to take this into
account while doing fstrim on the first cluster group. Otherwise we
will wrongly start fstrim a few blocks after the desired start block and
the range can cross over into the next cluster group and zero out the
group descriptor there. This can cause filesytem corruption that cannot
be fixed by fsck.
Link: http://lkml.kernel.org/r/1507835579-7308-1-git-send-email-ashish.samant@oracle.com
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When the pagetable is walked in the implementation of /proc/<pid>/pagemap,
pmd_soft_dirty() is used for both the PMD huge page map and the PMD
migration entries. That is wrong, pmd_swp_soft_dirty() should be used
for the PMD migration entries instead because the different page table
entry flag is used.
As a result, /proc/pid/pagemap may report incorrect soft dirty information
for PMD migration entries.
Link: http://lkml.kernel.org/r/20171017081818.31795-1-ying.huang@intel.com
Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Daniel Colascione <dancol@google.com>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This oops:
kernel BUG at fs/hugetlbfs/inode.c:484!
RIP: remove_inode_hugepages+0x3d0/0x410
Call Trace:
hugetlbfs_setattr+0xd9/0x130
notify_change+0x292/0x410
do_truncate+0x65/0xa0
do_sys_ftruncate.constprop.3+0x11a/0x180
SyS_ftruncate+0xe/0x10
tracesys+0xd9/0xde
was caused by the lack of i_size check in hugetlb_mcopy_atomic_pte.
mmap() can still succeed beyond the end of the i_size after vmtruncate
zapped vmas in those ranges, but the faults must not succeed, and that
includes UFFDIO_COPY.
We could differentiate the retval to userland to represent a SIGBUS like
a page fault would do (vs SIGSEGV), but it doesn't seem very useful and
we'd need to pick a random retval as there's no meaningful syscall
retval that would differentiate from SIGSEGV and SIGBUS, there's just
-EFAULT.
Link: http://lkml.kernel.org/r/20171016223914.2421-2-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When using the rfc4543(gcm(aes))) mode, the registers of the hardware
engine are not empty after use. If the engine is not reset before its
next use, the following results will be invalid.
Always reset the hardware engine.
Signed-off-by: Romain Izard <romain.izard.pro@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Certain cipher modes like CTS expect the IV (req->info) of
ablkcipher_request (or equivalently req->iv of skcipher_request) to
contain the last ciphertext block when the {en,de}crypt operation is done.
Fix this issue for the Atmel AES hardware engine. The tcrypt test
case for cts(cbc(aes)) is now correctly passed.
In the case of in-place decryption, copy the ciphertext in an
intermediate buffer before decryption.
Signed-off-by: Romain Izard <romain.izard.pro@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
User is able to select a chosen rng by writing its name to rng_current
but there is no way to reset it without unbinding the rng. Let user
write "" to rng_current and delesect the chosen rng.
Signed-off-by: PrasannaKumar Muralidharan <prasannatsmkumar@gmail.com>
reviewed-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Add support for MD5, SHA1, SHA256 hash algorithms for Exynos HW.
It uses the crypto framework asynchronous hash api.
It is based on omap-sham.c driver.
S5P has some HW differencies and is not implemented.
Modifications in s5p-sss:
- Add hash supporting structures and functions.
- Modify irq handler to handle both aes and hash signals.
- Resize resource end in probe if EXYNOS_HASH is enabled in
Kconfig.
- Add new copyright line and new author.
- Tested on Odroid-U3 with Exynos 4412 CPU, kernel 4.13-rc6
with crypto run-time self test testmgr
and with tcrypt module with: modprobe tcrypt sec=1 mode=N
where N=402, 403, 404 (MD5, SHA1, SHA256).
Modifications in drivers/crypto/Kconfig:
- Add new CRYPTO_DEV_EXYNOS_HASH, depend on !EXYNOS_RNG
and CRYPTO_DEV_S5P
- Select sw algorithms MD5, SHA1 and SHA256 in EXYNOS_HASH
as they are needed for fallback.
Acked-by: Vladimir Zapolskiy <vz@mleia.com>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Kamil Konieczny <k.konieczny@partner.samsung.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Change #define lines to use tabs consistently.
Acked-by: Vladimir Zapolskiy <vz@mleia.com>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Kamil Konieczny <k.konieczny@partner.samsung.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Lars Persson <lars.persson@axis.com>
Cc: Niklas Cassel <niklas.cassel@axis.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jamie Iles <jamie@jamieiles.com>
Cc: linux-arm-kernel@axis.com
Cc: linux-crypto@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Jamie Iles <jamie@jamieiles.com>
Acked-by: Lars Persson <lars.persson@axis.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
caam/qi frontend (i.e. caamalg_qi) mustn't be used in case it runs on a
DPAA2 part (this could happen when using a multiplatform kernel).
Fixes: 297b9cebd2fc ("crypto: caam/jr - add support for DPAA2 parts")
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Fixes: 3ebfa92f49a6 ("crypto: caam - Add new macros for building extended SEC descriptors (> 64 words)")
Signed-off-by: Radu Alexe <radu.alexe@nxp.com>
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
irq would be set to -1 and then unused, if we failed to get IORESOURCE_MEM.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Return -ENODEV when dma_request_slave_channel_compat() fails.
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The code sample is waiting for an async. crypto op completion.
Adapt sample to use the new generic infrastructure to do the same.
This also fixes a possible data coruption bug created by the
use of wait_for_completion_interruptible() without dealing
correctly with an interrupt aborting the wait prior to the
async op finishing.
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The mediatek driver starts several async crypto ops and waits for their
completions. Move it over to generic code doing the same.
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The qce driver starts several async crypto ops and waits for their
completions. Move it over to generic code doing the same.
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The talitos driver starts several async crypto ops and waits for their
completions. Move it over to generic code doing the same.
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
tcrypt starts several async crypto ops and waits for their completions.
Move it over to generic code doing the same.
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
ima starts several async crypto ops and waits for their completions.
Move it over to generic code doing the same.
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|