2021-02-10  crypto: hisilicon - PASID fixed on Kunpeng 930  (Weili Qian)

Enable PASID by setting the 'sqc' and 'cqc' pasid bits per queue in Kunpeng 930.
For Kunpeng 920, PASID is effective for all queues once set in SVA scenarios.

Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-02-10  crypto: hisilicon/qm - fix use of 'dma_map_single'  (Weili Qian)

Call 'dma_map_single' only after the data has been written, so that the CPU
cache and the DMA-visible memory are consistent.

Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

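A minimal sketch of the ordering the fix enforces (illustrative, not the actual
hisilicon/qm code; 'fill_descriptor' is a hypothetical helper): the buffer must
be fully written before dma_map_single(), because the mapping call performs the
CPU-cache maintenance that makes the data visible to the device.

    #include <linux/dma-mapping.h>

    static int send_desc(struct device *dev, void *buf, size_t len)
    {
            dma_addr_t dma;

            fill_descriptor(buf);   /* hypothetical helper: all CPU writes first */
            dma = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
            if (dma_mapping_error(dev, dma))
                    return -ENOMEM;
            /* ... hand 'dma' to the hardware; dma_unmap_single() on completion */
            return 0;
    }
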
2021-02-10  crypto: hisilicon/hpre - tiny fix  (Hui Tang)

Update the driver so that some special settings are applied only on
Kunpeng 920.

Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-02-10  crypto: hisilicon/hpre - adapt the number of clusters  (Hui Tang)

The HPRE on Kunpeng 930 has a different number of clusters, so update the
driver so that it runs correctly on both Kunpeng 920 and Kunpeng 930 chips.

Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-02-10  crypto: cpt - remove casting dma_alloc_coherent  (Xu Wang)

Remove the unnecessary casts of the values returned by dma_alloc_coherent(),
which returns void * and therefore needs no cast.

Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

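A before/after sketch of the cleanup (illustrative only; 'struct cpt_cmd',
'dev' and 'dma' are assumed names, not the actual cpt driver code):

    #include <linux/dma-mapping.h>

    struct cpt_cmd *cmd;    /* hypothetical descriptor type */

    /* Before: redundant cast of a void * return value. */
    cmd = (struct cpt_cmd *)dma_alloc_coherent(dev, sizeof(*cmd), &dma, GFP_KERNEL);

    /* After: void * converts implicitly to any object pointer type in C. */
    cmd = dma_alloc_coherent(dev, sizeof(*cmd), &dma, GFP_KERNEL);
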
2021-02-10  crypto: keembay-ocs-aes - Fix 'q' assignment during CCM B0 generation  (Daniele Alessandrelli)

In ocs_aes_ccm_write_b0(), 'q' (the octet length of the binary representation
of the octet length of the payload) is set to 'iv[0]', while it should be set
to 'iv[0] & 0x7' (i.e., only the last 3 bits of iv[0] should be used), as
documented in NIST Special Publication 800-38C:
https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38c.pdf

In practice, this is not an issue, since 'iv[0]' is checked to be in the range
[1-7] by ocs_aes_validate_inputs(), but let's fix the assignment anyway, in
order to make the code more robust.

Signed-off-by: Daniele Alessandrelli <daniele.alessandrelli@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

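A sketch of the one-line fix (simplified; per NIST SP 800-38C only the low
3 bits of the flags byte carry the length-field parameter):

    u8 q;

    /* Before: takes the whole flags byte, including the higher-order bits. */
    q = iv[0];

    /* After: keep only the low 3 bits, which encode the length field size. */
    q = iv[0] & 0x7;
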
2021-02-10  crypto: xor - Fix typo of optimization  (Bhaskar Chowdhury)

s/optimzation/optimization/

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-02-10  hwrng: optee - Use device-managed registration API  (Tian Tao)

Use devm_hwrng_register to get rid of manual unregistration.

Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Reviewed-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

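A minimal sketch of the pattern (not the actual optee-rng code): with the
device-managed API the hwrng is unregistered automatically when the device is
unbound, so the remove path no longer needs an hwrng_unregister() call.

    #include <linux/hw_random.h>

    static int rng_probe_sketch(struct device *dev, struct hwrng *rng)
    {
            /* Managed registration: the devres core unregisters on unbind. */
            return devm_hwrng_register(dev, rng);
    }
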
2021-02-10  crypto: arm64/crc-t10dif - move NEON yield to C code  (Ard Biesheuvel)

Instead of yielding from the bowels of the asm routine if a reschedule is
needed, divide up the input into 4 KB chunks in the C glue. This simplifies
the code substantially, and avoids scheduling out the task with the asm
routine on the call stack, which is undesirable from a CFI/instrumentation
point of view.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

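A sketch of the chunking pattern described above (illustrative; the routine
name 'crc_t10dif_pmull' stands in for the asm implementation, and SZ_4K is
the chunk size taken from the commit text):

    #include <asm/neon.h>
    #include <linux/sizes.h>

    static u16 crct10dif_update_sketch(u16 crc, const u8 *data, size_t len)
    {
            while (len) {
                    size_t n = min_t(size_t, len, SZ_4K);

                    kernel_neon_begin();
                    crc = crc_t10dif_pmull(crc, data, n);   /* asm routine */
                    kernel_neon_end();      /* natural preemption point */

                    data += n;
                    len -= n;
            }

            return crc;
    }
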
2021-02-10  crypto: arm64/aes-ce-mac - simplify NEON yield  (Ard Biesheuvel)

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-02-10  crypto: arm64/aes-neonbs - remove NEON yield calls  (Ard Biesheuvel)

There is no need for elaborate yield handling in the bit-sliced NEON
implementation of AES, given that skciphers are naturally bounded by the size
of the chunks returned by the skcipher_walk API. So remove the yield calls
from the asm code.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-02-10  crypto: arm64/sha512-ce - simplify NEON yield  (Ard Biesheuvel)

Instead of calling into kernel_neon_end() and kernel_neon_begin() (and
potentially into schedule()) from the assembler code when running in task
mode and a reschedule is pending, perform only the preempt count check in
assembler, but simply return early in this case, and let the C code deal with
the consequences.

This reverts commit 6caf7adc5e458f77f550b6c6ca8effa152d61b4a.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-02-10  crypto: arm64/sha3-ce - simplify NEON yield  (Ard Biesheuvel)

Instead of calling into kernel_neon_end() and kernel_neon_begin() (and
potentially into schedule()) from the assembler code when running in task
mode and a reschedule is pending, perform only the preempt count check in
assembler, but simply return early in this case, and let the C code deal with
the consequences.

This reverts commit 7edc86cb1c18b4c274672232117586ea2bef1d9a.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-02-10  crypto: arm64/sha2-ce - simplify NEON yield  (Ard Biesheuvel)

Instead of calling into kernel_neon_end() and kernel_neon_begin() (and
potentially into schedule()) from the assembler code when running in task
mode and a reschedule is pending, perform only the preempt count check in
assembler, but simply return early in this case, and let the C code deal with
the consequences.

This reverts commit d82f37ab5e2426287013eba38b1212e8b71e5be3.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-02-10  crypto: arm64/sha1-ce - simplify NEON yield  (Ard Biesheuvel)

Instead of calling into kernel_neon_end() and kernel_neon_begin() (and
potentially into schedule()) from the assembler code when running in task
mode and a reschedule is pending, perform only the preempt count check in
assembler, but simply return early in this case, and let the C code deal with
the consequences.

This reverts commit 7df8d164753e6e6f229b72767595072bc6a71f48.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-02-10  crypto: ecdh_helper - Ensure 'len >= secret.len' in decode_key()  (Daniele Alessandrelli)

The length ('len' parameter) passed to crypto_ecdh_decode_key() is never
checked against the length encoded in the passed buffer ('buf' parameter).
This could lead to an out-of-bounds access when the passed length is less
than the encoded length. Add a check to prevent that.

Fixes: 3c4b23901a0c7 ("crypto: ecdh - Add ECDH software support")
Signed-off-by: Daniele Alessandrelli <daniele.alessandrelli@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

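A simplified sketch of the added check (modelled on the description above;
not the exact crypto/ecdh_helper.c code):

    #include <crypto/kpp.h>
    #include <linux/string.h>

    static int decode_key_sketch(const char *buf, unsigned int len)
    {
            struct kpp_secret secret;

            if (len < sizeof(secret))
                    return -EINVAL;

            memcpy(&secret, buf, sizeof(secret));   /* header carries secret.len */
            if (secret.len > len)
                    return -EINVAL;         /* would read past the caller's buffer */

            /* ... decode curve_id and key material from the remainder ... */
            return 0;
    }
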
2021-02-10  crypto: powerpc/sha256 - remove unneeded semicolon  (Yang Li)

Eliminate the following coccicheck warning:

./arch/powerpc/crypto/sha256-spe-glue.c:132:2-3: Unneeded semicolon

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-02-10  crypto: caam - Replace DEFINE_SIMPLE_ATTRIBUTE with DEFINE_DEBUGFS_ATTRIBUTE  (Jiapeng Chong)

Fix the following coccicheck warnings:

./drivers/crypto/caam/debugfs.c:23:0-23: WARNING: caam_fops_u64_ro should be defined with DEFINE_DEBUGFS_ATTRIBUTE.
./drivers/crypto/caam/debugfs.c:22:0-23: WARNING: caam_fops_u32_ro should be defined with DEFINE_DEBUGFS_ATTRIBUTE.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

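A sketch of the conversion (attribute names taken from the warnings above;
the getter names and signatures are assumptions). Attributes defined with
DEFINE_DEBUGFS_ATTRIBUTE pair with debugfs_create_file_unsafe() and use the
lighter-weight debugfs_file_get()/put() protection instead of the full proxy
fops that DEFINE_SIMPLE_ATTRIBUTE files get:

    #include <linux/debugfs.h>

    /* Before:
     *   DEFINE_SIMPLE_ATTRIBUTE(caam_fops_u32_ro, caam_debugfs_u32_get, NULL, "%llu\n");
     */
    DEFINE_DEBUGFS_ATTRIBUTE(caam_fops_u32_ro, caam_debugfs_u32_get, NULL, "%llu\n");
    DEFINE_DEBUGFS_ATTRIBUTE(caam_fops_u64_ro, caam_debugfs_u64_get, NULL, "%llu\n");
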
2021-02-10  crypto: twofish - use unaligned accessors instead of alignmask  (Ard Biesheuvel)

Instead of using an alignmask of 0x3 to ensure 32-bit alignment of the
Twofish input and output blocks, which propagates to mode drivers, and
results in pointless copying on architectures that don't care about
alignment, use the unaligned accessors, which will do the right thing on
each respective architecture, avoiding the need for double buffering.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

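This and the cast6, cast5, camellia, blowfish and serpent patches below all
follow the same pattern; a generic sketch (illustrative round function, not
the actual twofish code):

    #include <asm/unaligned.h>

    static void encrypt_block_sketch(const u32 *key, u8 *dst, const u8 *src)
    {
            /* Safe for any src/dst alignment; each architecture gets its
             * best native load/store sequence. */
            u32 a = get_unaligned_le32(src);
            u32 b = get_unaligned_le32(src + 4);

            a ^= key[0];            /* ... cipher rounds elided ... */
            b ^= key[1];

            put_unaligned_le32(a, dst);
            put_unaligned_le32(b, dst + 4);
    }
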
2021-02-10  crypto: fcrypt - drop unneeded alignmask  (Ard Biesheuvel)

The fcrypt implementation uses memcpy() to access the input and output
buffers so there is no need to set an alignmask.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-02-10  crypto: cast6 - use unaligned accessors instead of alignmask  (Ard Biesheuvel)

Instead of using an alignmask of 0x3 to ensure 32-bit alignment of the CAST6
input and output blocks, which propagates to mode drivers, and results in
pointless copying on architectures that don't care about alignment, use the
unaligned accessors, which will do the right thing on each respective
architecture, avoiding the need for double buffering.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-02-10  crypto: cast5 - use unaligned accessors instead of alignmask  (Ard Biesheuvel)

Instead of using an alignmask of 0x3 to ensure 32-bit alignment of the CAST5
input and output blocks, which propagates to mode drivers, and results in
pointless copying on architectures that don't care about alignment, use the
unaligned accessors, which will do the right thing on each respective
architecture, avoiding the need for double buffering.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-02-10  crypto: camellia - use unaligned accessors instead of alignmask  (Ard Biesheuvel)

Instead of using an alignmask of 0x3 to ensure 32-bit alignment of the
Camellia input and output blocks, which propagates to mode drivers, and
results in pointless copying on architectures that don't care about
alignment, use the unaligned accessors, which will do the right thing on
each respective architecture, avoiding the need for double buffering.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-02-10  crypto: blowfish - use unaligned accessors instead of alignmask  (Ard Biesheuvel)

Instead of using an alignmask of 0x3 to ensure 32-bit alignment of the
Blowfish input and output blocks, which propagates to mode drivers, and
results in pointless copying on architectures that don't care about
alignment, use the unaligned accessors, which will do the right thing on
each respective architecture, avoiding the need for double buffering.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-02-10  crypto: serpent - use unaligned accessors instead of alignmask  (Ard Biesheuvel)

Instead of using an alignmask of 0x3 to ensure 32-bit alignment of the
Serpent input and output blocks, which propagates to mode drivers, and
results in pointless copying on architectures that don't care about
alignment, use the unaligned accessors, which will do the right thing on
each respective architecture, avoiding the need for double buffering.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-02-10  crypto: serpent - get rid of obsolete tnepres variant  (Ard Biesheuvel)

It is not trivial to trace back why exactly the tnepres variant of serpent
was added ~17 years ago - Google searches come up mostly empty, but it seems
to be related to the 'kerneli' version, which was based on an incorrect
interpretation of the serpent spec. In other words, nobody is likely to care
anymore today, so let's get rid of it.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-02-10  crypto: michael_mic - fix broken misalignment handling  (Ard Biesheuvel)

The Michael MIC driver uses the cra_alignmask to ensure that pointers
presented to its update and finup/final methods are 32-bit aligned. However,
due to the way the shash API works, this is no guarantee that the 32-bit
reads occurring in the update method are also aligned, as the size of the
buffer presented to update may be of uneven length. For instance, an update()
of 3 bytes followed by a misaligned update() of 4 or more bytes will result
in a misaligned access using an accessor that is not suitable for this.

On most architectures, this does not matter, and so setting the cra_alignmask
is pointless. On architectures where this does matter, setting the
cra_alignmask does not actually solve the problem.

So let's get rid of the cra_alignmask, and use unaligned accessors instead,
where appropriate.

Cc: <stable@vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

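A sketch of the failure sequence described above ('michael_update' stands for
the driver's shash update hook, 'desc' for its hash state; offsets are
illustrative). Aligning each update() buffer cannot align the position within
the overall byte stream:

    /* Both buffers are 32-bit aligned, as cra_alignmask enforces. */
    u8 p[4] __aligned(4), q[8] __aligned(4);

    michael_update(desc, p, 3);   /* 3 bytes buffered; stream offset now 3 */
    michael_update(desc, q, 8);   /* 1 byte of q completes the first word,
                                   * then 32-bit loads read q + 1 and q + 5:
                                   * misaligned despite the alignmask */
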
2021-02-10  hwrng: timeriomem - Fix cooldown period calculation  (Jan Henrik Weinstock)

Ensure the cooldown period tolerance of 1% is actually accounted for.

Fixes: ca3bff70ab32 ("hwrng: timeriomem - Improve performance...")
Signed-off-by: Jan Henrik Weinstock <jan.weinstock@rwth-aachen.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-02-10  crypto: marvell - CRYPTO_DEV_OCTEONTX2_CPT should depend on ARCH_THUNDER2  (Geert Uytterhoeven)

The Marvell OcteonTX2 CPT physical function PCI device is present only on
the OcteonTX2 SoC, and not available as an independent PCIe endpoint. Hence
add a dependency on ARCH_THUNDER2, to prevent asking the user about this
driver when configuring a kernel without OcteonTX2 platform support.

Fixes: 5e8ce8334734c5f2 ("crypto: marvell - add Marvell OcteonTX2 CPT PF driver")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

2021-02-10  Merge git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux for-next/crypto  (Herbert Xu)

Pull in a change from the arm64 tree that is needed for the arm64 crypto
changes.

2021-02-10  KVM: PPC: Don't always report hash MMU capability for P9 < DD2.2  (Fabiano Rosas)

These machines don't support running both MMU types at the same time, so
remove the KVM_CAP_PPC_MMU_HASH_V3 capability when the host is using the
Radix MMU.

[paulus@ozlabs.org - added defensive check on kvmppc_hv_ops->hash_v3_possible]

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

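A hedged sketch of the resulting capability check (a fragment of KVM's
check_extension switch, simplified from the description above, including the
defensive check mentioned in the bracketed note):

    case KVM_CAP_PPC_MMU_HASH_V3:
            r = !!(hv_enabled && kvmppc_hv_ops->hash_v3_possible &&
                   kvmppc_hv_ops->hash_v3_possible());
            break;
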
2021-02-09Revert "drm/amd/display: Update NV1x SR latency values"Alex Deucher
This reverts commit 4a3dea8932d3b1199680d2056dd91d31d94d70b7. This causes blank screens for some users. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1482 Cc: Alvin Lee <alvin.lee2@amd.com> Cc: Jun Lei <Jun.Lei@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-02-10  KVM: PPC: Book3S HV: Save and restore FSCR in the P9 path  (Fabiano Rosas)

The Facility Status and Control Register is a privileged SPR that defines
the availability of some features in problem state. Since it can be written
by the guest, we must restore it to the previous host value after guest exit.

This restoration is currently done by taking the value from
current->thread.fscr, which in the P9 path is not enough anymore because the
guest could context switch the QEMU thread, causing the guest-current value
to be saved into the thread struct.

The above situation manifested when running a QEMU linked against a libc
with System Call Vectored support, which causes scv instructions to be run
by QEMU early during the guest boot (during SLOF), at which point the FSCR
is 0 due to guest entry. After a few scv calls (1 to a couple hundred), the
context switching happens and the QEMU thread runs with the guest value,
resulting in a Facility Unavailable interrupt.

This patch saves and restores the host value of FSCR in the inner guest
entry loop in a way independent of current->thread.fscr. The old way of
doing it is still kept in place because it works for the old entry path.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

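A minimal sketch of the save/restore described above (not the literal
book3s_hv entry code):

    unsigned long host_fscr = mfspr(SPRN_FSCR);     /* save host value */

    /* ... load the guest FSCR, run the guest, take the exit ... */

    mtspr(SPRN_FSCR, host_fscr);    /* restore, independent of
                                     * current->thread.fscr */
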
2021-02-10  KVM: PPC: remove unneeded semicolon  (Yang Li)

Eliminate the following coccicheck warning:

./arch/powerpc/kvm/booke.c:701:2-3: Unneeded semicolon

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

2021-02-10  KVM: PPC: Book3S HV: Use POWER9 SLBIA IH=6 variant to clear SLB  (Nicholas Piggin)

IH=6 may preserve hypervisor real-mode ERAT entries and is the recommended
SLBIA hint for switching partitions.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

2021-02-10  KVM: PPC: Book3S HV: No need to clear radix host SLB before loading HPT guest  (Nicholas Piggin)

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

2021-02-10  KVM: PPC: Book3S HV: Fix radix guest SLB side channel  (Nicholas Piggin)

The slbmte instruction is legal in radix mode, including radix guest mode.
This means radix guests can load the SLB with arbitrary data. The KVM host
does not clear the SLB when exiting a guest if it was a radix guest, which
would allow a rogue radix guest to use the SLB as a side channel to
communicate with other guests.

Fix this by ensuring the SLB is cleared when coming out of a radix guest.
Only the first 4 entries are a concern, because radix guests always run with
LPCR[UPRT]=1, which limits the reach of slbmte. slbia is not used (except in
a non-performance-critical path) because it can clear cached translations.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

2021-02-10  KVM: PPC: Book3S HV: Remove support for running HPT guest on RPT host without mixed mode support  (Nicholas Piggin)

This reverts much of commit c01015091a770 ("KVM: PPC: Book3S HV: Run HPT
guests on POWER9 radix hosts"), which was required to run HPT guests on RPT
hosts on early POWER9 CPUs without support for "mixed mode", which meant the
host could not run with MMU on while guests were running.

This code has some corner case bugs, e.g., when the guest hits a machine
check or HMI the primary locks up waiting for secondaries to switch LPCR to
host, which they never do. This could all be fixed in software, but most
CPUs in production have mixed mode support, and those that don't are
believed to be all in installations that don't use this capability. So
simplify things and remove support.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

2021-02-10  KVM: PPC: Book3S HV: Introduce new capability for 2nd DAWR  (Ravi Bangoria)

Introduce KVM_CAP_PPC_DAWR1 which can be used by QEMU to query whether KVM
supports the 2nd DAWR or not. The capability is by default disabled even
when the underlying CPU supports the 2nd DAWR. QEMU needs to check and
enable it manually to use the feature.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

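A hedged userspace sketch of how QEMU might query and enable the new
capability (standard KVM ioctls; 'kvm_fd' and 'vm_fd' are assumed file
descriptors, error handling elided):

    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    int has_dawr1 = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_DAWR1);

    if (has_dawr1) {
            struct kvm_enable_cap cap = { .cap = KVM_CAP_PPC_DAWR1 };

            /* Disabled by default; must be enabled explicitly per VM. */
            ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
    }
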
2021-02-10  KVM: PPC: Book3S HV: Add infrastructure to support 2nd DAWR  (Ravi Bangoria)

KVM code assumes a single DAWR everywhere. Add code to support the 2nd DAWR.
DAWR is a hypervisor resource and thus the H_SET_MODE hcall is used to
set/unset it. Introduce a new case H_SET_MODE_RESOURCE_SET_DAWR1 for the
2nd DAWR. Also, KVM will support the 2nd DAWR only if CPU_FTR_DAWR1 is set.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

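A hedged sketch of the new hcall case (simplified from the description above;
'value1'/'value2' follow the existing H_SET_MODE argument convention, and the
exact checks are assumptions, not the literal KVM code):

    case H_SET_MODE_RESOURCE_SET_DAWR1:
            if (!cpu_has_feature(CPU_FTR_DAWR1))
                    return H_P2;            /* no 2nd DAWR on this CPU */
            vcpu->arch.dawr1  = value1;     /* watchpoint address */
            vcpu->arch.dawrx1 = value2;     /* watchpoint extension bits */
            return H_SUCCESS;
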
2021-02-10  KVM: PPC: Book3S HV: Rename current DAWR macros and variables  (Ravi Bangoria)

Power10 is introducing a second DAWR (Data Address Watchpoint Register). Use
real register names (with suffix 0) from the ISA for the current macros and
variables used by kvm. One exception is KVM_REG_PPC_DAWR. Keep it as it is
because it's uapi, so changing it will break userspace.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

2021-02-10  KVM: PPC: Book3S HV: Allow nested guest creation when L0 hv_guest_state > L1  (Ravi Bangoria)

On powerpc, the L1 hypervisor takes help of L0 using the H_ENTER_NESTED
hcall to load L2 guest state into the cpu. The L1 hypervisor prepares the
L2 state in struct hv_guest_state and passes a pointer to it via the hcall.
Using that pointer, L0 reads/writes that state directly from/to L1 memory.
Thus L0 must be aware of the hv_guest_state layout of L1. Currently it uses
the version field to achieve this, i.e. if L0 hv_guest_state.version !=
L1 hv_guest_state.version, L0 won't allow the nested kvm guest.

This restriction can be loosened up a bit. L0 can be taught to understand
older layouts of hv_guest_state, if we restrict new members to being added
only at the end, i.e. we can allow a nested guest even when
L0 hv_guest_state.version > L1 hv_guest_state.version. The other way around
is not possible, though.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

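A hedged sketch of the relaxed check (not the literal code;
HV_GUEST_STATE_VERSION is L0's own layout version): because new
hv_guest_state members may only be appended, L0 can accept any
same-or-older L1 layout.

    static bool hv_guest_state_compatible(u64 l1_version)
    {
            /* Layouts newer than ours may contain fields we cannot interpret. */
            return l1_version <= HV_GUEST_STATE_VERSION;
    }
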
2021-02-09  Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf  (David S. Miller)

Daniel Borkmann says:

====================
pull-request: bpf 2021-02-10

The following pull-request contains BPF updates for your *net* tree.

We've added 5 non-merge commits during the last 8 day(s) which contain
a total of 3 files changed, 22 insertions(+), 21 deletions(-).

The main changes are:

1) Fix missed execution of kprobes BPF progs when kprobe is firing via
   int3, from Alexei Starovoitov.

2) Fix potential integer overflow in map max_entries for stackmap on
   32 bit archs, from Bui Quang Minh.

3) Fix a verifier pruning and an insn rewrite issue related to 32 bit
   ops, from Daniel Borkmann.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

2021-02-09  cifs: do not disable noperm if multiuser mount option is not provided  (Ronnie Sahlberg)

Fixes a small regression in the implementation of the new mount API.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reported-by: Hyunchul Lee <hyc.lee@gmail.com>
Tested-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

2021-02-09  io_uring: make op handlers always take issue flags  (Pavel Begunkov)

Make the opcode handler interfaces a bit more consistent by always passing
in issue flags. Bulky but pretty easy and mechanical change.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2021-02-09  io_uring: replace force_nonblock with flags  (Pavel Begunkov)

Replace bool force_nonblock with flags. This serves a long-standing goal of
differentiating the context from which we execute. Currently there are
places where invariants, like holding uring_lock, are only subtly inferred.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

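A hedged before/after sketch of the interface change (handler and flag names
are illustrative of io_uring's opcode handlers, not the exact signatures):

    /* Before: a single bool encodes only one bit of calling context. */
    static int io_read_old(struct io_kiocb *req, bool force_nonblock);

    /* After: a flags word can carry more context bits explicitly. */
    #define IO_URING_F_NONBLOCK     (1U << 0)       /* assumed flag name */

    static int io_read_new(struct io_kiocb *req, unsigned int issue_flags)
    {
            bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;

            /* ... issue the read, honoring the context flags ... */
            return 0;
    }
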
2021-02-09Revert "mm: memcontrol: avoid workload stalls when lowering memory.high"Johannes Weiner
This reverts commit 536d3bf261a2fc3b05b3e91e7eef7383443015cf, as it can cause writers to memory.high to get stuck in the kernel forever, performing page reclaim and consuming excessive amounts of CPU cycles. Before the patch, a write to memory.high would first put the new limit in place for the workload, and then reclaim the requested delta. After the patch, the kernel tries to reclaim the delta before putting the new limit into place, in order to not overwhelm the workload with a sudden, large excess over the limit. However, if reclaim is actively racing with new allocations from the uncurbed workload, it can keep the write() working inside the kernel indefinitely. This is causing problems in Facebook production. A privileged system-level daemon that adjusts memory.high for various workloads running on a host can get unexpectedly stuck in the kernel and essentially turn into a sort of involuntary kswapd for one of the workloads. We've observed that daemon busy-spin in a write() for minutes at a time, neglecting its other duties on the system, and expending privileged system resources on behalf of a workload. To remedy this, we have first considered changing the reclaim logic to break out after a couple of loops - whether the workload has converged to the new limit or not - and bound the write() call this way. However, the root cause that inspired the sequence change in the first place has been fixed through other means, and so a revert back to the proven limit-setting sequence, also used by memory.max, is preferable. The sequence was changed to avoid extreme latencies in the workload when the limit was lowered: the sudden, large excess created by the limit lowering would erroneously trigger the penalty sleeping code that is meant to throttle excessive growth from below. Allocating threads could end up sleeping long after the write() had already reclaimed the delta for which they were being punished. However, erroneous throttling also caused problems in other scenarios at around the same time. This resulted in commit b3ff92916af3 ("mm, memcg: reclaim more aggressively before high allocator throttling"), included in the same release as the offending commit. When allocating threads now encounter large excess caused by a racing write() to memory.high, instead of entering punitive sleeps, they will simply be tasked with helping reclaim down the excess, and will be held no longer than it takes to accomplish that. This is in line with regular limit enforcement - i.e. if the workload allocates up against or over an otherwise unchanged limit from below. With the patch breaking userspace, and the root cause addressed by other means already, revert it again. Link: https://lkml.kernel.org/r/20210122184341.292461-1-hannes@cmpxchg.org Fixes: 536d3bf261a2 ("mm: memcontrol: avoid workload stalls when lowering memory.high") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Tejun Heo <tj@kernel.org> Acked-by: Chris Down <chris@chrisdown.name> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: <stable@vger.kernel.org> [5.8+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
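A hedged sketch of the sequencing this revert restores (not the literal
memcontrol code; 'excess' is an assumed variable for the delta): publish the
new limit first, then reclaim, so the write() cannot spin forever chasing a
racing workload.

    /* Limit in place first: regular enforcement now bounds the workload. */
    page_counter_set_high(&memcg->memory, new_high);

    /* Best-effort reclaim of the excess; safe to give up after a few tries. */
    try_to_free_mem_cgroup_pages(memcg, excess, GFP_KERNEL, true);
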
2021-02-09  MAINTAINERS: update Andrey Ryabinin's email address  (Andrey Ryabinin)

Update my email, @virtuozzo.com will stop working shortly.

Link: https://lkml.kernel.org/r/20210204223904.3824-1-ryabinin.a.a@gmail.com
Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2021-02-09  selftests/vm: rename file run_vmtests to run_vmtests.sh  (Rong Chen)

Commit c2aa8afc36fa has renamed run_vmtests in the Makefile, but the file
still uses the old name. The kernel test robot reported the following issue:

# selftests: vm: run_vmtests.sh
# Warning: file run_vmtests.sh is missing!
not ok 1 selftests: vm: run_vmtests.sh

Link: https://lkml.kernel.org/r/20210205085507.1479894-1-rong.a.chen@intel.com
Fixes: c2aa8afc36fa ("selftests/vm: rename run_vmtests --> run_vmtests.sh")
Signed-off-by: Rong Chen <rong.a.chen@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2021-02-09  tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha  (Seth Forshee)

As with s390, alpha is a 64-bit architecture with a 32-bit ino_t. With
CONFIG_TMPFS_INODE64=y tmpfs mounts will get 64-bit inode numbers and
display "inode64" in the mount options, whereas passing "inode64" in the
mount options will fail. This leads to erroneous behaviours such as this:

# mkdir mnt
# mount -t tmpfs nodev mnt
# mount -o remount,rw mnt
mount: /home/ubuntu/mnt: mount point not mounted or bad option.

Prevent CONFIG_TMPFS_INODE64 from being selected on alpha.

Link: https://lkml.kernel.org/r/20210208215726.608197-1-seth.forshee@canonical.com
Fixes: ea3271f7196c ("tmpfs: support 64-bit inums per-sb")
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Chris Down <chris@chrisdown.name>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: <stable@vger.kernel.org> [5.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>