Age | Commit message (Collapse) | Author |
|
If the PM closes a fully established MPJ subflow or the subflow
creation errors out in it's early stage the subflows counter is
not bumped accordingly.
This change adds the missing accounting, additionally taking care
of updating accordingly the 'accept_subflow' flag.
Fixes: a88c9e496937 ("mptcp: do not block subflows creation on errors")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Remove napi_weight statics which are set to 64 and never modified,
remnants of the out-of-tree napi_weight module param.
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/r/20220512205603.1536771-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
protected_* files have 600 permissions which prevents non-superuser from
reading them.
Container like "AWS greengrass" refuse to launch unless
protected_hardlinks and protected_symlinks are set. When containers like
these run with "userns-remap" or "--user" mapping container's root to
non-superuser on host, they fail to run due to denied read access to these
files.
As these protections are hardly a secret, and do not possess any security
risk, making them world readable.
Though above greengrass usecase needs read access to only
protected_hardlinks and protected_symlinks files, setting all other
protected_* files to 644 to keep consistency.
Link: http://lkml.kernel.org/r/20200709235115.56954-1-jpitti@cisco.com
Fixes: 800179c9b8a1 ("fs: add link restrictions")
Signed-off-by: Julius Hemanth Pitti <jpitti@cisco.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
At present, pages not in the target zone are added to cc->migratepages
list in isolate_migratepages_block(). As a result, pages may migrate
between nodes unintentionally.
This would be a serious problem for older kernels without commit
a984226f457f849e ("mm: memcontrol: remove the pgdata parameter of
mem_cgroup_page_lruvec"), because it can corrupt the lru list by
handling pages in list without holding proper lru_lock.
Avoid returning a pfn outside the target zone in the case that it is
not aligned with a pageblock boundary. Otherwise
isolate_migratepages_block() will handle pages not in the target zone.
Link: https://lkml.kernel.org/r/20220511044300.4069-1-yamamoto.rei@jp.fujitsu.com
Fixes: 70b44595eafe ("mm, compaction: use free lists to quickly locate a migration source")
Signed-off-by: Rei Yamamoto <yamamoto.rei@jp.fujitsu.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Wonhyuk Yang <vvghjk1234@gmail.com>
Cc: Rei Yamamoto <yamamoto.rei@jp.fujitsu.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Fix the warning - "Enum value 'NR_DAMON_OPS' not described in enum
'damon_ops_id'" generated by the command "make pdfdocs"
Link: https://lkml.kernel.org/r/20220508073316.141401-1-gautammenghani201@gmail.com
Signed-off-by: Gautam Menghani <gautammenghani201@gmail.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
We run a lot of automated tests when building our software and run into
OOM scenarios when the tests run unbounded. v1 memcg exports
memcg->watermark as "memory.max_usage_in_bytes" in sysfs. We use this
metric to heuristically limit the number of tests that can run in parallel
based on per test historical data.
This metric is currently not exported for v2 memcg and there is no other
easy way of getting this information. getrusage() syscall returns
"ru_maxrss" which can be used as an approximation but that's the max RSS
of a single child process across all children instead of the aggregated
max for all child processes. The only work around is to periodically poll
"memory.current" but that's not practical for short-lived one-off cgroups.
Hence, expose memcg->watermark as "memory.peak" for v2 memcg.
Link: https://lkml.kernel.org/r/20220507050916.GA13577@us192.sjc.aristanetworks.com
Signed-off-by: Ganesan Rajagopal <rganesan@arista.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
We must add hugetlb_free_vmemmap=on (or "off") to the boot cmdline and
reboot the server to enable or disable the feature of optimizing vmemmap
pages associated with HugeTLB pages. However, rebooting usually takes a
long time. So add a sysctl to enable or disable the feature at runtime
without rebooting. Why we need this? There are 3 use cases.
1) The feature of minimizing overhead of struct page associated with
each HugeTLB is disabled by default without passing
"hugetlb_free_vmemmap=on" to the boot cmdline. When we (ByteDance)
deliver the servers to the users who want to enable this feature, they
have to configure the grub (change boot cmdline) and reboot the
servers, whereas rebooting usually takes a long time (we have thousands
of servers). It's a very bad experience for the users. So we need a
approach to enable this feature after rebooting. This is a use case in
our practical environment.
2) Some use cases are that HugeTLB pages are allocated 'on the fly'
instead of being pulled from the HugeTLB pool, those workloads would be
affected with this feature enabled. Those workloads could be
identified by the characteristics of they never explicitly allocating
huge pages with 'nr_hugepages' but only set 'nr_overcommit_hugepages'
and then let the pages be allocated from the buddy allocator at fault
time. We can confirm it is a real use case from the commit
099730d67417. For those workloads, the page fault time could be ~2x
slower than before. We suspect those users want to disable this
feature if the system has enabled this before and they don't think the
memory savings benefit is enough to make up for the performance drop.
3) If the workload which wants vmemmap pages to be optimized and the
workload which wants to set 'nr_overcommit_hugepages' and does not want
the extera overhead at fault time when the overcommitted pages be
allocated from the buddy allocator are deployed in the same server.
The user could enable this feature and set 'nr_hugepages' and
'nr_overcommit_hugepages', then disable the feature. In this case, the
overcommited HugeTLB pages will not encounter the extra overhead at
fault time.
Link: https://lkml.kernel.org/r/20220512041142.39501-5-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Use kstrtobool rather than open coding "on" and "off" parsing in
mm/hugetlb_vmemmap.c, which is more powerful to handle all kinds of
parameters like 'Yy1Nn0' or [oO][NnFf] for "on" and "off".
Link: https://lkml.kernel.org/r/20220512041142.39501-4-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Optimizing HugeTLB vmemmap pages is not compatible with allocating memmap
on hot added memory. If "hugetlb_free_vmemmap=on" and
memory_hotplug.memmap_on_memory" are both passed on the kernel command
line, optimizing hugetlb pages takes precedence. However, the global
variable memmap_on_memory will still be set to 1, even though we will not
try to allocate memmap on hot added memory.
Also introduce mhp_memmap_on_memory() helper to move the definition of
"memmap_on_memory" to the scope of CONFIG_MHP_MEMMAP_ON_MEMORY. In the
next patch, mhp_memmap_on_memory() will also be exported to be used in
hugetlb_vmemmap.c.
Link: https://lkml.kernel.org/r/20220512041142.39501-3-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
crosses page boundaries
Patch series "add hugetlb_optimize_vmemmap sysctl", v11.
This series aims to add hugetlb_optimize_vmemmap sysctl to enable or
disable the feature of optimizing vmemmap pages associated with HugeTLB
pages.
This patch (of 4):
If the size of "struct page" is not the power of two but with the feature
of minimizing overhead of struct page associated with each HugeTLB is
enabled, then the vmemmap pages of HugeTLB will be corrupted after
remapping (panic is about to happen in theory). But this only exists when
!CONFIG_MEMCG && !CONFIG_SLUB on x86_64. However, it is not a
conventional configuration nowadays. So it is not a real word issue, just
the result of a code review.
But we cannot prevent anyone from configuring that combined configure.
This hugetlb_optimize_vmemmap should be disable in this case to fix this
issue.
Link: https://lkml.kernel.org/r/20220512041142.39501-1-songmuchun@bytedance.com
Link: https://lkml.kernel.org/r/20220512041142.39501-2-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
On some architectures (like ARM64), it can support CONT-PTE/PMD size
hugetlb, which means it can support not only PMD/PUD size hugetlb: 2M and
1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page size specified.
When unmapping a hugetlb page, we will get the relevant page table entry
by huge_pte_offset() only once to nuke it. This is correct for PMD or PUD
size hugetlb, since they always contain only one pmd entry or pud entry in
the page table.
However this is incorrect for CONT-PTE and CONT-PMD size hugetlb, since
they can contain several continuous pte or pmd entry with same page table
attributes, so we will nuke only one pte or pmd entry for this
CONT-PTE/PMD size hugetlb page.
And now try_to_unmap() is only passed a hugetlb page in the case where the
hugetlb page is poisoned. Which means now we will unmap only one pte
entry for a CONT-PTE or CONT-PMD size poisoned hugetlb page, and we can
still access other subpages of a CONT-PTE or CONT-PMD size poisoned
hugetlb page, which will cause serious issues possibly.
So we should change to use huge_ptep_clear_flush() to nuke the hugetlb
page table to fix this issue, which already considered CONT-PTE and
CONT-PMD size hugetlb.
We've already used set_huge_swap_pte_at() to set a poisoned swap entry for
a poisoned hugetlb page. Meanwhile adding a VM_BUG_ON() to make sure the
passed hugetlb page is poisoned in try_to_unmap().
Link: https://lkml.kernel.org/r/0a2e547238cad5bc153a85c3e9658cb9d55f9cac.1652270205.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/730ea4b6d292f32fb10b7a4e87dad49b0eb30474.1652147571.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
On some architectures (like ARM64), it can support CONT-PTE/PMD size
hugetlb, which means it can support not only PMD/PUD size hugetlb: 2M and
1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page size specified.
When migrating a hugetlb page, we will get the relevant page table entry
by huge_pte_offset() only once to nuke it and remap it with a migration
pte entry. This is correct for PMD or PUD size hugetlb, since they always
contain only one pmd entry or pud entry in the page table.
However this is incorrect for CONT-PTE and CONT-PMD size hugetlb, since
they can contain several continuous pte or pmd entry with same page table
attributes. So we will nuke or remap only one pte or pmd entry for this
CONT-PTE/PMD size hugetlb page, which is not expected for hugetlb
migration. The problem is we can still continue to modify the subpages'
data of a hugetlb page during migrating a hugetlb page, which can cause a
serious data consistent issue, since we did not nuke the page table entry
and set a migration pte for the subpages of a hugetlb page.
To fix this issue, we should change to use huge_ptep_clear_flush() to nuke
a hugetlb page table, and remap it with set_huge_pte_at() and
set_huge_swap_pte_at() when migrating a hugetlb page, which already
considered the CONT-PTE or CONT-PMD size hugetlb.
[akpm@linux-foundation.org: fix nommu build]
[baolin.wang@linux.alibaba.com: fix build errors for !CONFIG_MMU]
Link: https://lkml.kernel.org/r/a4baca670aca637e7198d9ae4543b8873cb224dc.1652270205.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/ea5abf529f0997b5430961012bfda6166c1efc8c.1652147571.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "Fix CONT-PTE/PMD size hugetlb issue when unmapping or migrating", v4.
presently, migrating a hugetlb page or unmapping a poisoned hugetlb page,
we'll use ptep_clear_flush() and set_pte_at() to nuke the page table entry
and remap it, and this is incorrect for CONT-PTE or CONT-PMD size hugetlb
page, which will cause potential data consistent issue. This patch set
will change to use hugetlb related APIs to fix this issue.
Note: Mike pointed out the huge_ptep_get() will only return the one
specific value, and it would not take into account the dirty or young bits
of CONT-PTE/PMDs like the huge_ptep_get_and_clear() [1]. This
inconsistent issue is not introduced by this patch set, and this issue
will be addressed in another thread [2]. Meanwhile the uffd for hugetlb
case [3] pointed out by Gerald also needs another patch to address.
[1] https://lore.kernel.org/linux-mm/85bd80b4-b4fd-0d3f-a2e5-149559f2f387@oracle.com/
[2] https://lore.kernel.org/all/cover.1651998586.git.baolin.wang@linux.alibaba.com/
[3] https://lore.kernel.org/linux-mm/20220503120343.6264e126@thinkpad/
This patch (of 3):
It is incorrect to use ptep_clear_flush() to nuke a hugetlb page table
when unmapping or migrating a hugetlb page, and will change to use
huge_ptep_clear_flush() instead in the following patches.
So this is a preparation patch, which changes the huge_ptep_clear_flush()
to return the original pte to help to nuke a hugetlb page table.
[baolin.wang@linux.alibaba.com: fix build in several more architectures]
Link: https://lkml.kernel.org/r/0009a4cd-2826-e8be-e671-f050d4f18d5d@linux.alibaba.com
[sfr@canb.auug.org.au: fixup]
Link: https://lkml.kernel.org/r/20220511181531.7f27a5c1@canb.auug.org.au
Link: https://lkml.kernel.org/r/cover.1652270205.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/20f77ddab90baa249bd24504c413189b82acde69.1652270205.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/cover.1652147571.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/dcf065868cce35bceaf138613ad27f17bb7c0c19.1652147571.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Cc: Rich Felker <dalias@libc.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Extend and rework the "Temporary Virtual Mappings" section of the
highmem.rst documentation.
Despite the local kmaps were introduced by Thomas Gleixner in October
2020, documentation was still missing information about them. These
additions rely largely on Gleixner's patches, Jonathan Corbet's LWN
articles, comments by Ira Weiny and Matthew Wilcox, and in-code comments
from ./include/linux/highmem.h.
1) Add a paragraph to document kmap_local_page().
2) Reorder the list of functions by decreasing order of preference
of use.
3) Rework part of the kmap() entry in list.
Link: https://lkml.kernel.org/r/20220428212455.892-5-fmdefrancesco@gmail.com
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Suggested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The use of kmap_atomic() is new code is being deprecated in favor of
kmap_local_page(). For this reason the "Using kmap_atomic" section in
highmem.rst is obsolete and unnecessary, but it can still help developers
if it were moved to kdocs in highmem.h.
Therefore, move the relevant parts of this section from highmem.rst and
merge them with the kdocs in highmem.h.
Link: https://lkml.kernel.org/r/20220428212455.892-4-fmdefrancesco@gmail.com
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Suggested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
kernel-docs that are in include/linux/highmem.h and in
include/linux/highmem-internal.h should be included in highmem.rst.
Use kdocs directives to include the above-mentioned comments into
highmem.rst.
Link: https://lkml.kernel.org/r/20220428212455.892-3-fmdefrancesco@gmail.com
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Suggested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "Extend and reorganize Highmem's documentation", v4.
This series has the purpose to extend and reorganize Highmem's
documentation.
This is a work in progress because some information should still be moved
from highmem.rst to highmem.h and highmem-internal.h. Specifically I'm
talking about moving the "how to" information to the relevant headers, as
it as been suggested by Ira Weiny (Intel).
Also, this is a work in progress because some kdocs in highmem.h and
highmem-internal.h should be improved.
This patch (of 4):
`scripts/kernel-doc -v -none include/linux/highmem*` reports the following
warnings:
include/linux/highmem.h:160: warning: expecting prototype for kunmap_atomic(). Prototype was for nr_free_highpages() instead
include/linux/highmem.h:204: warning: No description found for return value of 'alloc_zeroed_user_highpage_movable'
include/linux/highmem-internal.h:256: warning: Function parameter or member '__addr' not described in 'kunmap_atomic'
include/linux/highmem-internal.h:256: warning: Excess function parameter 'addr' description in 'kunmap_atomic'
Fix these warnings by (1) moving the kernel-doc comments from highmem.h to
highmem-internal.h (which is the file were the kunmap_atomic() macro is
actually defined), (2) extending and merging it with the comment which was
already in highmem-internal.h, and (3) using correct parameter names (4)
correcting a few technical inaccuracies in comments, and (5) adding a
deprecation notice in kunmap_atomic() for consistency with kmap_atomic().
Link: https://lkml.kernel.org/r/20220428212455.892-1-fmdefrancesco@gmail.com
Link: https://lkml.kernel.org/r/20220428212455.892-2-fmdefrancesco@gmail.com
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Pull more drm fixes from Dave Airlie:
"Turns out I was right, some fixes hadn't made it to me yet. The vmwgfx
ones also popped up later, but all seem like bad enough things to fix.
The dma-buf, vc4 and nouveau ones are all pretty small.
The fbdev fixes are a bit more complicated: a fix to cleanup fbdev
devices properly, uncovered some use-after-free bugs in existing
drivers. Then the fix for those bugs wasn't correct. This reverts that
fix, and puts the proper fixes in place in the drivers to avoid the
use-after-frees.
This has had a fair number of eyes on it at this stage, and I'm
confident enough that it puts things in the right place, and is less
dangerous than reverting our way out of the initial change at this
stage.
fbdev:
- revert NULL deref fix that turned into a use-after-free
- prevent use-after-free in fbdev
- efifb/simplefb/vesafb: fix cleanup paths to avoid use-after-frees
dma-buf:
- fix panic in stats setup
vc4:
- fix hdmi build
nouveau:
- tegra iommu present fix
- fix leak in backlight name
vmwgfx:
- Black screen due to fences using FIFO checks on SVGA3
- Random black screens on boot due to uninitialized drm_mode_fb_cmd2
- Hangs on SVGA3 due to command buffers being used with gbobjects"
* tag 'drm-fixes-2022-05-14' of git://anongit.freedesktop.org/drm/drm:
drm/vmwgfx: Disable command buffers on svga3 without gbobjects
drm/vmwgfx: Initialize drm_mode_fb_cmd2
drm/vmwgfx: Fix fencing on SVGAv3
drm/vc4: hdmi: Fix build error for implicit function declaration
dma-buf: call dma_buf_stats_setup after dmabuf is in valid list
fbdev: efifb: Fix a use-after-free due early fb_info cleanup
drm/nouveau: Fix a potential theorical leak in nouveau_get_backlight_name()
drm/nouveau/tegra: Stop using iommu_present()
fbdev: vesafb: Cleanup fb_info in .fb_destroy rather than .remove
fbdev: efifb: Cleanup fb_info in .fb_destroy rather than .remove
fbdev: simplefb: Cleanup fb_info in .fb_destroy rather than .remove
fbdev: Prevent possible use-after-free in fb_release()
Revert "fbdev: Make fb_release() return -ENODEV if fbdev was unregistered"
|
|
The np variable is used globally for stm32_pctl_probe() and in one of
its code branches. cppcheck is not happy with that:
pinctrl-stm32.c:1530:23: warning: Local variable 'np' shadows outer variable [shadowVariable]
Instead of simply renaming one of the variables convert some code to
use a device pointer directly.
Fixes: bb949ed9b16b ("pinctrl: stm32: Switch to use for_each_gpiochip_node() helper")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Fabien Dessenne <fabien.dessenne@foss.st.com>
Link: https://lore.kernel.org/r/20220507102257.26414-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Currently, dumping almost all BTFs specified by id requires
using the -B option to pass the base BTF. For kernel module
BTFs the vmlinux BTF sysfs path should work.
This patch simplifies dumping by ID usage by loading
vmlinux BTF from sysfs as base, if base BTF was not specified
and the ID corresponds to a kernel module BTF.
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Link: https://lore.kernel.org/bpf/20220513121743.12411-1-larysa.zaremba@intel.com
|
|
The manual describes function 0x6 of pin PA2 as "SPI1_CLK", so change
the comment to reflect that.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://lore.kernel.org/r/20220504170736.2669595-1-andre.przywara@arm.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Change suniv f1c100s pinctrl,PD14 multiplexing function lvds1 to uart2
When the pin PD13 and PD14 is setting up to uart2 function in dts,
there's an error occurred:
1c20800.pinctrl: unsupported function uart2 on pin PD14
Because 'uart2' is not any one multiplexing option of PD14,
and pinctrl don't know how to configure it.
So change the pin PD14 lvds1 function to uart2.
Signed-off-by: IotaHydrae <writeforever@foxmail.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Link: https://lore.kernel.org/r/tencent_70C1308DDA794C81CAEF389049055BACEC09@qq.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Before commit 322cff70d46c the fifo_time member of requests on a dispatch
list was not used. Commit 322cff70d46c introduces code that reads the
fifo_time member of requests on dispatch lists. Hence this patch that sets
the fifo_time member when adding a request to a dispatch list.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Fixes: 322cff70d46c ("block/mq-deadline: Prioritize high-priority requests")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220513171307.32564-1-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers into devel
pinctrl: renesas: Updates for v5.19 (take two)
- Reserved field optimizations,
- Miscellaneous fixes and improvements.
|
|
Instead of having uninitialized versions of arguments as separate
bpf_arg_types (eg ARG_PTR_TO_UNINIT_MEM as the uninitialized version
of ARG_PTR_TO_MEM), we can instead use MEM_UNINIT as a bpf_type_flag
modifier to denote that the argument is uninitialized.
Doing so cleans up some of the logic in the verifier. We no longer
need to do two checks against an argument type (eg "if
(base_type(arg_type) == ARG_PTR_TO_MEM || base_type(arg_type) ==
ARG_PTR_TO_UNINIT_MEM)"), since uninitialized and initialized
versions of the same argument type will now share the same base type.
In the near future, MEM_UNINIT will be used by dynptr helper functions
as well.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20220509224257.3222614-2-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Multiple fixes to fbdev to address a regression at unregistration, an
iommu detection improvement for nouveau, a memory leak fix for nouveau,
pointer dereference fix for dma_buf_file_release(), and a build breakage
fix for vc4
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220513073044.ymayac7x7bzatrt7@houat
|
|
https://gitlab.freedesktop.org/zack/vmwgfx into drm-fixes
vmwgfx fixes for:
- Black screen due to fences using FIFO checks on SVGA3
- Random black screens on boot due to uninitialized drm_mode_fb_cmd2
- Hangs on SVGA3 due to command buffers being used with gbobjects
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Zack Rusin <zackr@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/a1d32799e4c74b8540216376d7576bb783ca07ba.camel@vmware.com
|
|
The asynchronous zspage free worker tries to lock a zspage's entire page
list without defending against page migration. Since pages which haven't
yet been locked can concurrently migrate off the zspage page list while
lock_zspage() churns away, lock_zspage() can suffer from a few different
lethal races.
It can lock a page which no longer belongs to the zspage and unsafely
dereference page_private(), it can unsafely dereference a torn pointer to
the next page (since there's a data race), and it can observe a spurious
NULL pointer to the next page and thus not lock all of the zspage's pages
(since a single page migration will reconstruct the entire page list, and
create_page_chain() unconditionally zeroes out each list pointer in the
process).
Fix the races by using migrate_read_lock() in lock_zspage() to synchronize
with page migration.
Link: https://lkml.kernel.org/r/20220509024703.243847-1-sultan@kerneltoast.com
Fixes: 77ff465799c602 ("zsmalloc: zs_page_migrate: skip unnecessary loops but not return -EBUSY if zspage is not inuse")
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This reverts commit a4efc174b382fcdb which introduced a regression issue
that when there're multiple processes allocating dma memory in parallel by
calling dma_alloc_coherent(), it may fail sometimes as follows:
Error log:
cma: cma_alloc: linux,cma: alloc failed, req-size: 148 pages, ret: -16
cma: number of available pages:
3@125+20@172+12@236+4@380+32@736+17@2287+23@2473+20@36076+99@40477+108@40852+44@41108+20@41196+108@41364+108@41620+
108@42900+108@43156+483@44061+1763@45341+1440@47712+20@49324+20@49388+5076@49452+2304@55040+35@58141+20@58220+20@58284+
7188@58348+84@66220+7276@66452+227@74525+6371@75549=> 33161 free of 81920 total pages
When issue happened, we saw there were still 33161 pages (129M) free CMA
memory and a lot available free slots for 148 pages in CMA bitmap that we
want to allocate.
When dumping memory info, we found that there was also ~342M normal
memory, but only 1352K CMA memory left in buddy system while a lot of
pageblocks were isolated.
Memory info log:
Normal free:351096kB min:30000kB low:37500kB high:45000kB reserved_highatomic:0KB
active_anon:98060kB inactive_anon:98948kB active_file:60864kB inactive_file:31776kB
unevictable:0kB writepending:0kB present:1048576kB managed:1018328kB mlocked:0kB
bounce:0kB free_pcp:220kB local_pcp:192kB free_cma:1352kB lowmem_reserve[]: 0 0 0
Normal: 78*4kB (UECI) 1772*8kB (UMECI) 1335*16kB (UMECI) 360*32kB (UMECI) 65*64kB (UMCI)
36*128kB (UMECI) 16*256kB (UMCI) 6*512kB (EI) 8*1024kB (UEI) 4*2048kB (MI) 8*4096kB (EI)
8*8192kB (UI) 3*16384kB (EI) 8*32768kB (M) = 489288kB
The root cause of this issue is that since commit a4efc174b382 ("mm/cma.c:
remove redundant cma_mutex lock"), CMA supports concurrent memory
allocation. It's possible that the memory range process A trying to alloc
has already been isolated by the allocation of process B during memory
migration.
The problem here is that the memory range isolated during one allocation
by start_isolate_page_range() could be much bigger than the real size we
want to alloc due to the range is aligned to MAX_ORDER_NR_PAGES.
Taking an ARMv7 platform with 1G memory as an example, when
MAX_ORDER_NR_PAGES is big (e.g. 32M with max_order 14) and CMA memory is
relatively small (e.g. 128M), there're only 4 MAX_ORDER slot, then it's
very easy that all CMA memory may have already been isolated by other
processes when one trying to allocate memory using dma_alloc_coherent().
Since current CMA code will only scan one time of whole available CMA
memory, then dma_alloc_coherent() may easy fail due to contention with
other processes.
This patch simply falls back to the original method that using cma_mutex
to make alloc_contig_range() run sequentially to avoid the issue.
Link: https://lkml.kernel.org/r/20220509094551.3596244-1-aisheng.dong@nxp.com
Link: https://lore.kernel.org/all/20220315144521.3810298-2-aisheng.dong@nxp.com/
Fixes: a4efc174b382 ("mm/cma.c: remove redundant cma_mutex lock")
Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org> [5.11+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Before, the first 64 bytes of input, regardless of how entropic it was,
would be used to mutate the crng base key directly, and none of those
bytes would be credited as having entropy. Then 256 bits of credited
input would be accumulated, and only then would the rng transition from
the earlier "fast init" phase into being actually initialized.
The thinking was that by mixing and matching fast init and real init, an
attacker who compromised the fast init state, considered easy to do
given how little entropy might be in those first 64 bytes, would then be
able to bruteforce bits from the actual initialization. By keeping these
separate, bruteforcing became impossible.
However, by not crediting potentially creditable bits from those first 64
bytes of input, we delay initialization, and actually make the problem
worse, because it means the user is drawing worse random numbers for a
longer period of time.
Instead, we can take the first 128 bits as fast init, and allow them to
be credited, and then hold off on the next 128 bits until they've
accumulated. This is still a wide enough margin to prevent bruteforcing
the rng state, while still initializing much faster.
Then, rather than trying to piecemeal inject into the base crng key at
various points, instead just extract from the pool when we need it, for
the crng_init==0 phase. Performance may even be better for the various
inputs here, since there are likely more calls to mix_pool_bytes() then
there are to get_random_bytes() during this phase of system execution.
Since the preinit injection code is gone, bootloader randomness can then
do something significantly more straight forward, removing the weird
system_wq hack in hwgenerator randomness.
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
It's too hard to keep the batches synchronized, and pointless anyway,
since in !crng_ready(), we're updating the base_crng key really often,
where batching only hurts. So instead, if the crng isn't ready, just
call into get_random_bytes(). At this stage nothing is performance
critical anyhow.
Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Since the RNG loses freshness with system suspend/hibernation, when we
resume, immediately reseed using whatever data we can, which for this
particular case is the various timestamps regarding system suspend time,
in addition to more generally the RDSEED/RDRAND/RDTSC values that happen
whenever the crng reseeds.
On systems that suspend and resume automatically all the time -- such as
Android -- we skip the reseeding on suspend resumption, since that could
wind up being far too busy. This is the same trade-off made in
WireGuard.
In addition to reseeding upon resumption always mix into the pool these
various stamps on every power notification event.
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Currently, we do the jitter dance if two consecutive reads to the cycle
counter return different values. If they do, then we consider the cycle
counter to be fast enough that one trip through the scheduler will yield
one "bit" of credited entropy. If those two reads return the same value,
then we assume the cycle counter is too slow to show meaningful
differences.
This methodology is flawed for a variety of reasons, one of which Eric
posted a patch to fix in [1]. The issue that patch solves is that on a
system with a slow counter, you might be [un]lucky and read the counter
_just_ before it changes, so that the second cycle counter you read
differs from the first, even though there's usually quite a large period
of time in between the two. For example:
| real time | cycle counter |
| --------- | ------------- |
| 3 | 5 |
| 4 | 5 |
| 5 | 5 |
| 6 | 5 |
| 7 | 5 | <--- a
| 8 | 6 | <--- b
| 9 | 6 | <--- c
If we read the counter at (a) and compare it to (b), we might be fooled
into thinking that it's a fast counter, when in reality it is not. The
solution in [1] is to also compare counter (b) to counter (c), on the
theory that if the counter is _actually_ slow, and (a)!=(b), then
certainly (b)==(c).
This helps solve this particular issue, in one sense, but in another
sense, it mostly functions to disallow jitter entropy on these systems,
rather than simply taking more samples in that case.
Instead, this patch takes a different approach. Right now we assume that
a difference in one set of consecutive samples means one "bit" of
credited entropy per scheduler trip. We can extend this so that a
difference in two sets of consecutive samples means one "bit" of
credited entropy per /two/ scheduler trips, and three for three, and
four for four. In other words, we can increase the amount of jitter
"work" we require for each "bit", depending on how slow the cycle
counter is.
So this patch takes whole bunch of samples, sees how many of them are
different, and divides to find the amount of work required per "bit",
and also requires that at least some minimum of them are different in
order to attempt any jitter entropy.
Note that this approach is still far from perfect. It's not a real
statistical estimate on how much these samples vary; it's not a
real-time analysis of the relevant input data. That remains a project
for another time. However, it makes the same (partly flawed) assumptions
as the code that's there now, so it's probably not worse than the status
quo, and it handles the issue Eric mentioned in [1]. But, again, it's
probably a far cry from whatever a really robust version of this would
be.
[1] https://lore.kernel.org/lkml/20220421233152.58522-1-ebiggers@kernel.org/
https://lore.kernel.org/lkml/20220421192939.250680-1-ebiggers@kernel.org/
Cc: Eric Biggers <ebiggers@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
All platforms are now guaranteed to provide some value for
random_get_entropy(). In case some bug leads to this not being so, we
print a warning, because that indicates that something is really very
wrong (and likely other things are impacted too). This should never be
hit, but it's a good and cheap way of finding out if something ever is
problematic.
Since we now have viable fallback code for random_get_entropy() on all
platforms, which is, in the worst case, not worse than jiffies, we can
count on getting the best possible value out of it. That means there's
no longer a use for using jiffies as entropy input. It also means we no
longer have a reason for doing the round-robin register flow in the IRQ
handler, which was always of fairly dubious value.
Instead we can greatly simplify the IRQ handler inputs and also unify
the construction between 64-bits and 32-bits. We now collect the cycle
counter and the return address, since those are the two things that
matter. Because the return address and the irq number are likely
related, to the extent we mix in the irq number, we can just xor it into
the top unchanging bytes of the return address, rather than the bottom
changing bytes of the cycle counter as before. Then, we can do a fixed 2
rounds of SipHash/HSipHash. Finally, we use the same construction of
hashing only half of the [H]SipHash state on 32-bit and 64-bit. We're
not actually discarding any entropy, since that entropy is carried
through until the next time. And more importantly, it lets us do the
same sponge-like construction everywhere.
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
In the event that random_get_entropy() can't access a cycle counter or
similar, falling back to returning 0 is really not the best we can do.
Instead, at least calling random_get_entropy_fallback() would be
preferable, because that always needs to return _something_, even
falling back to jiffies eventually. It's not as though
random_get_entropy_fallback() is super high precision or guaranteed to
be entropic, but basically anything that's not zero all the time is
better than returning zero all the time.
This is accomplished by just including the asm-generic code like on
other architectures, which means we can get rid of the empty stub
function here.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
In the event that random_get_entropy() can't access a cycle counter or
similar, falling back to returning 0 is really not the best we can do.
Instead, at least calling random_get_entropy_fallback() would be
preferable, because that always needs to return _something_, even
falling back to jiffies eventually. It's not as though
random_get_entropy_fallback() is super high precision or guaranteed to
be entropic, but basically anything that's not zero all the time is
better than returning zero all the time.
This is accomplished by just including the asm-generic code like on
other architectures, which means we can get rid of the empty stub
function here.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
In the event that random_get_entropy() can't access a cycle counter or
similar, falling back to returning 0 is really not the best we can do.
Instead, at least calling random_get_entropy_fallback() would be
preferable, because that always needs to return _something_, even
falling back to jiffies eventually. It's not as though
random_get_entropy_fallback() is super high precision or guaranteed to
be entropic, but basically anything that's not zero all the time is
better than returning zero all the time.
This is accomplished by just including the asm-generic code like on
other architectures, which means we can get rid of the empty stub
function here.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
In the event that random_get_entropy() can't access a cycle counter or
similar, falling back to returning 0 is suboptimal. Instead, fallback
to calling random_get_entropy_fallback(), which isn't extremely high
precision or guaranteed to be entropic, but is certainly better than
returning zero all the time.
If CONFIG_X86_TSC=n, then it's possible for the kernel to run on systems
without RDTSC, such as 486 and certain 586, so the fallback code is only
required for that case.
As well, fix up both the new function and the get_cycles() function from
which it was derived to use cpu_feature_enabled() rather than
boot_cpu_has(), and use !IS_ENABLED() instead of #ifndef.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
|
|
In the event that random_get_entropy() can't access a cycle counter or
similar, falling back to returning 0 is really not the best we can do.
Instead, at least calling random_get_entropy_fallback() would be
preferable, because that always needs to return _something_, even
falling back to jiffies eventually. It's not as though
random_get_entropy_fallback() is super high precision or guaranteed to
be entropic, but basically anything that's not zero all the time is
better than returning zero all the time.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
In the event that random_get_entropy() can't access a cycle counter or
similar, falling back to returning 0 is really not the best we can do.
Instead, at least calling random_get_entropy_fallback() would be
preferable, because that always needs to return _something_, even
falling back to jiffies eventually. It's not as though
random_get_entropy_fallback() is super high precision or guaranteed to
be entropic, but basically anything that's not zero all the time is
better than returning zero all the time.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
For situations in which we don't have a c0 counter register available,
we've been falling back to reading the c0 "random" register, which is
usually bounded by the amount of TLB entries and changes every other
cycle or so. This means it wraps extremely often. We can do better by
combining this fast-changing counter with a potentially slower-changing
counter from random_get_entropy_fallback() in the more significant bits.
This commit combines the two, taking into account that the changing bits
are in a different bit position depending on the CPU model. In addition,
we previously were falling back to 0 for ancient CPUs that Linux does
not support anyway; remove that dead path entirely.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Tested-by: Maciej W. Rozycki <macro@orcam.me.uk>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
In the event that random_get_entropy() can't access a cycle counter or
similar, falling back to returning 0 is really not the best we can do.
Instead, at least calling random_get_entropy_fallback() would be
preferable, because that always needs to return _something_, even
falling back to jiffies eventually. It's not as though
random_get_entropy_fallback() is super high precision or guaranteed to
be entropic, but basically anything that's not zero all the time is
better than returning zero all the time.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
In the event that random_get_entropy() can't access a cycle counter or
similar, falling back to returning 0 is really not the best we can do.
Instead, at least calling random_get_entropy_fallback() would be
preferable, because that always needs to return _something_, even
falling back to jiffies eventually. It's not as though
random_get_entropy_fallback() is super high precision or guaranteed to
be entropic, but basically anything that's not zero all the time is
better than returning zero all the time.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
The addition of random_get_entropy_fallback() provides access to
whichever time source has the highest frequency, which is useful for
gathering entropy on platforms without available cycle counters. It's
not necessarily as good as being able to quickly access a cycle counter
that the CPU has, but it's still something, even when it falls back to
being jiffies-based.
In the event that a given arch does not define get_cycles(), falling
back to the get_cycles() default implementation that returns 0 is really
not the best we can do. Instead, at least calling
random_get_entropy_fallback() would be preferable, because that always
needs to return _something_, even falling back to jiffies eventually.
It's not as though random_get_entropy_fallback() is super high precision
or guaranteed to be entropic, but basically anything that's not zero all
the time is better than returning zero all the time.
Finally, since random_get_entropy_fallback() is used during extremely
early boot when randomizing freelists in mm_init(), it can be called
before timekeeping has been initialized. In that case there really is
nothing we can do; jiffies hasn't even started ticking yet. So just give
up and return 0.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Theodore Ts'o <tytso@mit.edu>
|
|
In order to measure the boot process, the timer should be switched on as
early in boot as possible. As well, the commit defines the get_cycles
macro, like the previous patches in this series, so that generic code is
aware that it's implemented by the platform, as is done on other archs.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Acked-by: Stafford Horne <shorne@gmail.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
PowerPC defines a get_cycles() function, but it does not do the usual
`#define get_cycles get_cycles` dance, making it impossible for generic
code to see if an arch-specific function was defined. While the
get_cycles() ifdef is not currently used, the following timekeeping
patch in this series will depend on the macro existing (or not existing)
when defining random_get_entropy().
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@ozlabs.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Alpha defines a get_cycles() function, but it does not do the usual
`#define get_cycles get_cycles` dance, making it impossible for generic
code to see if an arch-specific function was defined. While the
get_cycles() ifdef is not currently used, the following timekeeping
patch in this series will depend on the macro existing (or not existing)
when defining random_get_entropy().
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Acked-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
PA-RISC defines a get_cycles() function, but it does not do the usual
`#define get_cycles get_cycles` dance, making it impossible for generic
code to see if an arch-specific function was defined. While the
get_cycles() ifdef is not currently used, the following timekeeping
patch in this series will depend on the macro existing (or not existing)
when defining random_get_entropy().
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
S390x defines a get_cycles() function, but it does not do the usual
`#define get_cycles get_cycles` dance, making it impossible for generic
code to see if an arch-specific function was defined. While the
get_cycles() ifdef is not currently used, the following timekeeping
patch in this series will depend on the macro existing (or not existing)
when defining random_get_entropy().
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Itanium defines a get_cycles() function, but it does not do the usual
`#define get_cycles get_cycles` dance, making it impossible for generic
code to see if an arch-specific function was defined. While the
get_cycles() ifdef is not currently used, the following timekeeping
patch in this series will depend on the macro existing (or not existing)
when defining random_get_entropy().
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|