Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Reduce alignment constraints on STRICT_KERNEL_RWX and speed-up TLB
misses on 8xx and 603
- Replace kretprobe code with rethook and enable fprobe
- Remove the "fast endian switch" syscall
- Handle DLPAR device tree updates in kernel, allowing the deprecation
of the binary /proc/powerpc/ofdt interface
Thanks to Abhishek Dubey, Alex Shi, Benjamin Gray, Christophe Leroy,
Gaosheng Cui, Gautam Menghani, Geert Uytterhoeven, Haren Myneni, Hari
Bathini, Huang Xiaojia, Jinjie Ruan, Madhavan Srinivasan, Miguel Ojeda,
Mina Almasry, Narayana Murty N, Naveen Rao, Rob Herring (Arm), Scott
Cheloha, Segher Boessenkool, Stephen Rothwell, Thomas Zimmermann, Uwe
Kleine-König, Vaibhav Jain, and Zhang Zekun.
* tag 'powerpc-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (59 commits)
powerpc/atomic: Use YZ constraints for DS-form instructions
MAINTAINERS: powerpc: Add Maddy
powerpc: Switch back to struct platform_driver::remove()
powerpc/pseries/eeh: Fix pseries_eeh_err_inject
selftests/powerpc: Allow building without static libc
macintosh/via-pmu: register_pmu_pm_ops() can be __init
powerpc: Stop using no_llseek
powerpc/64s: Remove the "fast endian switch" syscall
powerpc/mm/64s: Restrict THP to Radix or HPT w/64K pages
powerpc/mm/64s: Move THP reqs into a separate symbol
powerpc/64s: Make mmu_hash_ops __ro_after_init
powerpc: Replace kretprobe code with rethook on powerpc
powerpc: pseries: Constify struct kobj_type
powerpc: powernv: Constify struct kobj_type
powerpc: Constify struct kobj_type
powerpc/pseries/dlpar: Add device tree nodes for DLPAR IO add
powerpc/pseries/dlpar: Remove device tree node for DLPAR IO remove
powerpc/pseries: Use correct data types from pseries_hp_errorlog struct
powerpc/vdso: Inconditionally use CFUNC macro
powerpc/32: Implement validation of emergency stack
...
|
|
8xx boards don't have much memory; the two I know of have 32 Mbytes
and 128 Mbytes respectively, so there is no point in having 256 Mbytes
reserved for module text.
Reduce it to 32 Mbytes for 8xx; that's more than enough.
Nevertheless, make it a configurable value so that it can be customised
if needed.
Also add a build-time check that the module execmem space does not
overlap the PMD covering user addresses.
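A hedged sketch of such a check (MODULES_VADDR and the helper name are
illustrative, not the verbatim patch):

/*
 * Fail the build if the first PMD covering the module execmem
 * range would also cover user addresses.
 */
static inline void check_module_execmem_layout(void)
{
	BUILD_BUG_ON(ALIGN_DOWN(MODULES_VADDR, PMD_SIZE) < TASK_SIZE);
}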
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/8db23b61e33a0d1913d814f94bfe71ba7ac78b0f.1724173828.git.christophe.leroy@csgroup.eu
|
|
Commit 6b0e82791bd0 ("powerpc/e500: switch to 64 bits PGD on 85xx
(32 bits)") switched PGD entries to 64 bits, but pgd_val() returns
an unsigned long, which is 32 bits on PPC32. This is not a problem
for regular PMD entries because the upper part is always NULL, but
when PMD entries are leaf they contain 64-bit values, so pgd_val()
must return an unsigned long long instead of an unsigned long.
Also change the condition to CONFIG_PPC_85xx instead of
CONFIG_PPC_E500, as the change was meant for 32 bits only. Although
it should be harmless on PPC64, it generates a warning with the
pgd_ERROR print.
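A simplified sketch of the type change (not the verbatim patch):

/* On 32-bit 85xx, PGD entries hold 64-bit values, so pgd_val()
 * must return the full width. */
#ifdef CONFIG_PPC_85xx
typedef struct { unsigned long long pgd; } pgd_t;
#else
typedef struct { unsigned long pgd; } pgd_t;
#endif
#define pgd_val(x)	((x).pgd)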
Fixes: 6b0e82791bd0 ("powerpc/e500: switch to 64 bits PGD on 85xx (32 bits)")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/45f8fdf298ec3df7573b66d21b03a5cda92e2cb1.1724313510.git.christophe.leroy@csgroup.eu
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- In the series "mm: Avoid possible overflows in dirty throttling" Jan
Kara addresses a couple of issues in the writeback throttling code.
These fixes are also targeted at -stable kernels.
- Ryusuke Konishi's series "nilfs2: fix potential issues related to
reserved inodes" does that. This should actually be in the
mm-nonmm-stable tree, along with the many other nilfs2 patches. My
bad.
- More folio conversions from Kefeng Wang in the series "mm: convert to
folio_alloc_mpol()"
- Kemeng Shi has sent some cleanups to the writeback code in the series
"Add helper functions to remove repeated code and improve readability
of cgroup writeback"
- Kairui Song has made the swap code a little smaller and a little
faster in the series "mm/swap: clean up and optimize swap cache
index".
- In the series "mm/memory: cleanly support zeropage in
vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David
Hildenbrand has reworked the rather sketchy handling of the use of
the zeropage in MAP_SHARED mappings. I don't see any runtime effects
here - more a cleanup/understandability/maintainability thing.
- Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling
of higher addresses, for aarch64. The (poorly named) series is
"Restructure va_high_addr_switch".
- The core TLB handling code gets some cleanups and possible slight
optimizations in Bang Li's series "Add update_mmu_tlb_range() to
simplify code".
- Jane Chu has improved the handling of our
fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in
the series "Enhance soft hwpoison handling and injection".
- Jeff Johnson has sent a billion patches everywhere to add
MODULE_DESCRIPTION() to everything. Some landed in this pull.
- In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang
has simplified migration's use of hardware-offload memory copying.
- Yosry Ahmed performs more folio API conversions in his series "mm:
zswap: trivial folio conversions".
- In the series "large folios swap-in: handle refault cases first",
Chuanhua Han inches us forward in the handling of large pages in the
swap code. This is a cleanup and optimization, working toward the end
objective of full support of large folio swapin/out.
- In the series "mm,swap: cleanup VMA based swap readahead window
calculation", Huang Ying has contributed some cleanups and a possible
fixlet to his VMA based swap readahead code.
- In the series "add mTHP support for anonymous shmem" Baolin Wang has
taught anonymous shmem mappings to use multisize THP. By default this
is a no-op - users must opt in via sysfs controls. Dramatic
improvements in pagefault latency are realized.
- David Hildenbrand has some cleanups to our remaining use of
page_mapcount() in the series "fs/proc: move page_mapcount() to
fs/proc/internal.h".
- David also has some highmem accounting cleanups in the series
"mm/highmem: don't track highmem pages manually".
- Build-time fixes and cleanups from John Hubbard in the series
"cleanups, fixes, and progress towards avoiding "make headers"".
- Cleanups and consolidation of the core pagemap handling from Barry
Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers
and utilize them".
- Lance Yang's series "Reclaim lazyfree THP without splitting" has
reduced the latency of the reclaim of pmd-mapped THPs under fairly
common circumstances. A 10x speedup is seen in a microbenchmark.
It does this by punting to another CPU but I guess that's a win unless
all CPUs are pegged.
- hugetlb_cgroup cleanups from Xiu Jianfeng in the series
"mm/hugetlb_cgroup: rework on cftypes".
- Miaohe Lin's series "Some cleanups for memory-failure" does just that
thing.
- Someone other than SeongJae has developed a DAMON feature in Honggyu
Kim's series "DAMON based tiered memory management for CXL memory".
This adds DAMON features which may be used to help determine the
efficiency of our placement of CXL/PCIe attached DRAM.
- DAMON user API centralization and simplification work in SeongJae
Park's series "mm/damon: introduce DAMON parameters online commit
function".
- In the series "mm: page_type, zsmalloc and page_mapcount_reset()"
David Hildenbrand does some maintenance work on zsmalloc - partially
modernizing its use of pageframe fields.
- Kefeng Wang provides more folio conversions in the series "mm: remove
page_maybe_dma_pinned() and page_mkclean()".
- More cleanup from David Hildenbrand, this time in the series
"mm/memory_hotplug: use PageOffline() instead of PageReserved() for
!ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline()
pages" and permits the removal of some virtio-mem hacks.
- Barry Song's series "mm: clarify folio_add_new_anon_rmap() and
__folio_add_anon_rmap()" is a cleanup to the anon folio handling in
preparation for mTHP (multisize THP) swapin.
- Kefeng Wang's series "mm: improve clear and copy user folio"
implements more folio conversions, this time in the area of large
folio userspace copying.
- The series "Docs/mm/damon/maintaier-profile: document a mailing tool
and community meetup series" tells people how to get better involved
with other DAMON developers. From SeongJae Park.
- A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does
that.
- David Hildenbrand sends along more cleanups, this time against the
migration code. The series is "mm/migrate: move NUMA hinting fault
folio isolation + checks under PTL".
- Jan Kara has found quite a lot of strangenesses and minor errors in
the readahead code. He addresses this in the series "mm: Fix various
readahead quirks".
- SeongJae Park's series "selftests/damon: test DAMOS tried regions and
{min,max}_nr_regions" adds features and addresses errors in DAMON's
self testing code.
- Gavin Shan has found a userspace-triggerable WARN in the pagecache
code. The series "mm/filemap: Limit page cache size to that supported
by xarray" addresses this. The series is marked cc:stable.
- Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations
and cleanup" cleans up and slightly optimizes KSM.
- Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of
code motion. The series (which also makes the memcg-v1 code
Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put
under config option" and "mm: memcg: put cgroup v1-specific memcg
data under CONFIG_MEMCG_V1"
- Dan Schatzberg's series "Add swappiness argument to memory.reclaim"
adds an additional feature to this cgroup-v2 control file.
- The series "Userspace controls soft-offline pages" from Jiaqi Yan
permits userspace to stop the kernel's automatic treatment of
excessive correctable memory errors, in order to permit userspace to
monitor and handle this situation.
- Kefeng Wang's series "mm: migrate: support poison recover from
migrate folio" teaches the kernel to appropriately handle migration
from poisoned source folios rather than simply panicking.
- SeongJae Park's series "Docs/damon: minor fixups and improvements"
does those things.
- In the series "mm/zsmalloc: change back to per-size_class lock"
Chengming Zhou improves zsmalloc's scalability and memory
utilization.
- Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for
pinning memfd folios" makes the GUP code use FOLL_PIN rather than
bare refcount increments. So these pages can first be moved aside if
they reside in the movable zone or a CMA block.
- Andrii Nakryiko has added a binary ioctl()-based API to
/proc/pid/maps for much faster reading of vma information. The series
is "query VMAs from /proc/<pid>/maps".
- In the series "mm: introduce per-order mTHP split counters" Lance
Yang improves the kernel's presentation of developer information
related to multisize THP splitting.
- Michael Ellerman has developed the series "Reimplement huge pages
without hugepd on powerpc (8xx, e500, book3s/64)". This permits
userspace to use all available huge page sizes.
- In the series "revert unconditional slab and page allocator fault
injection calls" Vlastimil Babka removes a performance-affecting and
not very useful feature from slab fault injection.
* tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits)
mm/mglru: fix ineffective protection calculation
mm/zswap: fix a white space issue
mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio
mm/hugetlb: fix possible recursive locking detected warning
mm/gup: clear the LRU flag of a page before adding to LRU batch
mm/numa_balancing: teach mpol_to_str about the balancing mode
mm: memcg1: convert charge move flags to unsigned long long
alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting
lib: reuse page_ext_data() to obtain codetag_ref
lib: add missing newline character in the warning message
mm/mglru: fix overshooting shrinker memory
mm/mglru: fix div-by-zero in vmpressure_calc_level()
mm/kmemleak: replace strncpy() with strscpy()
mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC
mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB
mm: ignore data-race in __swap_writepage
hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr
mm: shmem: rename mTHP shmem counters
mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async()
mm/migrate: putback split folios when numa hint migration fails
...
|
|
On 8xx, only the shift field of struct mmu_psize_def is used.
Remove the other fields and related macros.
Link: https://lkml.kernel.org/r/dd0587a9e8354005858c7f8c9a775ad05523b314.1719928057.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In order to fit better with the standard Linux page table layout, add
support for 8M pages using contiguous PTE entries in a standard page
table. Page tables will then be populated with 1024 similar entries,
and two PMD entries will point to that page table.
The PMD entries also get a flag to indicate that they address an 8M
page; this is required for the HW tablewalk assistance.
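A hedged sketch of the resulting layout (_PMD_PAGE_8M and the helper
are illustrative, not the actual patch):

/* An 8M page is 1024 identical PTEs in one standard page table,
 * referenced by the two PMD entries that cover the 8M range. */
static void populate_8m_page(pmd_t *pmdp, pte_t *ptep, pte_t pte)
{
	int i;

	for (i = 0; i < 1024; i++)
		ptep[i] = pte;

	pmdp[0] = __pmd((unsigned long)ptep | _PMD_PAGE_8M);
	pmdp[1] = __pmd((unsigned long)ptep | _PMD_PAGE_8M);
}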
Link: https://lkml.kernel.org/r/8693d9a0408371043ca63bf9e4a9c140667af63e.1719928057.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The _PAGE_PSIZE macro is never used outside the place where it is
defined, and only on 8xx and e500.
Remove the indirection: drop the macro and use its content directly.
Link: https://lkml.kernel.org/r/c41da3b0ceda7311a50f0391cc4d54302ae15b74.1719928057.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Now that the 40x platforms are gone, remove support
for 40x from the core powerpc arch code.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240628121201.130802-4-mpe@ellerman.id.au
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Add support for KVM running as a nested hypervisor under development
versions of PowerVM, using the new PAPR nested virtualisation API
- Add support for the BPF prog pack allocator
- A rework of the non-server MMU handling to support execute-only on
all platforms
- Some optimisations & cleanups for the powerpc qspinlock code
- Various other small features and fixes
Thanks to Aboorva Devarajan, Aditya Gupta, Amit Machhiwal, Benjamin
Gray, Christophe Leroy, Dr. David Alan Gilbert, Gaurav Batra, Gautam
Menghani, Geert Uytterhoeven, Haren Myneni, Hari Bathini, Joel Stanley,
Jordan Niethe, Julia Lawall, Kautuk Consul, Kuan-Wei Chiu, Michael
Neuling, Minjie Du, Muhammad Muzammil, Naveen N Rao, Nicholas Piggin,
Nick Child, Nysal Jan K.A, Peter Lafreniere, Rob Herring, Sachin Sant,
Sebastian Andrzej Siewior, Shrikanth Hegde, Srikar Dronamraju, Stanislav
Kinsburskii, Vaibhav Jain, Wang Yufen, Yang Yingliang, and Yuan Tan.
* tag 'powerpc-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (100 commits)
powerpc/vmcore: Add MMU information to vmcoreinfo
Revert "powerpc: add `cur_cpu_spec` symbol to vmcoreinfo"
powerpc/bpf: use bpf_jit_binary_pack_[alloc|finalize|free]
powerpc/bpf: rename powerpc64_jit_data to powerpc_jit_data
powerpc/bpf: implement bpf_arch_text_invalidate for bpf_prog_pack
powerpc/bpf: implement bpf_arch_text_copy
powerpc/code-patching: introduce patch_instructions()
powerpc/32s: Implement local_flush_tlb_page_psize()
powerpc/pseries: use kfree_sensitive() in plpks_gen_password()
powerpc/code-patching: Perform hwsync in __patch_instruction() in case of failure
powerpc/fsl_msi: Use device_get_match_data()
powerpc: Remove cpm_dp...() macros
powerpc/qspinlock: Rename yield_propagate_owner tunable
powerpc/qspinlock: Propagate sleepy if previous waiter is preempted
powerpc/qspinlock: don't propagate the not-sleepy state
powerpc/qspinlock: propagate owner preemptedness rather than CPU number
powerpc/qspinlock: stop queued waiters trying to set lock sleepy
powerpc/perf: Fix disabling BHRB and instruction sampling
powerpc/trace: Add support for HAVE_FUNCTION_ARG_ACCESS_API
powerpc/tools: Pass -mabi=elfv2 to gcc-check-mprofile-kernel.sh
...
|
|
Introduce a PAGE_EXECONLY_X macro which provides exec-only rights.
The _X may seem redundant with the EXECONLY, but it keeps
consistency: all macros granting the EXEC right have _X.
Put it next to PAGE_NONE, as PAGE_EXECONLY_X is in effect
PAGE_NONE + EXEC, just as all other SOMETHING_X are
SOMETHING + EXEC.
On book3s/64 PAGE_EXECONLY becomes PAGE_READONLY_X.
On book3s/64, as PAGE_EXECONLY is only valid for Radix, add the
VM_READ flag in vm_get_page_prot() for non-Radix.
Also update access_error() so that a non-exec fault on a VM_EXEC-only
mapping is always invalid, even when the underlying layer doesn't
always generate a fault for it.
For 8xx, set PAGE_EXECONLY_X to _PAGE_NA | _PAGE_EXEC.
For the others, set it to just _PAGE_EXEC.
With that change, 8xx, e500 and 44x fully honor execute-only
protection.
On 40x this is a partial implementation of execute-only. The
implementation can't be complete because once a TLB entry has been
loaded via the Instruction TLB miss handler, it becomes possible to
read the page. But at least the page can't be read unless it is
executed first.
On the 603 MMU, TLB misses are handled by SW and there are separate
DTLB and ITLB. Execute-only is therefore now supported by not loading
the DTLB when read access is not permitted.
On the hash (604) MMU it is trickier because the hash table is common
to load/store and execute accesses. Nevertheless it is still possible
to check whether _PAGE_READ is set before loading the hash table for
a load/store access. At least the page can't be read unless it is
executed first.
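A hedged sketch of the new protection from the description above
(_PAGE_BASE standing in for the common base flags):

#ifdef CONFIG_PPC_8xx
#define PAGE_EXECONLY_X	__pgprot(_PAGE_BASE | _PAGE_NA | _PAGE_EXEC)
#else
#define PAGE_EXECONLY_X	__pgprot(_PAGE_BASE | _PAGE_EXEC)
#endif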
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/4283ea9cbef9ff2fbee468904800e1962bc8fc18.1695659959.git.christophe.leroy@csgroup.eu
|
|
_PAGE_USER is used to select the zone. Today zone 0 is kernel
and zone 1 is user.
To implement _PAGE_NONE, _PAGE_USER is cleared, leading to no access
for user, but the kernel still has access to the page, so it's
possible for a user application to write to that page by using a
kernel function as a trampoline.
What is really wanted is to have user rights on pages below TASK_SIZE
and no user rights on pages above TASK_SIZE. Use zones for that.
There are 16 zones, so let's use the 4 upper address bits to set the
zone and declare zone rights based on TASK_SIZE.
Then drop _PAGE_USER and reuse its bit as _PAGE_READ, which will be
checked in the Data TLB miss handler. That properly handles PAGE_NONE
for both kernel and user.
In addition, this partially implements execute-only rights. The
implementation can't be complete because once a TLB entry has been
loaded via the Instruction TLB miss handler, it becomes possible to
read the page. But at least the page can't be read unless it is
executed first.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/2a13e3ba8a5dec43143cc1f9a91ec71ea1529f3c.1695659959.git.christophe.leroy@csgroup.eu
|
|
44x MMU has 6 page protection bits:
- R, W, X for supervisor
- R, W, X for user
It means that it can support X without R.
To do that, a _PAGE_READ flag is needed, but there is no bit
available for it in the PTE. On the other hand, the only real use of
_PAGE_USER is to implement PAGE_NONE by clearing _PAGE_USER.
As _PAGE_NONE can also be implemented by clearing _PAGE_READ,
remove _PAGE_USER and add _PAGE_READ. In order to insert bits in
one go during TLB miss, move _PAGE_ACCESSED and put _PAGE_READ
just after _PAGE_DIRTY so that _PAGE_DIRTY is copied into SW and
_PAGE_READ into SR at once.
With that change, 44x now also honors execute-only protection.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/043e17987b260b99b45094138c6cb2e89e63d499.1695659959.git.christophe.leroy@csgroup.eu
|
|
e500 MMU has 6 page protection bits:
- R, W, X for supervisor
- R, W, X for user
It means that it can support X without R.
To do that, a _PAGE_READ flag is needed.
With 32-bit PTEs there is no bit available for it in the PTE. On the
other hand the only real use of _PAGE_USER is to implement PAGE_NONE
by clearing _PAGE_USER. As _PAGE_NONE can also be implemented by
clearing _PAGE_READ, remove _PAGE_USER and add _PAGE_READ. Move
_PAGE_PRESENT into bit 30 so that _PAGE_READ can match SR bit.
With 64-bit PTEs, _PAGE_USER is already the combination of SR and UR,
so all we need to do is rename it _PAGE_READ.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/0849ab6bf7ae2af23f94b0457fa40d0ea3983fe4.1695659959.git.christophe.leroy@csgroup.eu
|
|
pte_user() is now only used in pte_access_permitted() to check
access on vmas. The user flag is cleared to make a page unreadable.
So rename it pte_read() and remove pte_user(), which isn't used
anymore.
For the time being it checks _PAGE_USER, but in the near future
all platforms will be converted to _PAGE_READ, so let's support
both for now.
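A hedged sketch of the transitional helper (simplified; the real code
is per-platform):

static inline bool pte_read(pte_t pte)
{
	/* Accept either flag while platforms migrate from
	 * _PAGE_USER to _PAGE_READ. */
	return pte_val(pte) & (_PAGE_READ | _PAGE_USER);
}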
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/72cbb5be595e9ef884140def73815ed0b0b37010.1695659959.git.christophe.leroy@csgroup.eu
|
|
In several places, _PAGE_RW maps to write permission and doesn't
always imply read. To make this clearer, do as book3s/64 did in
commit c7d54842deb1 ("powerpc/mm: Use _PAGE_READ to indicate
Read access") and use _PAGE_WRITE where more relevant.
For the time being _PAGE_WRITE is equivalent to _PAGE_RW but that
will change when _PAGE_READ gets added in following patches.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/1f79b88db54d030ada776dc9845e0e88345bfc28.1695659959.git.christophe.leroy@csgroup.eu
|
|
8xx already has _PAGE_NA and _PAGE_RO. So add _PAGE_ROX, _PAGE_RW and
_PAGE_RWX and remove specific permission masks.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/5d0b5ce43485f697313eee4326ddff97157fb219.1695659959.git.christophe.leroy@csgroup.eu
|
|
pte_mkuser() is never used. Remove it.
pte_mkprivileged() is not used anymore. Remove it too.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/1a1dc18816456c637dc8a9c38d532f7598b60ac4.1695659959.git.christophe.leroy@csgroup.eu
|
|
The nohash/32 version of __ptep_set_access_flags() does the same as
the nohash/64 version; the only difference is that the nohash/32
version is more complete and uses pte_update().
Make it common and remove the nohash/64 version.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/e296885df46289d3e5f4cb51efeefe593f76ef24.1695659959.git.christophe.leroy@csgroup.eu
|
|
pte_clear() does the same thing on nohash/32 and nohash/64.
Keep the static inline version of nohash/64, make it common, and
remove the macro version of nohash/32.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/818f7df83d7e9e18f55e274cd3c44f2871ade4dd.1695659959.git.christophe.leroy@csgroup.eu
|
|
ptep_set_wrprotect() and ptep_get_and_clear() are identical for
nohash/32 and nohash/64.
Make them common.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/ffe46edecdabce915e2d1a4b79a3b2ab770f2248.1695659959.git.christophe.leroy@csgroup.eu
|
|
Remove the ptep_test_and_clear_young() macro, make
__ptep_test_and_clear_young() common to nohash/32 and nohash/64,
and rename it ptep_test_and_clear_young().
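A hedged sketch of the now-common helper, built on pte_update() as
described (simplified):

static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
					    unsigned long addr, pte_t *ptep)
{
	unsigned long old;

	/* Atomically clear the accessed bit and report its old state. */
	old = pte_update(vma->vm_mm, addr, ptep, _PAGE_ACCESSED, 0, 0);
	return (old & _PAGE_ACCESSED) != 0;
}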
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/de90679ae169104fa3c98d0a8828d7ede228fc52.1695659959.git.christophe.leroy@csgroup.eu
|
|
Deduplicate the following helpers, which are identical on
nohash/32 and nohash/64 (one of them is sketched after the list):
pte_mkwrite_novma()
pte_mkdirty()
pte_mkyoung()
pte_wrprotect()
pte_mkexec()
pte_young()
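One of them, sketched (the write-permission bit follows the later
_PAGE_WRITE naming; the original may have used _PAGE_RW):

static inline pte_t pte_wrprotect(pte_t pte)
{
	return __pte(pte_val(pte) & ~_PAGE_WRITE);
}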
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/485f3cbafaa948143f92692e17ca652dfed68da2.1695659959.git.christophe.leroy@csgroup.eu
|
|
_PAGE_CHG_MASK is identical between nohash/32 and nohash/64,
deduplicate it.
While at it, clean the #ifdef for PTE_RPN_MASK in nohash/32 as
it is already CONFIG_PPC32.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/c1a1893dbba4afb825bfa78b262f0b0e0fc3b490.1695659959.git.christophe.leroy@csgroup.eu
|
|
pte_update() is similar on nohash/32 and nohash/64.
Take the nohash/32 version, which also works on nohash/64, and add
the debug call to assert_pte_locked() which was previously only on
nohash/64.
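A hedged sketch of the merged helper (simplified; no 8xx or huge-page
special cases):

static inline pte_basic_t pte_update(struct mm_struct *mm, unsigned long addr,
				     pte_t *p, unsigned long clr,
				     unsigned long set, int huge)
{
	pte_basic_t old = pte_val(*p);

	*p = __pte((old & ~(pte_basic_t)clr) | set);

	/* Debug check that was previously nohash/64-only. */
	if (IS_ENABLED(CONFIG_PPC64))
		assert_pte_locked(mm, addr);

	return old;
}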
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/e01cb630cad42f645915ce7702d23985241b71fc.1695659959.git.christophe.leroy@csgroup.eu
|
|
No need for an #ifdef; use IS_ENABLED(CONFIG_44x) instead.
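The pattern, illustrated (the flush helper is hypothetical):

static inline void example_flush(void)
{
	/* Compile-time-constant branch: the call is discarded when
	 * CONFIG_44x is not set, but both paths stay type-checked. */
	if (IS_ENABLED(CONFIG_44x))
		flush_44x_quirk();	/* hypothetical 44x-only helper */
}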
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/7c7d97322a6a05a9842b1e8c4b41265916f542ca.1695659959.git.christophe.leroy@csgroup.eu
|
|
No point in having the 8xx-specific pte_update() in the common
header; move it into pte-8xx.h.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/17e209b1a1a43ed219e9e1f2947ec594ed4f9394.1695659959.git.christophe.leroy@csgroup.eu
|
|
map_kernel_page() and unmap_kernel_page() have the same prototypes
on nohash/32 and nohash/64; keep only one declaration.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/7fec5f3288cf0d0eac61b1b3f48c3ea54eb80cad.1695659959.git.christophe.leroy@csgroup.eu
|
|
fixmap.h needs pgtable.h for [un]map_kernel_page(), and
pgtable.h needs fixmap.h for FIXADDR_TOP.
Untangle the two files by moving FIXADDR_TOP into pgtable.h.
Also move VIRT_IMMR_BASE to fixmap.h to avoid including fixmap.h
in mmu.h.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/5eba12392a018be28ad0a02ed844767b132589e7.1695659959.git.christophe.leroy@csgroup.eu
|
|
pte_ERROR() is used neither in powerpc code nor in common mm code.
Remove it.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/bec9eb973ecc1cba091e5c9201d877a7797f3242.1695659959.git.christophe.leroy@csgroup.eu
|
|
40x TLB handlers were reworked by commit 2c74e2586bb9 ("powerpc/40x:
Rework 40x PTE access and TLB miss") to not require PTE_ATOMIC_UPDATES
anymore.
Then commit 4e1df545e2fa ("powerpc/pgtable: Drop PTE_ATOMIC_UPDATES")
removed all code related to PTE_ATOMIC_UPDATES.
Remove the leftover PTE_ATOMIC_UPDATES macro.
Fixes: 2c74e2586bb9 ("powerpc/40x: Rework 40x PTE access and TLB miss")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/f061db5857fcd748f84a6707aad01754686ce97e.1695659959.git.christophe.leroy@csgroup.eu
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix softlockup/crash when using hcall tracing
- Fix pte_access_permitted() for PAGE_NONE on 8xx
- Fix inverted pte_young() test in __ptep_test_and_clear_young()
on 64-bit BookE
- Fix unhandled math emulation exception on 85xx
- Fix kernel crash on syscall return on 476
Thanks to Athira Rajeev, Christophe Leroy, Eddie James, and Naveen N
Rao.
* tag 'powerpc-6.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/47x: Fix 47x syscall return crash
powerpc/85xx: Fix math emulation exception
powerpc/64e: Fix wrong test in __ptep_test_and_clear_young()
powerpc/8xx: Fix pte_access_permitted() for PAGE_NONE
powerpc/pseries: Remove unused r0 in the hcall tracing code
powerpc/pseries: Fix STK_PARAM access in the hcall tracing code
|
|
On 8xx, PAGE_NONE is handled by setting _PAGE_NA instead of clearing
_PAGE_USER.
But then pte_user() returns 1 also for PAGE_NONE.
As _PAGE_NA prevents reads, add a specific version of pte_read()
that returns 0 when _PAGE_NA is set, instead of always returning 1.
Fixes: 351750331fc1 ("powerpc/mm: Introduce _PAGE_NA")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/57bcfbe578e43123f9ed73e040229b80f1ad56ec.1695659959.git.christophe.leroy@csgroup.eu
|
|
Patch series "Fix set_huge_pte_at() panic on arm64", v2.
This series fixes a bug in arm64's implementation of set_huge_pte_at(),
which can result in an unprivileged user causing a kernel panic. The
problem was triggered when running the new uffd poison mm selftest for
HUGETLB memory. This test (and the uffd poison feature) was merged for
v6.5-rc7.
Ideally, I'd like to get this fix in for v6.6 and I've cc'ed stable
(correctly this time) to get it backported to v6.5, where the issue first
showed up.
Description of Bug
==================
arm64's huge pte implementation supports multiple huge page sizes, some of
which are implemented in the page table with multiple contiguous entries.
So set_huge_pte_at() needs to work out how big the logical pte is, so that
it can also work out how many physical ptes (or pmds) need to be written.
It previously did this by grabbing the folio out of the pte and querying
its size.
However, there are cases when the pte being set is actually a swap entry.
But this also used to work fine, because for huge ptes, we only ever saw
migration entries and hwpoison entries. And both of these types of swap
entries have a PFN embedded, so the code would grab that and everything
still worked out.
But over time, more calls to set_huge_pte_at() have been added that set
swap entry types that do not embed a PFN. And this causes the code to go
bang. The triggering case is for the uffd poison test, commit
99aa77215ad0 ("selftests/mm: add uffd unit test for UFFDIO_POISON"), which
causes a PTE_MARKER_POISONED swap entry to be set, courtesy of commit
8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs") -
added in v6.5-rc7. Although review shows that there are other call sites
that set PTE_MARKER_UFFD_WP (which also has no PFN), these don't trigger
on arm64 because arm64 doesn't support UFFD WP.
If CONFIG_DEBUG_VM is enabled, we do at least get a BUG(), but otherwise,
it will dereference a bad pointer in page_folio():
static inline struct folio *hugetlb_swap_entry_to_folio(swp_entry_t entry)
{
	VM_BUG_ON(!is_migration_entry(entry) && !is_hwpoison_entry(entry));

	return page_folio(pfn_to_page(swp_offset_pfn(entry)));
}
Fix
===
The simplest fix would have been to revert the dodgy cleanup commit
18f3962953e4 ("mm: hugetlb: kill set_huge_swap_pte_at()"), but since
things have moved on, this would have required an audit of all the new
set_huge_pte_at() call sites to see if they should be converted to
set_huge_swap_pte_at(). As per the original intent of the change, it
would also leave us open to future bugs when people invariably get it
wrong and call the wrong helper.
So instead, I've added a huge page size parameter to set_huge_pte_at().
This means that the arm64 code has the size in all cases. It's a bigger
change, due to needing to touch the arches that implement the function,
but it is entirely mechanical, so in my view, low risk.
I've compile-tested all touched arches; arm64, parisc, powerpc, riscv,
s390, sparc (and additionally x86_64). I've additionally booted and run
mm selftests against arm64, where I observe the uffd poison test is fixed,
and there are no other regressions.
This patch (of 2):
In order to fix a bug, arm64 needs to be told the size of the huge page
for which the pte is being set in set_huge_pte_at(). Provide for this by
adding an `unsigned long sz` parameter to the function. This follows the
same pattern as huge_pte_clear().
This commit makes the required interface modifications to the core mm as
well as all arches that implement this function (arm64, parisc, powerpc,
riscv, s390, sparc). The actual arm64 bug will be fixed in a separate
commit.
No behavioral changes intended.
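The resulting interface, sketched as a prototype (the huge page size
is now an explicit final parameter):

void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
		     pte_t *ptep, pte_t pte, unsigned long sz);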
Link: https://lkml.kernel.org/r/20230922115804.2043771-1-ryan.roberts@arm.com
Link: https://lkml.kernel.org/r/20230922115804.2043771-2-ryan.roberts@arm.com
Fixes: 8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> [powerpc 8xx]
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> [vmalloc change]
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org> [6.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Add HOTPLUG_SMT support (/sys/devices/system/cpu/smt) and honour the
configured SMT state when hotplugging CPUs into the system
- Combine final TLB flush and lazy TLB mm shootdown IPIs when using the
Radix MMU to avoid a broadcast TLBIE flush on exit
- Drop the exclusion between ptrace/perf watchpoints, and drop the now
unused associated arch hooks
- Add support for the "nohlt" command line option to disable CPU idle
- Add support for -fpatchable-function-entry for ftrace, with GCC >=
13.1
- Rework memory block size determination, and support 256MB size on
systems with GPUs that have hotpluggable memory
- Various other small features and fixes
Thanks to Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira
Rajeev, Benjamin Gray, Christophe Leroy, Frederic Barrat, Gautam
Menghani, Geoff Levand, Hari Bathini, Immad Mir, Jialin Zhang, Joel
Stanley, Jordan Niethe, Justin Stitt, Kajol Jain, Kees Cook, Krzysztof
Kozlowski, Laurent Dufour, Liang He, Linus Walleij, Mahesh Salgaonkar,
Masahiro Yamada, Michal Suchanek, Nageswara R Sastry, Nathan Chancellor,
Nathan Lynch, Naveen N Rao, Nicholas Piggin, Nick Desaulniers, Omar
Sandoval, Randy Dunlap, Reza Arbab, Rob Herring, Russell Currey, Sourabh
Jain, Thomas Gleixner, Trevor Woerner, Uwe Kleine-König, Vaibhav Jain,
Xiongfeng Wang, Yuan Tan, Zhang Rui, and Zheng Zengkai.
* tag 'powerpc-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (135 commits)
macintosh/ams: linux/platform_device.h is needed
powerpc/xmon: Reapply "Relax frame size for clang"
powerpc/mm/book3s64: Use 256M as the upper limit with coherent device memory attached
powerpc/mm/book3s64: Fix build error with SPARSEMEM disabled
powerpc/iommu: Fix notifiers being shared by PCI and VIO buses
powerpc/mpc5xxx: Add missing fwnode_handle_put()
powerpc/config: Disable SLAB_DEBUG_ON in skiroot
powerpc/pseries: Remove unused hcall tracing instruction
powerpc/pseries: Fix hcall tracepoints with JUMP_LABEL=n
powerpc: dts: add missing space before {
powerpc/eeh: Use pci_dev_id() to simplify the code
powerpc/64s: Move CPU -mtune options into Kconfig
powerpc/powermac: Fix unused function warning
powerpc/pseries: Rework lppaca_shared_proc() to avoid DEBUG_PREEMPT
powerpc: Don't include lppaca.h in paca.h
powerpc/pseries: Move hcall_vphn() prototype into vphn.h
powerpc/pseries: Move VPHN constants into vphn.h
cxl: Drop unused detach_spa()
powerpc: Drop zalloc_maybe_bootmem()
powerpc/powernv: Use struct opal_prd_msg in more places
...
|
|
Making virt_to_pfn() a static inline taking a strongly typed
(const void *) makes the contract of passing a pointer of that
type to the function explicit, and exposes any misuse of the
macro virt_to_pfn() acting polymorphically and accepting many types
such as (void *), (uintptr_t) or (unsigned long) as arguments
without warnings.
Move the virt_to_pfn() and related functions below the
declaration of __pa() so it compiles.
For symmetry do the same with pfn_to_kaddr().
As the file is included right into the linker file, we need
to surround the functions with ifndef __ASSEMBLY__ so we
don't cause compilation errors.
The conversion moreover exposes the fact that pmd_page_vaddr()
was returning an unsigned long rather than a const void * as
could be expected, so all the sites defining pmd_page_vaddr()
had to be augmented as well.
Finally the KVM code in book3s_64_mmu_hv.c was passing an
unsigned int to virt_to_phys() so fix that up with a cast so the
result compiles.
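A sketch of the strongly typed helpers described above:

#ifndef __ASSEMBLY__
static inline unsigned long virt_to_pfn(const void *kaddr)
{
	return __pa(kaddr) >> PAGE_SHIFT;
}

static inline void *pfn_to_kaddr(unsigned long pfn)
{
	return __va(pfn << PAGE_SHIFT);
}
#endif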
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
[mpe: Fixup kfence.h, simplify pfn_to_kaddr() & pmd_page_vaddr()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230809-virt-to-phys-powerpc-v1-1-12e912a7d439@linaro.org
|
|
To avoid a useless nop on top of every uaccess enable/disable and
to make life easier for objtool, replace static branches with ASM
feature fixups that will nop KUAP enabling instructions out in the
unlikely case KUAP is disabled at boot time.
Leave it as is on book3s/64 for now, it will be handled later when
objtool is activated on PPC64.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/671948788024fd890ec4ed175bc332dab8664ea5.1689091022.git.christophe.leroy@csgroup.eu
|
|
Objtool reports the following warnings:
arch/powerpc/kernel/signal_32.o: warning: objtool:
__prevent_user_access.constprop.0+0x4 (.text+0x4):
redundant UACCESS disable
arch/powerpc/kernel/signal_32.o: warning: objtool: user_access_begin+0x2c
(.text+0x4c): return with UACCESS enabled
arch/powerpc/kernel/signal_32.o: warning: objtool: handle_rt_signal32+0x188
(.text+0x360): call to __prevent_user_access.constprop.0() with UACCESS enabled
arch/powerpc/kernel/signal_32.o: warning: objtool: handle_signal32+0x150
(.text+0x4d4): call to __prevent_user_access.constprop.0() with UACCESS enabled
This is due to some KUAP enabling/disabling functions being generated
out of line although they are marked inline. Use __always_inline
instead.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/ca5e50ddbec3867db5146ebddbc9a1dc0e443bc8.1689091022.git.christophe.leroy@csgroup.eu
|
|
All but book3s/64 use a static branch key for disabling kuap.
book3s/64 uses an mmu feature.
Refactor all targets to use MMU_FTR_KUAP like book3s/64.
For PPC32 that implies updating mmu features fixups once KUAP
has been initialised.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/6b3d7c977bad73378ea368bc6818e9c94ea95ab0.1689091022.git.christophe.leroy@csgroup.eu
|
|
Disassembly of interrupt_enter_prepare() shows a pointless nop
before the mftb:
c000abf0 <interrupt_enter_prepare>:
c000abf0: 81 23 00 84 lwz r9,132(r3)
c000abf4: 71 29 40 00 andi. r9,r9,16384
c000abf8: 41 82 00 28 beq- c000ac20 <interrupt_enter_prepare+0x30>
c000abfc: ===> 60 00 00 00 nop <====
c000ac00: 7d 0c 42 e6 mftb r8
c000ac04: 80 e2 00 08 lwz r7,8(r2)
c000ac08: 81 22 00 28 lwz r9,40(r2)
c000ac0c: 91 02 00 24 stw r8,36(r2)
c000ac10: 7d 29 38 50 subf r9,r9,r7
c000ac14: 7d 29 42 14 add r9,r9,r8
c000ac18: 91 22 00 08 stw r9,8(r2)
c000ac1c: 4e 80 00 20 blr
c000ac20: 60 00 00 00 nop
c000ac24: 7d 5a c2 a6 mfmd_ap r10
c000ac28: 3d 20 de 00 lis r9,-8704
c000ac2c: 91 43 00 b0 stw r10,176(r3)
c000ac30: 7d 3a c3 a6 mtspr 794,r9
c000ac34: 4e 80 00 20 blr
That comes from the call to kuap_lock(), although __kuap_lock() is an
empty function on the 8xx.
To avoid that, only perform the kuap_is_disabled() check when there
is something to do in __kuap_lock().
Do the same with __kuap_save_and_lock() and
__kuap_get_and_assert_locked().
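A hedged sketch of the approach (assuming platforms advertise a real
__kuap_lock() by defining the name, so the wrapper compiles away
otherwise):

static __always_inline void kuap_lock(void)
{
#ifdef __kuap_lock
	/* Only emit the feature check when there is work to do. */
	if (kuap_is_disabled())
		return;

	__kuap_lock();
#endif
}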
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/a854d25bea375d4ba6ca9c2617f9edbba397100a.1689091022.git.christophe.leroy@csgroup.eu
|
|
A disassembly of interrupt_exit_kernel_prepare() shows a useless read
of the MD_AP register: r9 is overwritten immediately afterwards
without the value read ever being used.
c000e0e0: 60 00 00 00 nop
c000e0e4: ===> 7d 3a c2 a6 mfmd_ap r9 <====
c000e0e8: 7d 20 00 a6 mfmsr r9
c000e0ec: 7c 51 13 a6 mtspr 81,r2
c000e0f0: 81 3f 00 84 lwz r9,132(r31)
c000e0f4: 71 29 80 00 andi. r9,r9,32768
kuap_get_and_assert_locked() is paired with kuap_kernel_restore(),
and both are only used in interrupt_exit_kernel_prepare(). The value
returned by kuap_get_and_assert_locked() is only used by
kuap_kernel_restore().
On 8xx, kuap_kernel_restore() doesn't use the value read by
kuap_get_and_assert_locked() so modify kuap_get_and_assert_locked()
to not perform the read of MD_AP and return 0 instead.
The same applies on BOOKE.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/bcbc84c2dd90bb1021da792b1968cdc22112dad8.1689091022.git.christophe.leroy@csgroup.eu
|
|
The x86 Shadow stack feature includes a new type of memory called shadow
stack. This shadow stack memory has some unusual properties, which
require some core mm changes to function properly.
One of these unusual properties is that shadow stack memory is writable,
but only in limited ways. These limits are applied via a specific PTE
bit combination. Nevertheless, the memory is writable, and core mm code
will need to apply the writable permissions in the typical paths that
call pte_mkwrite(). The goal is to make pte_mkwrite() take a VMA, so
that the x86 implementation of it can know whether to create regular
writable or shadow stack mappings.
But there are a couple of challenges to this. Modifying the signatures of
each arch pte_mkwrite() implementation would be error prone because some
are generated with macros and would need to be re-implemented. Also, some
pte_mkwrite() callers operate on kernel memory without a VMA.
So this can be done in a three step process. First pte_mkwrite() can be
renamed to pte_mkwrite_novma() in each arch, with a generic pte_mkwrite()
added that just calls pte_mkwrite_novma(). Next callers without a VMA can
be moved to pte_mkwrite_novma(). And lastly, pte_mkwrite() and all callers
can be changed to take/pass a VMA.
Start the process by renaming pte_mkwrite() to pte_mkwrite_novma() and
adding the pte_mkwrite() wrapper in linux/pgtable.h. Apply the same
pattern for pmd_mkwrite(). Since not all archs have a pmd_mkwrite_novma(),
create a new arch config HAS_HUGE_PAGE that can be used to tell if
pmd_mkwrite() should be defined. Otherwise in the !HAS_HUGE_PAGE cases the
compiler would not be able to find pmd_mkwrite_novma().
No functional change.
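The first step, sketched as the generic wrapper described above:

#ifndef pte_mkwrite
static inline pte_t pte_mkwrite(pte_t pte)
{
	return pte_mkwrite_novma(pte);
}
#endif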
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/lkml/CAHk-=wiZjSu7c9sFYZb3q04108stgHff2wfbokGCCgW7riz+8Q@mail.gmail.com/
Link: https://lore.kernel.org/all/20230613001108.3040476-2-rick.p.edgecombe%40intel.com
|
|
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 32bit and 64bit.
On 64bit, let's use MSB 56 (LSB 7), located right next to the page type.
On 32bit, let's use LSB 2 to avoid stealing one bit from the swap offset.
There seems to be no real reason why these bits cannot be used for swap
PTEs. The important part is that _PAGE_PRESENT and _PAGE_HASHPTE remain
0.
While at it, mask the type in __swp_entry() and remove _PAGE_BIT_SWAP_TYPE
from pte-e500.h: while it was used in 64bit code it was ignored in 32bit
code.
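A hedged sketch of the 32-bit case (the bit value is assumed from
"LSB 2"):

#define _PAGE_SWP_EXCLUSIVE	0x004	/* _PAGE_PRESENT/_PAGE_HASHPTE stay 0 */

static inline pte_t pte_swp_mkexclusive(pte_t pte)
{
	return __pte(pte_val(pte) | _PAGE_SWP_EXCLUSIVE);
}

static inline int pte_swp_exclusive(pte_t pte)
{
	return !!(pte_val(pte) & _PAGE_SWP_EXCLUSIVE);
}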
Link: https://lkml.kernel.org/r/20230113171026.582290-19-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Add powerpc qspinlock implementation optimised for large system
scalability and paravirt. See the merge message for more details
- Enable objtool to be built on powerpc to generate mcount locations
- Use a temporary mm for code patching with the Radix MMU, so the
writable mapping is restricted to the patching CPU
- Add an option to build the 64-bit big-endian kernel with the ELFv2
ABI
- Sanitise user registers on interrupt entry on 64-bit Book3S
- Many other small features and fixes
Thanks to Aboorva Devarajan, Angel Iglesias, Benjamin Gray, Bjorn
Helgaas, Bo Liu, Chen Lifu, Christoph Hellwig, Christophe JAILLET,
Christophe Leroy, Christopher M. Riedl, Colin Ian King, Deming Wang,
Disha Goel, Dmitry Torokhov, Finn Thain, Geert Uytterhoeven, Gustavo A.
R. Silva, Haowen Bai, Joel Stanley, Jordan Niethe, Julia Lawall, Kajol
Jain, Laurent Dufour, Li zeming, Miaoqian Lin, Michael Jeanson, Nathan
Lynch, Naveen N. Rao, Nayna Jain, Nicholas Miehlbradt, Nicholas Piggin,
Pali Rohár, Randy Dunlap, Rohan McLure, Russell Currey, Sathvika
Vasireddy, Shaomin Deng, Stephen Kitt, Stephen Rothwell, Thomas
Weißschuh, Tiezhu Yang, Uwe Kleine-König, Xie Shaowen, Xiu Jianfeng,
XueBing Chen, Yang Yingliang, Zhang Jiaming, ruanjinjie, Jessica Yu,
and Wolfram Sang.
* tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (181 commits)
powerpc/code-patching: Fix oops with DEBUG_VM enabled
powerpc/qspinlock: Fix 32-bit build
powerpc/prom: Fix 32-bit build
powerpc/rtas: mandate RTAS syscall filtering
powerpc/rtas: define pr_fmt and convert printk call sites
powerpc/rtas: clean up includes
powerpc/rtas: clean up rtas_error_log_max initialization
powerpc/pseries/eeh: use correct API for error log size
powerpc/rtas: avoid scheduling in rtas_os_term()
powerpc/rtas: avoid device tree lookups in rtas_os_term()
powerpc/rtasd: use correct OF API for event scan rate
powerpc/rtas: document rtas_call()
powerpc/pseries: unregister VPA when hot unplugging a CPU
powerpc/pseries: reset the RCU watchdogs after a LPM
powerpc: Take in account addition CPU node when building kexec FDT
powerpc: export the CPU node count
powerpc/cpuidle: Set CPUIDLE_FLAG_POLLING for snooze state
powerpc/dts/fsl: Fix pca954x i2c-mux node names
cxl: Remove unnecessary cxl_pci_window_alignment()
selftests/powerpc: Fix resource leaks
...
|
|
Since __HAVE_ARCH_* style guards have been deprecated in favour of
defining the function name onto itself, convert pxxp_get().
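The new guard style, illustrated:

/* The arch defines the name onto itself; generic code then tests
 * #ifndef ptep_get instead of __HAVE_ARCH_PTEP_GET. */
#define ptep_get ptep_get
static inline pte_t ptep_get(pte_t *ptep)
{
	return READ_ONCE(*ptep);
}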
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/Y2EUEBlQXNgaJgoI@hirez.programming.kicks-ass.net
|
|
While looking at the code generated for code patching, I saw that
pte_clear() generated:
2d8: 38 a0 00 00 li r5,0
2dc: 38 e0 10 00 li r7,4096
2e0: 39 00 20 00 li r8,8192
2e4: 39 40 30 00 li r10,12288
2e8: 90 a9 00 00 stw r5,0(r9)
2ec: 90 e9 00 04 stw r7,4(r9)
2f0: 91 09 00 08 stw r8,8(r9)
2f4: 91 49 00 0c stw r10,12(r9)
With 16k pages, only the first entry is used by the kernel, so there
is no need to adapt the addresses of the other entries; the first
entry is simply duplicated for the hardware.
Now it is:
2cc: 39 40 00 00 li r10,0
2d0: 91 49 00 00 stw r10,0(r9)
2d4: 91 49 00 04 stw r10,4(r9)
2d8: 91 49 00 08 stw r10,8(r9)
2dc: 91 49 00 0c stw r10,12(r9)
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/65f76300de07091a59a042a3db2d0ce9b939a05c.1664346532.git.christophe.leroy@csgroup.eu
|
|
CONFIG_PPC_FSL_BOOK3E is redundant with CONFIG_PPC_E500.
Remove it.
And rename five files accordingly.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Rename include guards to match new file names]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/795cb93b88c9a0279289712e674f39e3b108a1b4.1663606876.git.christophe.leroy@csgroup.eu
|
|
PPC_85xx is PPC32 only.
PPC_85xx always selects E500 and is the only PPC32 that
selects E500.
FSL_BOOKE is selected when E500 and PPC32 are selected.
So FSL_BOOKE is redundant with PPC_85xx.
Remove FSL_BOOKE.
And rename four files accordingly.
cpu_setup_fsl_booke.S is not renamed because it is linked to
PPC_FSL_BOOK3E and not to FSL_BOOKE as suggested by its name.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/08e3e15594e66d63b9e89c5b4f9c35153913c28f.1663606875.git.christophe.leroy@csgroup.eu
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Livepatch support for 32-bit is probably the standout new feature,
otherwise mostly just lots of bits and pieces all over the board.
There's a series of commits cleaning up function descriptor handling,
which touches a few other arches as well as LKDTM. It has acks from
Arnd, Kees and Helge.
Summary:
- Enforce kernel RO, and implement STRICT_MODULE_RWX for 603.
- Add support for livepatch to 32-bit.
- Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS.
- Merge vdso64 and vdso32 into a single directory.
- Fix build errors with newer binutils.
- Add support for UADDR64 relocations, which are emitted by some
toolchains. This allows powerpc to build with the latest lld.
- Fix (another) potential userspace r13 corruption in transactional
memory handling.
- Cleanups of function descriptor handling & related fixes to LKDTM.
Thanks to Abdul Haleem, Alexey Kardashevskiy, Anders Roxell, Aneesh
Kumar K.V, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Bhaskar
Chowdhury, Cédric Le Goater, Chen Jingwen, Christophe JAILLET,
Christophe Leroy, Corentin Labbe, Daniel Axtens, Daniel Henrique
Barboza, David Dai, Fabiano Rosas, Ganesh Goudar, Guo Zhengkui, Hangyu
Hua, Haren Myneni, Hari Bathini, Igor Zhbanov, Jakob Koschel, Jason
Wang, Jeremy Kerr, Joachim Wiberg, Jordan Niethe, Julia Lawall, Kajol
Jain, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Mamatha Inamdar,
Maxime Bizon, Maxim Kiselev, Maxim Kochetkov, Michal Suchanek,
Nageswara R Sastry, Nathan Lynch, Naveen N. Rao, Nicholas Piggin,
Nour-eddine Taleb, Paul Menzel, Ping Fang, Pratik R. Sampat, Randy
Dunlap, Ritesh Harjani, Rohan McLure, Russell Currey, Sachin Sant,
Segher Boessenkool, Shivaprasad G Bhat, Sourabh Jain, Thierry Reding,
Tobias Waldekranz, Tyrel Datwyler, Vaibhav Jain, Vladimir Oltean,
Wedson Almeida Filho, and YueHaibing"
* tag 'powerpc-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (179 commits)
powerpc/pseries: Fix use after free in remove_phb_dynamic()
powerpc/time: improve decrementer clockevent processing
powerpc/time: Fix KVM host re-arming a timer beyond decrementer range
powerpc/tm: Fix more userspace r13 corruption
powerpc/xive: fix return value of __setup handler
powerpc/64: Add UADDR64 relocation support
powerpc: 8xx: fix a return value error in mpc8xx_pic_init
powerpc/ps3: remove unneeded semicolons
powerpc/64: Force inlining of prevent_user_access() and set_kuap()
powerpc/bitops: Force inlining of fls()
powerpc: declare unmodified attribute_group usages const
powerpc/spufs: Fix build warning when CONFIG_PROC_FS=n
powerpc/secvar: fix refcount leak in format_show()
powerpc/64e: Tie PPC_BOOK3E_64 to PPC_FSL_BOOK3E
powerpc: Move C prototypes out of asm-prototypes.h
powerpc/kexec: Declare kexec_paca static
powerpc/smp: Declare current_set static
powerpc: Cleanup asm-prototypes.c
powerpc/ftrace: Use STK_GOT in ftrace_mprofile.S
powerpc/ftrace: Regroup PPC64 specific operations in ftrace_mprofile.S
...
|
|
Pull folio updates from Matthew Wilcox:
- Rewrite how munlock works to massively reduce the contention on
i_mmap_rwsem (Hugh Dickins):
https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/
- Sort out the page refcount mess for ZONE_DEVICE pages (Christoph
Hellwig):
https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/
- Convert GUP to use folios and make pincount available for order-1
pages. (Matthew Wilcox)
- Convert a few more truncation functions to use folios (Matthew
Wilcox)
- Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew
Wilcox)
- Convert rmap_walk to use folios (Matthew Wilcox)
- Convert most of shrink_page_list() to use a folio (Matthew Wilcox)
- Add support for creating large folios in readahead (Matthew Wilcox)
* tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache: (114 commits)
mm/damon: minor cleanup for damon_pa_young
selftests/vm/transhuge-stress: Support file-backed PMD folios
mm/filemap: Support VM_HUGEPAGE for file mappings
mm/readahead: Switch to page_cache_ra_order
mm/readahead: Align file mappings for non-DAX
mm/readahead: Add large folio readahead
mm: Support arbitrary THP sizes
mm: Make large folios depend on THP
mm: Fix READ_ONLY_THP warning
mm/filemap: Allow large folios to be added to the page cache
mm: Turn can_split_huge_page() into can_split_folio()
mm/vmscan: Convert pageout() to take a folio
mm/vmscan: Turn page_check_references() into folio_check_references()
mm/vmscan: Account large folios correctly
mm/vmscan: Optimise shrink_page_list for non-PMD-sized folios
mm/vmscan: Free non-shmem folios without splitting them
mm/rmap: Constify the rmap_walk_control argument
mm/rmap: Convert rmap_walk() to take a folio
mm: Turn page_anon_vma() into folio_anon_vma()
mm/rmap: Turn page_lock_anon_vma_read() into folio_lock_anon_vma_read()
...
|
|
Each call into pte_mkhuge() is invariably followed by
arch_make_huge_pte(). Instead, arch_make_huge_pte() can apply
pte_mkhuge() at the beginning. This updates the generic fallback stub
for arch_make_huge_pte() and the available platform definitions,
making huge pte creation much cleaner and easier to follow.
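The generic fallback after the change, per the description (sketch):

static inline pte_t arch_make_huge_pte(pte_t entry, unsigned int shift,
				       vm_flags_t flags)
{
	/* pte_mkhuge() is now folded in here instead of at call sites. */
	return pte_mkhuge(entry);
}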
Link: https://lkml.kernel.org/r/1643860669-26307-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|