Age | Commit message (Collapse) | Author |
|
Marek reported a BUG resulting from the recent parallel faults changes,
as the hyp stage-1 map walker attempted to allocate table memory while
holding the RCU read lock:
BUG: sleeping function called from invalid context at
include/linux/sched/mm.h:274
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
2 locks held by swapper/0/1:
#0: ffff80000a8a44d0 (kvm_hyp_pgd_mutex){+.+.}-{3:3}, at:
__create_hyp_mappings+0x80/0xc4
#1: ffff80000a927720 (rcu_read_lock){....}-{1:2}, at:
kvm_pgtable_walk+0x0/0x1f4
CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc3+ #5918
Hardware name: Raspberry Pi 3 Model B (DT)
Call trace:
dump_backtrace.part.0+0xe4/0xf0
show_stack+0x18/0x40
dump_stack_lvl+0x8c/0xb8
dump_stack+0x18/0x34
__might_resched+0x178/0x220
__might_sleep+0x48/0xa0
prepare_alloc_pages+0x178/0x1a0
__alloc_pages+0x9c/0x109c
alloc_page_interleave+0x1c/0xc4
alloc_pages+0xec/0x160
get_zeroed_page+0x1c/0x44
kvm_hyp_zalloc_page+0x14/0x20
hyp_map_walker+0xd4/0x134
kvm_pgtable_visitor_cb.isra.0+0x38/0x5c
__kvm_pgtable_walk+0x1a4/0x220
kvm_pgtable_walk+0x104/0x1f4
kvm_pgtable_hyp_map+0x80/0xc4
__create_hyp_mappings+0x9c/0xc4
kvm_mmu_init+0x144/0x1cc
kvm_arch_init+0xe4/0xef4
kvm_init+0x3c/0x3d0
arm_init+0x20/0x30
do_one_initcall+0x74/0x400
kernel_init_freeable+0x2e0/0x350
kernel_init+0x24/0x130
ret_from_fork+0x10/0x20
Since the hyp stage-1 table walkers are serialized by kvm_hyp_pgd_mutex,
RCU protection really doesn't add anything. Don't acquire the RCU read
lock for an exclusive walk.
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221118182222.3932898-3-oliver.upton@linux.dev
|
|
Rather than passing through the state of the KVM_PGTABLE_WALK_SHARED
flag, just take a pointer to the whole walker structure instead. Move
around struct kvm_pgtable and the RCU indirection such that the
associated ifdeffery remains in one place while ensuring the walker +
flags definitions precede their use.
No functional change intended.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221118182222.3932898-2-oliver.upton@linux.dev
|
|
The stage-2 map walker has been made parallel-aware, and as such can be
called while only holding the read side of the MMU lock. Rip out the
conditional locking in user_mem_abort() and instead grab the read lock.
Continue to take the write lock from other callsites to
kvm_pgtable_stage2_map().
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221107220033.1895655-1-oliver.upton@linux.dev
|
|
stage2_map_walker_try_leaf() and friends now handle stage-2 PTEs
generically, and perform the correct flush when a table PTE is removed.
Additionally, they've been made parallel-aware, using an atomic break
to take ownership of the PTE.
Stop clearing the PTE in the pre-order callback and instead let
stage2_map_walker_try_leaf() deal with it.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221107220006.1895572-1-oliver.upton@linux.dev
|
|
Convert stage2_map_walker_try_leaf() to use the new break-before-make
helpers, thereby making the handler parallel-aware. As before, avoid the
break-before-make if recreating the existing mapping. Additionally,
retry execution if another vCPU thread is modifying the same PTE.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221107215934.1895478-1-oliver.upton@linux.dev
|
|
In order to service stage-2 faults in parallel, stage-2 table walkers
must take exclusive ownership of the PTE being worked on. An additional
requirement of the architecture is that software must perform a
'break-before-make' operation when changing the block size used for
mapping memory.
Roll these two concepts together into helpers for performing a
'break-before-make' sequence. Use a special PTE value to indicate a PTE
has been locked by a software walker. Additionally, use an atomic
compare-exchange to 'break' the PTE when the stage-2 page tables are
possibly shared with another software walker. Elide the DSB + TLBI if
the evicted PTE was invalid (and thus not subject to break-before-make).
All of the atomics do nothing for now, as the stage-2 walker isn't fully
ready to perform parallel walks.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221107215855.1895367-1-oliver.upton@linux.dev
|
|
Create a helper to initialize a table and directly call
smp_store_release() to install it (for now). Prepare for a subsequent
change that generalizes PTE writes with a helper.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221107215644.1895162-11-oliver.upton@linux.dev
|
|
The stage2 attr walker is already used for parallel walks. Since commit
f783ef1c0e82 ("KVM: arm64: Add fast path to handle permission relaxation
during dirty logging"), KVM acquires the read lock when
write-unprotecting a PTE. However, the walker only uses a simple store
to update the PTE. This is safe as the only possible race is with
hardware updates to the access flag, which is benign.
However, a subsequent change to KVM will allow more changes to the stage
2 page tables to be done in parallel. Prepare the stage 2 attribute
walker by performing atomic updates to the PTE when walking in parallel.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221107215644.1895162-10-oliver.upton@linux.dev
|
|
Use RCU to safely walk the stage-2 page tables in parallel. Acquire and
release the RCU read lock when traversing the page tables. Defer the
freeing of table memory to an RCU callback. Indirect the calls into RCU
and provide stubs for hypervisor code, as RCU is not available in such a
context.
The RCU protection doesn't amount to much at the moment, as readers are
already protected by the read-write lock (all walkers that free table
memory take the write lock). Nonetheless, a subsequent change will
futher relax the locking requirements around the stage-2 MMU, thereby
depending on RCU.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221107215644.1895162-9-oliver.upton@linux.dev
|
|
The break-before-make sequence is a bit annoying as it opens a window
wherein memory is unmapped from the guest. KVM should replace the PTE
as quickly as possible and avoid unnecessary work in between.
Presently, the stage-2 map walker tears down a removed table before
installing a block mapping when coalescing a table into a block. As the
removed table is no longer visible to hardware walkers after the
DSB+TLBI, it is possible to move the remaining cleanup to happen after
installing the new PTE.
Reshuffle the stage-2 map walker to install the new block entry in
the pre-order callback. Unwire all of the teardown logic and replace
it with a call to kvm_pgtable_stage2_free_removed() after fixing
the PTE. The post-order visitor is now completely unnecessary, so drop
it. Finally, touch up the comments to better represent the now
simplified map walker.
Note that the call to tear down the unlinked stage-2 is indirected
as a subsequent change will use an RCU callback to trigger tear down.
RCU is not available to pKVM, so there is a need to use different
implementations on pKVM and non-pKVM VMs.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221107215644.1895162-8-oliver.upton@linux.dev
|
|
Use an opaque type for pteps and require visitors explicitly dereference
the pointer before using. Protecting page table memory with RCU requires
that KVM dereferences RCU-annotated pointers before using. However, RCU
is not available for use in the nVHE hypervisor and the opaque type can
be conditionally annotated with RCU for the stage-2 MMU.
Call the type a 'pteref' to avoid a naming collision with raw pteps. No
functional change intended.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221107215644.1895162-7-oliver.upton@linux.dev
|
|
A subsequent change to KVM will move the tear down of an unlinked
stage-2 subtree out of the critical path of the break-before-make
sequence.
Introduce a new helper for tearing down unlinked stage-2 subtrees.
Leverage the existing stage-2 free walkers to do so, with a deep call
into __kvm_pgtable_walk() as the subtree is no longer reachable from the
root.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221107215644.1895162-6-oliver.upton@linux.dev
|
|
In order to tear down page tables from outside the context of
kvm_pgtable (such as an RCU callback), stop passing a pointer through
kvm_pgtable_walk_data.
No functional change intended.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221107215644.1895162-5-oliver.upton@linux.dev
|
|
As a prerequisite for getting visitors off of struct kvm_pgtable, pass
mm_ops through the visitor context.
No functional change intended.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221107215644.1895162-4-oliver.upton@linux.dev
|
|
Rather than reading the ptep all over the shop, read the ptep once from
__kvm_pgtable_visit() and stick it in the visitor context. Reread the
ptep after visiting a leaf in case the callback installed a new table
underneath.
No functional change intended.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221107215644.1895162-3-oliver.upton@linux.dev
|
|
Passing new arguments by value to the visitor callbacks is extremely
inflexible for stuffing new parameters used by only some of the
visitors. Use a context structure instead and pass the pointer through
to the visitor callback.
While at it, redefine the 'flags' parameter to the visitor to contain
the bit indicating the phase of the walk. Pass the entire set of flags
through the context structure such that the walker can communicate
additional state to the visitor callback.
No functional change intended.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221107215644.1895162-2-oliver.upton@linux.dev
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev
Pull fbdev fixes from Helge Deller:
"A use-after-free bugfix in the smscufx driver and various minor error
path fixes, smaller build fixes, sysfs fixes and typos in comments in
the stifb, sisfb, da8xxfb, xilinxfb, sm501fb, gbefb and cyber2000fb
drivers"
* tag 'fbdev-for-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
fbdev: cyber2000fb: fix missing pci_disable_device()
fbdev: sisfb: use explicitly signed char
fbdev: smscufx: Fix several use-after-free bugs
fbdev: xilinxfb: Make xilinxfb_release() return void
fbdev: sisfb: fix repeated word in comment
fbdev: gbefb: Convert sysfs snprintf to sysfs_emit
fbdev: sm501fb: Convert sysfs snprintf to sysfs_emit
fbdev: stifb: Fall back to cfb_fillrect() on 32-bit HCRX cards
fbdev: da8xx-fb: Fix error handling in .remove()
fbdev: MIPS supports iomem addresses
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc fixes from Greg KH:
"Some small driver fixes for 6.1-rc3. They include:
- iio driver bugfixes
- counter driver bugfixes
- coresight bugfixes, including a revert and then a second fix to get
it right.
All of these have been in linux-next with no reported problems"
* tag 'char-misc-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (21 commits)
misc: sgi-gru: use explicitly signed char
coresight: cti: Fix hang in cti_disable_hw()
Revert "coresight: cti: Fix hang in cti_disable_hw()"
counter: 104-quad-8: Fix race getting function mode and direction
counter: microchip-tcb-capture: Handle Signal1 read and Synapse
coresight: cti: Fix hang in cti_disable_hw()
coresight: Fix possible deadlock with lock dependency
counter: ti-ecap-capture: fix IS_ERR() vs NULL check
counter: Reduce DEFINE_COUNTER_ARRAY_POLARITY() to defining counter_array
iio: bmc150-accel-core: Fix unsafe buffer attributes
iio: adxl367: Fix unsafe buffer attributes
iio: adxl372: Fix unsafe buffer attributes
iio: at91-sama5d2_adc: Fix unsafe buffer attributes
iio: temperature: ltc2983: allocate iio channels once
tools: iio: iio_utils: fix digit calculation
iio: adc: stm32-adc: fix channel sampling time init
iio: adc: mcp3911: mask out device ID in debug prints
iio: adc: mcp3911: use correct id bits
iio: adc: mcp3911: return proper error code on failure to allocate trigger
iio: adc: mcp3911: fix sizeof() vs ARRAY_SIZE() bug
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"A few small USB fixes for 6.1-rc3. Include in here are:
- MAINTAINERS update, including a big one for the USB gadget
subsystem. Many thanks to Felipe for all of the years of hard work
he has done on this codebase, it was greatly appreciated.
- dwc3 driver fixes for reported problems.
- xhci driver fixes for reported problems.
- typec driver fixes for minor issues
- uvc gadget driver change, and then revert as it wasn't relevant for
6.1-final, as it is a new feature and people are still reviewing
and modifying it.
All of these have been in the linux-next tree with no reported issues"
* tag 'usb-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: dwc3: gadget: Don't set IMI for no_interrupt
usb: dwc3: gadget: Stop processing more requests on IMI
Revert "usb: gadget: uvc: limit isoc_sg to super speed gadgets"
xhci: Remove device endpoints from bandwidth list when freeing the device
xhci-pci: Set runtime PM as default policy on all xHC 1.2 or later devices
xhci: Add quirk to reset host back to default state at shutdown
usb: xhci: add XHCI_SPURIOUS_SUCCESS to ASM1042 despite being a V0.96 controller
usb: dwc3: st: Rely on child's compatible instead of name
usb: gadget: uvc: limit isoc_sg to super speed gadgets
usb: bdc: change state when port disconnected
usb: typec: ucsi: acpi: Implement resume callback
usb: typec: ucsi: Check the connection on resume
usb: gadget: aspeed: Fix probe regression
usb: gadget: uvc: fix sg handling during video encode
usb: gadget: uvc: fix sg handling in error case
usb: gadget: uvc: fix dropped frame after missed isoc
usb: dwc3: gadget: Don't delay End Transfer on delayed_status
usb: dwc3: Don't switch OTG -> peripheral if extcon is present
MAINTAINERS: Update maintainers for broadcom USB
MAINTAINERS: move USB gadget and phy entries under the main USB entry
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- convert gpio-tegra to using an immutable irqchip
- MAINTAINERS update
* tag 'gpio-fixes-for-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
MAINTAINERS: Change myself to a maintainer
gpio: tegra: Convert to immutable irq chip
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Rename a perf memory level event define to denote it is of CXL type
- Add Alder and Raptor Lakes support to RAPL
- Make sure raw sample data is output with tracepoints
* tag 'perf_urgent_for_v6.1_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/mem: Rename PERF_MEM_LVLNUM_EXTN_MEM to PERF_MEM_LVLNUM_CXL
perf/x86/rapl: Add support for Intel Raptor Lake
perf/x86/rapl: Add support for Intel AlderLake-N
perf: Fix missing raw data on tracepoint events
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
"Remove unused kernel stack padding, fix some build errors/warnings and
two bugs in laptop platform driver"
* tag 'loongarch-fixes-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
platform/loongarch: laptop: Fix possible UAF and simplify generic_acpi_laptop_init()
platform/loongarch: laptop: Adjust resume order for loongson_hotkey_resume()
LoongArch: BPF: Avoid declare variables in switch-case
LoongArch: Use flexible-array member instead of zero-length array
LoongArch: Remove unused kernel stack padding
|
|
Pull cifs fixes from Steve French:
- use after free fix for reconnect race
- two memory leak fixes
* tag '6.1-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix use-after-free caused by invalid pointer `hostname`
cifs: Fix pages leak when writedata alloc failed in cifs_write_from_iter()
cifs: Fix pages array leak when writedata alloc failed in cifs_writedata_alloc()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator fix from Jason Donenfeld:
"One fix from Jean-Philippe Brucker, addressing a regression in which
early boot code on ARM64 would use the non-_early variant of the
arch_get_random family of functions, resulting in the architectural
random number generator appearing unavailable during that early phase
of boot.
The fix simply changes arch_get_random*() to arch_get_random*_early().
This distinction between these two functions is a bit of an old wart
I'm not a fan of, and for 6.2 I'll see if I can make obsolete the
_early variant, so that one function does the right thing in all
contexts without overhead"
* tag 'random-6.1-rc3-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
random: use arch_get_random*_early() in random_init()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Varions small fixes, all in drivers.
Some of these arrived during the merge window and got held over to
make sure of testing on the -rc tree.
The biggest change is for standards conformance in the target driver,
closely followed by a set of bug fixes in megaraid_sas"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (21 commits)
scsi: ufs: core: Fix typo in comment
scsi: mpi3mr: Select CONFIG_SCSI_SAS_ATTRS
scsi: ufs: core: Fix typo for register name in comments
scsi: pm80xx: Display proc_name in sysfs
scsi: ufs: core: Fix the error log in ufshcd_query_flag_retry()
scsi: ufs: core: Remove unneeded casts from void *
scsi: lpfc: Fix spelling mistake "unsolicted" -> "unsolicited"
scsi: qla2xxx: Use transport-defined speed mask for supported_speeds
scsi: target: iblock: Fold iblock_emulate_read_cap_with_block_size() into iblock_get_blocks()
scsi: qla2xxx: Fix serialization of DCBX TLV data request
scsi: ufs: qcom: Remove redundant dev_err() call
scsi: megaraid_sas: Move megasas_dbg_lvl init to megasas_init()
scsi: megaraid_sas: Remove unnecessary memset()
scsi: megaraid_sas: Simplify megasas_update_device_list
scsi: megaraid_sas: Correct an error message
scsi: megaraid_sas: Correct value passed to scsi_device_lookup()
scsi: target: core: UA on all LUNs after reset
scsi: target: core: New key must be used for moved PR
scsi: target: core: Abort all preempted regs if requested
scsi: target: core: Fix memory leak in preempt_and_abort
...
|
|
Pull block fixes from Jens Axboe:
- NVMe pull request via Christoph:
- make the multipath dma alignment match the non-multipath one
(Keith Busch)
- fix a bogus use of sg_init_marker() (Nam Cao)
- fix circulr locking in nvme-tcp (Sagi Grimberg)
- Initialization fix for requests allocated via the special hw queue
allocator (John)
- Fix for a regression added in this release with the batched
completions of end_io backed requests (Ming)
- Error handling leak fix for rbd (Yang)
- Error handling leak fix for add_disk() failure (Yu)
* tag 'block-6.1-2022-10-28' of git://git.kernel.dk/linux:
blk-mq: Properly init requests from blk_mq_alloc_request_hctx()
blk-mq: don't add non-pt request with ->end_io to batch
rbd: fix possible memory leak in rbd_sysfs_init()
nvme-multipath: set queue dma alignment to 3
nvme-tcp: fix possible circular locking when deleting a controller under memory pressure
nvme-tcp: replace sg_init_marker() with sg_init_table()
block: fix memory leak for elevator on add_disk failure
|
|
Pull io_uring fix from Jens Axboe:
"Just a fix for a locking regression introduced with the deferred
task_work running from this merge window"
* tag 'io_uring-6.1-2022-10-28' of git://git.kernel.dk/linux:
io_uring: unlock if __io_run_local_work locked inside
io_uring: use io_run_local_work_locked helper
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc hotfixes from Andrew Morton:
"Eight fix pre-6.0 bugs and the remainder address issues which were
introduced in the 6.1-rc merge cycle, or address issues which aren't
considered sufficiently serious to warrant a -stable backport"
* tag 'mm-hotfixes-stable-2022-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits)
mm: multi-gen LRU: move lru_gen_add_mm() out of IRQ-off region
lib: maple_tree: remove unneeded initialization in mtree_range_walk()
mmap: fix remap_file_pages() regression
mm/shmem: ensure proper fallback if page faults
mm/userfaultfd: replace kmap/kmap_atomic() with kmap_local_page()
x86: fortify: kmsan: fix KMSAN fortify builds
x86: asm: make sure __put_user_size() evaluates pointer once
Kconfig.debug: disable CONFIG_FRAME_WARN for KMSAN by default
x86/purgatory: disable KMSAN instrumentation
mm: kmsan: export kmsan_copy_page_meta()
mm: migrate: fix return value if all subpages of THPs are migrated successfully
mm/uffd: fix vma check on userfault for wp
mm: prep_compound_tail() clear page->private
mm,madvise,hugetlb: fix unexpected data loss with MADV_DONTNEED on hugetlbfs
mm/page_isolation: fix clang deadcode warning
fs/ext4/super.c: remove unused `deprecated_msg'
ipc/msg.c: fix percpu_counter use after free
memory tier, sysfs: rename attribute "nodes" to "nodelist"
MAINTAINERS: git://github.com -> https://github.com for nilfs2
mm/kmemleak: prevent soft lockup in kmemleak_scan()'s object iteration loops
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix a case of rescheduling with user access unlocked, when preempt is
enabled.
- A follow-up fix for a recent fix, which could lead to IRQ state
assertions firing incorrectly.
- Two fixes for lockdep warnings seen when using kfence with the Hash
MMU.
- Two fixes for preempt warnings seen when using the Hash MMU.
- Two fixes for the VAS coprocessor mechanism used on pseries.
- Prevent building some of our older KVM backends when
CONTEXT_TRACKING_USER is enabled, as it's known to cause crashes.
- A couple of fixes for issues seen with PMU NMIs.
Thanks to Nicholas Piggin, Guenter Roeck, Frederic Barrat Haren Myneni,
Sachin Sant, and Samuel Holland.
* tag 'powerpc-6.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s/interrupt: Fix clear of PACA_IRQS_HARD_DIS when returning to soft-masked context
powerpc/64s/interrupt: Perf NMI should not take normal exit path
powerpc/64/interrupt: Prevent NMI PMI causing a dangerous warning
KVM: PPC: BookS PR-KVM and BookE do not support context tracking
powerpc: Fix reschedule bug in KUAP-unlocked user copy
powerpc/64s: Fix hash__change_memory_range preemption warning
powerpc/64s: Disable preemption in hash lazy mmu mode
powerpc/64s: make linear_map_hash_lock a raw spinlock
powerpc/64s: make HPTE lock and native_tlbie_lock irq-safe
powerpc/64s: Add lockdep for HPTE lock
powerpc/pseries: Use lparcfg to reconfig VAS windows for DLPAR CPU
powerpc/pseries/vas: Add VAS IRQ primary handler
|
|
generic_acpi_laptop_init()
Currently the return value of 'sub_driver->init' is not checked. If
sparse_keymap_setup() called in the init function fails, 'generic_
inputdev' is freed, then it will lead a UAF when using it in generic_
acpi_laptop_init(). Fix it by checking the return value and setting
generic_inputdev to NULL after free, so as to avoid double free it.
The error code in generic_subdriver_init() is always negative, so the
return of generic_subdriver_init() can be simplified.
Fixes: 6246ed09111f ("LoongArch: Add ACPI-based generic laptop driver")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Some laptops don't support SW_LID, but still have backlight control,
move backlight resuming before SW_LID event handling so as to avoid
backlight mistake due to early return.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Not all compilers support declare variables in switch-case, so move
declarations to the beginning of a function. Otherwise we may get such
build errors:
arch/loongarch/net/bpf_jit.c: In function ‘emit_atomic’:
arch/loongarch/net/bpf_jit.c:362:3: error: a label can only be part of a statement and a declaration is not a statement
u8 r0 = regmap[BPF_REG_0];
^~
arch/loongarch/net/bpf_jit.c: In function ‘build_insn’:
arch/loongarch/net/bpf_jit.c:727:3: error: a label can only be part of a statement and a declaration is not a statement
u8 t7 = -1;
^~
arch/loongarch/net/bpf_jit.c:778:3: error: a label can only be part of a statement and a declaration is not a statement
int ret;
^~~
arch/loongarch/net/bpf_jit.c:779:3: error: expected expression before ‘u64’
u64 func_addr;
^~~
arch/loongarch/net/bpf_jit.c:780:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
bool func_addr_fixed;
^~~~
arch/loongarch/net/bpf_jit.c:784:11: error: ‘func_addr’ undeclared (first use in this function); did you mean ‘in_addr’?
&func_addr, &func_addr_fixed);
^~~~~~~~~
in_addr
arch/loongarch/net/bpf_jit.c:784:11: note: each undeclared identifier is reported only once for each function it appears in
arch/loongarch/net/bpf_jit.c:814:3: error: a label can only be part of a statement and a declaration is not a statement
u64 imm64 = (u64)(insn + 1)->imm << 32 | (u32)insn->imm;
^~~
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Eliminate the following coccicheck warning:
./arch/loongarch/include/asm/ptrace.h:32:15-21: WARNING use flexible-array member instead
Reviewed-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Yushan Zhou <katrinzhou@tencent.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
The current LoongArch kernel stack is padded as if obeying the MIPS o32
calling convention (32 bytes), signifying the port's MIPS lineage but no
longer making sense. Remove the padding for clarity.
Reviewed-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Remove outdated linux390 link from MAINTAINERS
- Add few missing EX_TABLE entries to inline assemblies
- Fix raw data collection for pai_ext PMU
- Add kernel image secure boot trailer for future firmware versions
- Fix out-of-bounds access on cio_ignore free
- Fix memory allocation of mdev_types array in vfio-ap
* tag 's390-6.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/vfio-ap: Fix memory allocation for mdev_types array
s390/cio: fix out-of-bounds access on cio_ignore free
s390/pai: fix raw data collection for PMU pai_ext
s390/boot: add secure boot trailer
s390/pci: add missing EX_TABLE entries to __pcistg_mio_inuser()/__pcilg_mio_inuser()
s390/futex: add missing EX_TABLE entry to __futex_atomic_op()
s390/uaccess: add missing EX_TABLE entries to __clear_user()
MAINTAINERS: remove outdated linux390 link
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix for a build warning in the jump_label code
- One of the git://github -> https://github cleanups, for the SiFive
drivers
- A fix for the kasan initialization code, this still likely warrants
some cleanups but that's a bigger problem and at least this fixes the
crashes in the short term
- A pair of fixes for extension support detection on mixed LLVM/GNU
toolchains
- A fix for a runtime warning in the /proc/cpuinfo code
* tag 'riscv-for-linus-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: Fix /proc/cpuinfo cpumask warning
riscv: fix detection of toolchain Zihintpause support
riscv: fix detection of toolchain Zicbom support
riscv: mm: add missing memcpy in kasan_init
MAINTAINERS: git://github.com -> https://github.com for sifive
riscv: jump_label: mark arguments as const to satisfy asm constraints
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and device properties fixes from Rafael Wysocki:
"These fix device properties documentation and the ACPI PCC code, add a
new IRQ override quirk for resource handling and add one more item to
the list of device IDs to be ignored when returned by _DEP.
Specifics:
- Fix the documentation of the *_match_string() family of functions
to properly cover the return value (Andy Shevchenko)
- Fix a possible integer overflow during multiplication in the ACPI
PCC code (Manank Patel)
- Make the ACPI device resources code skip IRQ override on Asus
Vivobook S5602ZA (Tamim Khan)
- Add LATT2021 to the list of device IDs that are ignored when
returned by _DEP, because there are no drivers for them in the
kernel and no plans to add such drivers (Hans de Goede)"
* tag 'acpi-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: scan: Add LATT2021 to acpi_ignore_dep_ids[]
ACPI: resource: Skip IRQ override on Asus Vivobook S5602ZA
ACPI: PCC: Fix unintentional integer overflow
device property: Fix documentation for *_match_string() APIs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These make the intel_pstate driver work as expected on all hybrid
platforms to date (regardless of possible platform firmware issues),
fix hybrid sleep on systems using suspend-to-idle by default, make the
generic power domains code handle disabled idle states properly and
update pm-graph.
Specifics:
- Make intel_pstate use what is known about the hardware instead of
relying on information from the platform firmware (ACPI CPPC in
particular) to establish the relationship between the HWP CPU
performance levels and frequencies on all hybrid platforms
available to date (Rafael Wysocki)
- Allow hybrid sleep to use suspend-to-idle as a system suspend
method if it is the current suspend method of choice (Mario
Limonciello)
- Fix handling of unavailable/disabled idle states in the generic
power domains code (Sudeep Holla)
- Update the pm-graph suite of utilities to version 5.10 which is
fixes-mostly and does not add any new features (Todd Brandt)"
* tag 'pm-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: domains: Fix handling of unavailable/disabled idle states
pm-graph v5.10
cpufreq: intel_pstate: hybrid: Use known scaling factor for P-cores
cpufreq: intel_pstate: Read all MSRs on the target CPU
PM: hibernate: Allow hybrid sleep to work with s2idle
|
|
While reworking the archrandom handling, commit d349ab99eec7 ("random:
handle archrandom with multiple longs") switched to the non-early
archrandom helpers in random_init(), which broke initialization of the
entropy pool from the arm64 random generator.
Indeed at that point the arm64 CPU features, which verify that all CPUs
have compatible capabilities, are not finalized so arch_get_random_seed_longs()
is unsuccessful. Instead random_init() should use the _early functions,
which check only the boot CPU on arm64. On other architectures the
_early functions directly call the normal ones.
Fixes: d349ab99eec7 ("random: handle archrandom with multiple longs")
Cc: stable@vger.kernel.org
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
lru_gen_add_mm() has been added within an IRQ-off region in the commit
mentioned below. The other invocations of lru_gen_add_mm() are not within
an IRQ-off region.
The invocation within IRQ-off region is problematic on PREEMPT_RT because
the function is using a spin_lock_t which must not be used within
IRQ-disabled regions.
The other invocations of lru_gen_add_mm() occur while
task_struct::alloc_lock is acquired. Move lru_gen_add_mm() after
interrupts are enabled and before task_unlock().
Link: https://lkml.kernel.org/r/20221026134830.711887-1-bigeasy@linutronix.de
Fixes: bd74fdaea1460 ("mm: multi-gen LRU: support page table walks")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Yu Zhao <yuzhao@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Before the do-while loop in mtree_range_walk(), the variables next, min,
max need to be initialized. The variables last, prev_min and prev_max are
set within the loop body before they are eventually used after exiting the
loop body.
As it is a do-while loop, the loop body is executed at least once, so the
variables last, prev_min and prev_max do not need to be initialized before
the loop body.
Remove unneeded initialization of last and prev_min.
The needless initialization was reported by clang-analyzer as Dead Stores.
As the compiler already identifies these assignments as unneeded, it
optimizes the assignments away. Hence:
No functional change. No change in object code.
Link: https://lkml.kernel.org/r/20221026120029.12555-2-lukas.bulwahn@gmail.com
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When using the VMA iterator, the final execution will set the variable
'next' to NULL which causes the function to fail out. Restore the break
in the loop to exit the VMA iterator early without clearing NULL fixes the
issue.
Link: https://lore.kernel.org/lkml/29344.1666681759@jrobl/
Link: https://lkml.kernel.org/r/20221025161222.2634030-1-Liam.Howlett@oracle.com
Fixes: 763ecb035029 (mm: remove the vma linked list)
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: "J. R. Okajima" <hooanon05g@gmail.com>
Tested-by: "J. R. Okajima" <hooanon05g@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The kernel test robot flagged a recursive lock as a result of a conversion
from kmap_atomic() to kmap_local_folio()[Link]
The cause was due to the code depending on the kmap_atomic() side effect
of disabling page faults. In that case the code expects the fault to fail
and take the fallback case.
git archaeology implied that the recursion may not be an actual bug.[1]
However, depending on the implementation of the mmap_lock and the
condition of the call there may still be a deadlock.[2] So this is not
purely a lockdep issue. Considering a single threaded call stack there
are 3 options.
1) Different mm's are in play (no issue)
2) Readlock implementation is recursive and same mm is in play
(no issue)
3) Readlock implementation is _not_ recursive (issue)
The mmap_lock is recursive so with a single thread there is no issue.
However, Matthew pointed out a deadlock scenario when you consider
additional process' and threads thusly.
"The readlock implementation is only recursive if nobody else has taken a
write lock. If you have a multithreaded process, one of the other threads
can call mmap() and that will prevent recursion (due to fairness). Even
if it's a different process that you're trying to acquire the mmap read
lock on, you can still get into a deadly embrace. eg:
process A thread 1 takes read lock on own mmap_lock
process A thread 2 calls mmap, blocks taking write lock
process B thread 1 takes page fault, read lock on own mmap lock
process B thread 2 calls mmap, blocks taking write lock
process A thread 1 blocks taking read lock on process B
process B thread 1 blocks taking read lock on process A
Now all four threads are blocked waiting for each other."
Regardless using pagefault_disable() ensures that no matter what locking
implementation is used a deadlock will not occur. Add an explicit
pagefault_disable() and a big comment to explain this for future souls
looking at this code.
[1] https://lore.kernel.org/all/Y1MymJ%2FINb45AdaY@iweiny-desk3/
[2] https://lore.kernel.org/lkml/Y1bXBtGTCym77%2FoD@casper.infradead.org/
Link: https://lkml.kernel.org/r/20221025220108.2366043-1-ira.weiny@intel.com
Link: https://lore.kernel.org/r/202210211215.9dc6efb5-yujie.liu@intel.com
Fixes: 7a7256d5f512 ("shmem: convert shmem_mfill_atomic_pte() to use a folio")
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: kernel test robot <yujie.liu@intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
kmap() and kmap_atomic() are being deprecated in favor of
kmap_local_page() which is appropriate for any thread local context.[1]
A recent locking bug report with userfaultfd showed that the conversion of
the kmap_atomic()'s in those code flows requires care with regard to the
prevention of deadlock.[2]
git archaeology implied that the recursion may not be an actual bug.[3]
However, depending on the implementation of the mmap_lock and the
condition of the call there may still be a deadlock.[4] So this is not
purely a lockdep issue. Considering a single threaded call stack there
are 3 options.
1) Different mm's are in play (no issue)
2) Readlock implementation is recursive and same mm is in play
(no issue)
3) Readlock implementation is _not_ recursive (issue)
The mmap_lock is recursive so with a single thread there is no issue.
However, Matthew pointed out a deadlock scenario when you consider
additional process' and threads thusly.
"The readlock implementation is only recursive if nobody else has taken a
write lock. If you have a multithreaded process, one of the other threads
can call mmap() and that will prevent recursion (due to fairness). Even
if it's a different process that you're trying to acquire the mmap read
lock on, you can still get into a deadly embrace. eg:
process A thread 1 takes read lock on own mmap_lock
process A thread 2 calls mmap, blocks taking write lock
process B thread 1 takes page fault, read lock on own mmap lock
process B thread 2 calls mmap, blocks taking write lock
process A thread 1 blocks taking read lock on process B
process B thread 1 blocks taking read lock on process A
Now all four threads are blocked waiting for each other."
Regardless using pagefault_disable() ensures that no matter what locking
implementation is used a deadlock will not occur.
Complete kmap conversion in userfaultfd by replacing the kmap() and
kmap_atomic() calls with kmap_local_page(). When replacing the
kmap_atomic() call ensure page faults continue to be disabled to support
the correct fall back behavior and add a comment to inform future souls of
the requirement.
[1] https://lore.kernel.org/all/20220813220034.806698-1-ira.weiny@intel.com/
[2] https://lore.kernel.org/all/Y1Mh2S7fUGQ%2FiKFR@iweiny-desk3/
[3] https://lore.kernel.org/all/Y1MymJ%2FINb45AdaY@iweiny-desk3/
[4] https://lore.kernel.org/lkml/Y1bXBtGTCym77%2FoD@casper.infradead.org/
[ira.weiny@intel.com: v2]
Link: https://lkml.kernel.org/r/20221025220136.2366143-1-ira.weiny@intel.com
Link: https://lkml.kernel.org/r/20221024043452.1491677-1-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Ensure that KMSAN builds replace memset/memcpy/memmove calls with the
respective __msan_XXX functions, and that none of the macros are redefined
twice. This should allow building kernel with both CONFIG_KMSAN and
CONFIG_FORTIFY_SOURCE.
Link: https://lkml.kernel.org/r/20221024212144.2852069-5-glider@google.com
Link: https://github.com/google/kmsan/issues/89
Signed-off-by: Alexander Potapenko <glider@google.com>
Reported-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
User access macros must ensure their arguments are evaluated only once if
they are used more than once in the macro body. Adding
instrument_put_user() to __put_user_size() resulted in double evaluation
of the `ptr` argument, which led to correctness issues when performing
e.g. unsafe_put_user(..., p++, ...).
To fix those issues, evaluate the `ptr` argument of __put_user_size() at
the beginning of the macro.
Link: https://lkml.kernel.org/r/20221024212144.2852069-4-glider@google.com
Fixes: 888f84a6da4d ("x86: asm: instrument usercopy in get_user() and put_user()")
Signed-off-by: Alexander Potapenko <glider@google.com>
Reported-by: youling257 <youling257@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
KMSAN adds a lot of instrumentation to the code, which results in
increased stack usage (up to 2048 bytes and more in some cases). It's
hard to predict how big the stack frames can be, so we disable the
warnings for KMSAN instead.
Link: https://lkml.kernel.org/r/20221024212144.2852069-3-glider@google.com
Link: https://github.com/google/kmsan/issues/89
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The stand-alone purgatory.ro does not contain the KMSAN runtime, therefore
it can't be built with KMSAN compiler instrumentation.
Link: https://lkml.kernel.org/r/20221024212144.2852069-2-glider@google.com
Link: https://github.com/google/kmsan/issues/89
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Certain modules call copy_user_highpage(), which calls
kmsan_copy_page_meta() under KMSAN, so we need to export the latter.
Link: https://lkml.kernel.org/r/20221024212144.2852069-1-glider@google.com
Link: https://github.com/google/kmsan/issues/89
Fixes: b073d7f8aee4 ("mm: kmsan: maintain KMSAN metadata for page operations")
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|