summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-06-28Merge tag 'mmc-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmcLinus Torvalds
Pull MMC and MEMSTICK updates from Ulf Hansson: "MMC core: - Add support for Cache Ctrl for SD cards - Add support for Power Off Notification for SD cards - Add support for read/write of the SD function extension registers - Allow broken eMMC HS400 mode to be disabled via DT - Allow UHS-I voltage switch for SDSC cards if supported - Disable command queueing in the ioctl path - Enable eMMC sleep commands to use HW busy polling to minimize delay - Extend re-use of the common polling loop to standardize behaviour - Take into account MMC_CAP_NEED_RSP_BUSY for eMMC HPI commands MMC host: - jz4740: Add support for the JZ4775 variant - sdhci-acpi: Disable write protect detection on Toshiba Encore 2 WT8-B - sdhci-esdhc-imx: Advertise HS400 support through MMC caps - sdhci-esdhc-imx: Enable support for system wakeup for SDIO - sdhci-iproc: Add support for the legacy sdhci controller on the BCM7211 - vub3000: Fix control-request direction MEMSTICK: - A couple of fixes/cleanups" * tag 'mmc-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: (54 commits) mmc: sdhci-iproc: Add support for the legacy sdhci controller on the BCM7211 dt-bindings: mmc: sdhci-iproc: Add brcm,bcm7211a0-sdhci mmc: JZ4740: Add support for JZ4775 dt-bindings: mmc: JZ4740: Add bindings for JZ4775 mmc: sdhci-esdhc-imx: Enable support for system wakeup for SDIO mmc: Improve function name when aborting a tuning cmd mmc: sdhci-of-aspeed: Turn down a phase correction warning mmc: debugfs: add description for module parameter mmc: via-sdmmc: add a check against NULL pointer dereference mmc: sdhci-sprd: use sdhci_sprd_writew mmc: sdhci-esdhc-imx: remove unused is_imx6q_usdhc mmc: core: Allow UHS-I voltage switch for SDSC cards if supported mmc: mmc_spi: Imply container_of() to be no-op mmc: mmc_spi: Drop duplicate 'mmc_spi' in the debug messages mmc: dw_mmc-pltfm: Remove unused <linux/clk.h> mmc: sdhci-of-aspeed: Configure the SDHCIs as specified by the devicetree. mmc: core: Add a missing SPDX license header mmc: vub3000: fix control-request direction mmc: sdhci-omap: Use pm_runtime_resume_and_get() to replace open coding mmc: sdhci_am654: Use pm_runtime_resume_and_get() to replace open coding ...
2021-06-28Merge tag 'for-5.14/libata-2021-06-27' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull libata updates from Jens Axboe: "The big change in this round is that we're finally in a position where we can sanely remove the old drivers/ide/ code, as libata covers everything we need by now. This is exciting for two reasons: 1) we delete a lot of legacy code that doesn't really meet the standards we have today, and 2) it enables us to clean up various bits in the block layer that exist only because of the old IDE code. Outside of that, just a few minor fixes here, fixups for warnings, etc" * tag 'for-5.14/libata-2021-06-27' of git://git.kernel.dk/linux-block: (29 commits) ata: rb532_cf: remove redundant codes ide: remove the legacy ide driver m68k: use libata instead of the legacy ide driver ARM: disable CONFIG_IDE in pxa_defconfig ARM: disable CONFIG_IDE in footbridge_defconfig alpha: use libata instead of the legacy ide driver pata_cypress: add a module option to disable BM-DMA ata: pata_macio: Avoid overwriting initialised field in 'pata_macio_sht' ata: pata_serverworks: Avoid overwriting initialised field in 'serverworks_osb4_sht ata: pata_sc1200: sc1200_sht'Avoid overwriting initialised field in ' ata: pata_cs5530: Avoid overwriting initialised field in 'cs5530_sht' ata: pata_cs5520: Avoid overwriting initialised field in 'cs5520_sht' ata: pata_atiixp: Avoid overwriting initialised field in 'atiixp_sht' ata: sata_nv: Do not over-write initialise fields in 'nv_adma_sht' and 'nv_swncq_sht' ata: sata_mv: Do not over-write initialise fields in 'mv6_sht' ata: sata_sil24: Do not over-write initialise fields in 'sil24_sht' ata: ahci: Ensure initialised fields are not overwritten in AHCI_SHT() ata: include: libata: Move fields commonly over-written to separate MACRO ahci: Add support for Dell S140 and later controllers ata: ahci_sunxi: Disable DIPM ...
2021-06-28mm/page_alloc: Correct return value of populated elements if bulk array is ↵Mel Gorman
populated Dave Jones reported the following This made it into 5.13 final, and completely breaks NFSD for me (Serving tcp v3 mounts). Existing mounts on clients hang, as do new mounts from new clients. Rebooting the server back to rc7 everything recovers. The commit b3b64ebd3822 ("mm/page_alloc: do bulk array bounds check after checking populated elements") returns the wrong value if the array is already populated which is interpreted as an allocation failure. Dave reported this fixes his problem and it also passed a test running dbench over NFS. Fixes: b3b64ebd3822 ("mm/page_alloc: do bulk array bounds check after checking populated elements") Reported-and-tested-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Cc: <stable@vger.kernel.org> [5.13+] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-28media: s5p-mfc: Fix display delay control creationMarek Szyprowski
v4l2_ctrl_new_std() fails if the caller provides no 'step' parameter for integer control, so define it to fix following error: s5p_mfc_dec_ctrls_setup:1166: Adding control (1) failed Fixes: c3042bff918a ("media: s5p-mfc: Use display delay and display enable std controls") Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2021-06-28media: mtk-vpu: on suspend, read/write regs only if vpu is runningDafna Hirschfeld
If the vpu is not running, we should not rely on VPU_IDLE_REG value. In this case, the suspend cb should only unprepare the clock. This fixes a system-wide suspend to ram failure: [ 273.073363] PM: suspend entry (deep) [ 273.410502] mtk-msdc 11230000.mmc: phase: [map:ffffffff] [maxlen:32] [final:10] [ 273.455926] Filesystems sync: 0.378 seconds [ 273.589707] Freezing user space processes ... (elapsed 0.003 seconds) done. [ 273.600104] OOM killer disabled. [ 273.603409] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. [ 273.613361] mwifiex_sdio mmc2:0001:1: None of the WOWLAN triggers enabled [ 274.784952] mtk_vpu 10020000.vpu: vpu idle timeout [ 274.789764] PM: dpm_run_callback(): platform_pm_suspend+0x0/0x70 returns -5 [ 274.796740] mtk_vpu 10020000.vpu: PM: failed to suspend: error -5 [ 274.802842] PM: Some devices failed to suspend, or early wake event detected [ 275.426489] OOM killer enabled. [ 275.429718] Restarting tasks ... [ 275.435765] done. [ 275.447510] PM: suspend exit Fixes: 1f565e263c3e ("media: mtk-vpu: VPU should be in idle state before system is suspended") Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2021-06-28media: video-mux: Skip dangling endpointsPhilipp Zabel
i.MX6 device tree include files contain dangling endpoints for the board device tree writers' convenience. These are still included in many existing device trees. Treat dangling endpoints as non-existent to support them. Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Fixes: 612b385efb1e ("media: video-mux: Create media links in bound notifier") Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2021-06-28Merge tag 'irqchip-5.14' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Pull irqchip updates from Marc Zyngier: - Revamped the irqdomain internals to consistently cache irqdata - Expose a new API to simplify IRQ handling involving an irqdomain by not using the IRQ number - Convert all the irqchip drivers to this new API - Allow the Qualcomm PDC driver to be compiled as a module - Fix HiSi MBIGEN compile warning when CONFIG_ACPI isn't selected - Remove a bunch of spurious printks on error paths - The obligatory couple of DT updates
2021-06-27Linux 5.13Linus Torvalds
2021-06-28MAINTAINERS: erofs: update my email addressChao Yu
Old email address will be invalid after a few days, update it to kernel.org one. Link: https://lore.kernel.org/r/20210627133229.8025-1-chao@kernel.org Signed-off-by: Chao Yu <chao@kernel.org> Acked-by: Gao Xiang <xiang@kernel.org> Signed-off-by: Gao Xiang <xiang@kernel.org>
2021-06-27Revert "signal: Allow tasks to cache one sigqueue struct"Linus Torvalds
This reverts commits 4bad58ebc8bc4f20d89cff95417c9b4674769709 (and 399f8dd9a866e107639eabd3c1979cd526ca3a98, which tried to fix it). I do not believe these are correct, and I'm about to release 5.13, so am reverting them out of an abundance of caution. The locking is odd, and appears broken. On the allocation side (in __sigqueue_alloc()), the locking is somewhat straightforward: it depends on sighand->siglock. Since one caller doesn't hold that lock, it further then tests 'sigqueue_flags' to avoid the case with no locks held. On the freeing side (in sigqueue_cache_or_free()), there is no locking at all, and the logic instead depends on 'current' being a single thread, and not able to race with itself. To make things more exciting, there's also the data race between freeing a signal and allocating one, which is handled by using WRITE_ONCE() and READ_ONCE(), and being mutually exclusive wrt the initial state (ie freeing will only free if the old state was NULL, while allocating will obviously only use the value if it was non-NULL, so only one or the other will actually act on the value). However, while the free->alloc paths do seem mutually exclusive thanks to just the data value dependency, it's not clear what the memory ordering constraints are on it. Could writes from the previous allocation possibly be delayed and seen by the new allocation later, causing logical inconsistencies? So it's all very exciting and unusual. And in particular, it seems that the freeing side is incorrect in depending on "current" being single-threaded. Yes, 'current' is a single thread, but in the presense of asynchronous events even a single thread can have data races. And such asynchronous events can and do happen, with interrupts causing signals to be flushed and thus free'd (for example - sending a SIGCONT/SIGSTOP can happen from interrupt context, and can flush previously queued process control signals). So regardless of all the other questions about the memory ordering and locking for this new cached allocation, the sigqueue_cache_or_free() assumptions seem to be fundamentally incorrect. It may be that people will show me the errors of my ways, and tell me why this is all safe after all. We can reinstate it if so. But my current belief is that the WRITE_ONCE() that sets the cached entry needs to be a smp_store_release(), and the READ_ONCE() that finds a cached entry needs to be a smp_load_acquire() to handle memory ordering correctly. And the sequence in sigqueue_cache_or_free() would need to either use a lock or at least be interrupt-safe some way (perhaps by using something like the percpu 'cmpxchg': it doesn't need to be SMP-safe, but like the percpu operations it needs to be interrupt-safe). Fixes: 399f8dd9a866 ("signal: Prevent sigqueue caching after task got released") Fixes: 4bad58ebc8bc ("signal: Allow tasks to cache one sigqueue struct") Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-26Merge tag 's390-5.13-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Vasily Gorbik: - Fix a couple of late pt_regs flags handling findings of conversion to generic entry. - Fix potential register clobbering in stack switch helper. - Fix thread/group masks for offline cpus. - Fix cleanup of mdev resources when remove callback is invoked in vfio-ap code. * tag 's390-5.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/stack: fix possible register corruption with stack switch helper s390/topology: clear thread/group maps for offline cpus s390/vfio-ap: clean up mdev resources when remove callback invoked s390: clear pt_regs::flags on irq entry s390: fix system call restart with multiple signals
2021-06-25Merge tag 'pinctrl-v5.13-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: "Two last-minute fixes: - Put an fwnode in the errorpath in the SGPIO driver - Fix the number of GPIO lines per bank in the STM32 driver" * tag 'pinctrl-v5.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: stm32: fix the reported number of GPIO lines per bank pinctrl: microchip-sgpio: Put fwnode in error case during ->probe()
2021-06-25Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Two small fixes, both in upper layer drivers (scsi disk and cdrom). The sd one is fixing a commit changing revalidation that came from the block tree a while ago (5.10) and the sr one adds handling of a condition we didn't previously handle for manually removed media" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: sd: Call sd_revalidate_disk() for ioctl(BLKRRPART) scsi: sr: Return appropriate error code when disk is ejected
2021-06-25Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "24 patches, based on 4a09d388f2ab382f217a764e6a152b3f614246f6. Subsystems affected by this patch series: mm (thp, vmalloc, hugetlb, memory-failure, and pagealloc), nilfs2, kthread, MAINTAINERS, and mailmap" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (24 commits) mailmap: add Marek's other e-mail address and identity without diacritics MAINTAINERS: fix Marek's identity again mm/page_alloc: do bulk array bounds check after checking populated elements mm/page_alloc: __alloc_pages_bulk(): do bounds check before accessing array mm/hwpoison: do not lock page again when me_huge_page() successfully recovers mm,hwpoison: return -EHWPOISON to denote that the page has already been poisoned mm/memory-failure: use a mutex to avoid memory_failure() races mm, futex: fix shared futex pgoff on shmem huge page kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync() kthread_worker: split code for canceling the delayed work timer mm/vmalloc: unbreak kasan vmalloc support KVM: s390: prepare for hugepage vmalloc mm/vmalloc: add vmalloc_no_huge nilfs2: fix memory leak in nilfs_sysfs_delete_device_group mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk() mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes mm: page_vma_mapped_walk(): get vma_address_end() earlier mm: page_vma_mapped_walk(): use goto instead of while (1) mm: page_vma_mapped_walk(): add a level of indentation mm: page_vma_mapped_walk(): crossing page table boundary ...
2021-06-25userfaultfd: uapi: fix UFFDIO_CONTINUE ioctl request definitionGleb Fotengauer-Malinovskiy
This ioctl request reads from uffdio_continue structure written by userspace which justifies _IOC_WRITE flag. It also writes back to that structure which justifies _IOC_READ flag. See NOTEs in include/uapi/asm-generic/ioctl.h for more information. Fixes: f619147104c8 ("userfaultfd: add UFFDIO_CONTINUE ioctl") Signed-off-by: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Axel Rasmussen <axelrasmussen@google.com> Reviewed-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-25Merge branch 'i2c/for-current' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Three more driver bugfixes and an annotation fix for the core" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: robotfuzz-osif: fix control-request directions i2c: dev: Add __user annotation i2c: cp2615: check for allocation failure in cp2615_i2c_recv() i2c: i801: Ensure that SMBHSTSTS_INUSE_STS is cleared when leaving i801_access
2021-06-25Merge tag 'devprop-5.13-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull device properties framework fix from Rafael Wysocki: "Fix a NULL pointer dereference introduced by a recent commit and occurring when device_remove_software_node() is used with a device that has never been registered (Heikki Krogerus)" * tag 'devprop-5.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: software node: Handle software node injection to an existing device properly
2021-06-25Merge tag 'for-linus-5.13b-rc8-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fix from Juergen Gross: "A fix for a regression introduced in 5.12: when migrating an irq related to a Xen user event to another cpu, a race might result in a WARN() triggering" * tag 'for-linus-5.13b-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/events: reset active flag for lateeoi events later
2021-06-25Merge tag 'for-linus-urgent' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "A selftests fix for ARM, and the fix for page reference count underflow. This is a very small fix that was provided by Nick Piggin and tested by myself" * tag 'for-linus-urgent' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: do not allow mapping valid but non-reference-counted pages KVM: selftests: Fix mapping length truncation in m{,un}map()
2021-06-25Merge tag 'x86_urgent_for_v5.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: "Two more urgent FPU fixes: - prevent unprivileged userspace from reinitializing supervisor states - prepare init_fpstate, which is the buffer used when initializing FPU state, properly in case the skip-writing-state-components XSAVE* variants are used" * tag 'x86_urgent_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Make init_fpstate correct with optimized XSAVE x86/fpu: Preserve supervisor states in sanitize_restored_user_xstate()
2021-06-25Merge tag 'ceph-for-5.13-rc8' of https://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "Two regression fixes from the merge window: one in the auth code affecting old clusters and one in the filesystem for proper propagation of MDS request errors. Also included a locking fix for async creates, marked for stable" * tag 'ceph-for-5.13-rc8' of https://github.com/ceph/ceph-client: libceph: set global_id as soon as we get an auth ticket libceph: don't pass result into ac->ops->handle_reply() ceph: fix error handling in ceph_atomic_open and ceph_lookup ceph: must hold snap_rwsem when filling inode for async create
2021-06-25Merge tag 'netfs-fixes-20210621' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull netfs fixes from David Howells: "This contains patches to fix netfs_write_begin() and afs_write_end() in the following ways: (1) In netfs_write_begin(), extract the decision about whether to skip a page out to its own helper and have that clear around the region to be written, but not clear that region. This requires the filesystem to patch it up afterwards if the hole doesn't get completely filled. (2) Use offset_in_thp() in (1) rather than manually calculating the offset into the page. (3) Due to (1), afs_write_end() now needs to handle short data write into the page by generic_perform_write(). I've adopted an analogous approach to ceph of just returning 0 in this case and letting the caller go round again. It also adds a note that (in the future) the len parameter may extend beyond the page allocated. This is because the page allocation is deferred to write_begin() and that gets to decide what size of THP to allocate." Jeff Layton points out: "The netfs fix in particular fixes a data corruption bug in cephfs" * tag 'netfs-fixes-20210621' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: netfs: fix test for whether we can skip read when writing beyond EOF afs: Fix afs_write_end() to handle short writes
2021-06-25Merge tag 'gpio-fixes-for-v5.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - fix wake-up interrupt support on gpio-mxc - zero the padding bytes in a structure passed to user-space in the GPIO character device - require HAS_IOPORT_MAP in two drivers that need it to fix a Kbuild issue * tag 'gpio-fixes-for-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpio: AMD8111 and TQMX86 require HAS_IOPORT_MAP gpiolib: cdev: zero padding during conversion to gpioline_info_changed gpio: mxc: Fix disabled interrupt wake-up support
2021-06-25Merge tag 'sound-5.13-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Two small changes have been cherry-picked as a last material for 5.13: a coverage after UMN revert action and a stale MAINTAINERS entry fix" * tag 'sound-5.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: MAINTAINERS: remove Timur Tabi from Freescale SOC sound drivers ASoC: rt5645: Avoid upgrading static warnings to errors
2021-06-25Merge tag 'kvmarm-5.14' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for v5.14. - Add MTE support in guests, complete with tag save/restore interface - Reduce the impact of CMOs by moving them in the page-table code - Allow device block mappings at stage-2 - Reduce the footprint of the vmemmap in protected mode - Support the vGIC on dumb systems such as the Apple M1 - Add selftest infrastructure to support multiple configuration and apply that to PMU/non-PMU setups - Add selftests for the debug architecture - The usual crop of PMU fixes
2021-06-25Merge tag 'kvm-s390-next-5.14-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Features for 5.14 - new HW facilities for guests - make inline assembly more robust with KASAN and co
2021-06-25Merge branch kvm-arm64/mmu/mte into kvmarm-master/nextMarc Zyngier
Last minute fix for MTE, making sure the pages are flagged as MTE before they are released. * kvm-arm64/mmu/mte: KVM: arm64: Set the MTE tag bit before releasing the page Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-25Merge remote-tracking branch 'spi/for-5.14' into spi-nextMark Brown
2021-06-25Merge remote-tracking branch 'spi/for-5.13' into spi-linusMark Brown
2021-06-25Merge remote-tracking branch 'spi/for-5.12' into spi-linusMark Brown
2021-06-25spi: core: add dma_map_dev for dma deviceVinod Koul
Some controllers like qcom geni need the parent device to be used for dma mapping, so add a dma_map_dev field and let drivers fill this to be used as mapping device Signed-off-by: Vinod Koul <vkoul@kernel.org> Link: https://lore.kernel.org/r/20210625052213.32260-4-vkoul@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org>
2021-06-25spi: convert Xilinx Zynq UltraScale+ MPSoC GQSPI bindings to YAMLNobuhiro Iwamatsu
Convert spi for Xilinx Zynq UltraScale+ MPSoC GQSPI bindings documentation to YAML. Signed-off-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20210613214317.296667-1-iwamatsu@nigauri.org Signed-off-by: Mark Brown <broonie@kernel.org>
2021-06-25gpio: AMD8111 and TQMX86 require HAS_IOPORT_MAPJohannes Berg
Both of these drivers use ioport_map(), so they need to depend on HAS_IOPORT_MAP. Otherwise, they cannot be built even with COMPILE_TEST on architectures without an ioport implementation, such as ARCH=um. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
2021-06-24mailmap: add Marek's other e-mail address and identity without diacriticsMarek Behún
Some of my commits were sent with identities Marek Behun <marek.behun@nic.cz> Marek Behún <marek.behun@nic.cz> while the correct one is Marek Behún <kabel@kernel.org> Put this into mailmap so that git shortlog prints all my commits under one identity. Link: https://lkml.kernel.org/r/20210616113624.19351-2-kabel@kernel.org Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-24MAINTAINERS: fix Marek's identity againMarek Behún
Fix my name to use diacritics, since MAINTAINERS supports it. Fix my e-mail address in MAINTAINERS' marvell10g PHY driver description, I accidentally put my other e-mail address here. Link: https://lkml.kernel.org/r/20210616113624.19351-1-kabel@kernel.org Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-24mm/page_alloc: do bulk array bounds check after checking populated elementsMel Gorman
Dan Carpenter reported the following The patch 0f87d9d30f21: "mm/page_alloc: add an array-based interface to the bulk page allocator" from Apr 29, 2021, leads to the following static checker warning: mm/page_alloc.c:5338 __alloc_pages_bulk() warn: potentially one past the end of array 'page_array[nr_populated]' The problem can occur if an array is passed in that is fully populated. That potentially ends up allocating a single page and storing it past the end of the array. This patch returns 0 if the array is fully populated. Link: https://lkml.kernel.org/r/20210618125102.GU30378@techsingularity.net Fixes: 0f87d9d30f21 ("mm/page_alloc: add an array-based interface to the bulk page allocator") Signed-off-by: Mel Gorman <mgorman@techsinguliarity.net> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-24mm/page_alloc: __alloc_pages_bulk(): do bounds check before accessing arrayRasmus Villemoes
In the event that somebody would call this with an already fully populated page_array, the last loop iteration would do an access beyond the end of page_array. It's of course extremely unlikely that would ever be done, but this triggers my internal static analyzer. Also, if it really is not supposed to be invoked this way (i.e., with no NULL entries in page_array), the nr_populated<nr_pages check could simply be removed instead. Link: https://lkml.kernel.org/r/20210507064504.1712559-1-linux@rasmusvillemoes.dk Fixes: 0f87d9d30f21 ("mm/page_alloc: add an array-based interface to the bulk page allocator") Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-24mm/hwpoison: do not lock page again when me_huge_page() successfully recoversNaoya Horiguchi
Currently me_huge_page() temporary unlocks page to perform some actions then locks it again later. My testcase (which calls hard-offline on some tail page in a hugetlb, then accesses the address of the hugetlb range) showed that page allocation code detects this page lock on buddy page and printed out "BUG: Bad page state" message. check_new_page_bad() does not consider a page with __PG_HWPOISON as bad page, so this flag works as kind of filter, but this filtering doesn't work in this case because the "bad page" is not the actual hwpoisoned page. So stop locking page again. Actions to be taken depend on the page type of the error, so page unlocking should be done in ->action() callbacks. So let's make it assumed and change all existing callbacks that way. Link: https://lkml.kernel.org/r/20210609072029.74645-1-nao.horiguchi@gmail.com Fixes: commit 78bb920344b8 ("mm: hwpoison: dissolve in-use hugepage in unrecoverable memory error") Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Tony Luck <tony.luck@intel.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-24mm,hwpoison: return -EHWPOISON to denote that the page has already been poisonedAili Yao
When memory_failure() is called with MF_ACTION_REQUIRED on the page that has already been hwpoisoned, memory_failure() could fail to send SIGBUS to the affected process, which results in infinite loop of MCEs. Currently memory_failure() returns 0 if it's called for already hwpoisoned page, then the caller, kill_me_maybe(), could return without sending SIGBUS to current process. An action required MCE is raised when the current process accesses to the broken memory, so no SIGBUS means that the current process continues to run and access to the error page again soon, so running into MCE loop. This issue can arise for example in the following scenarios: - Two or more threads access to the poisoned page concurrently. If local MCE is enabled, MCE handler independently handles the MCE events. So there's a race among MCE events, and the second or latter threads fall into the situation in question. - If there was a precedent memory error event and memory_failure() for the event failed to unmap the error page for some reason, the subsequent memory access to the error page triggers the MCE loop situation. To fix the issue, make memory_failure() return an error code when the error page has already been hwpoisoned. This allows memory error handler to control how it sends signals to userspace. And make sure that any process touching a hwpoisoned page should get a SIGBUS even in "already hwpoisoned" path of memory_failure() as is done in page fault path. Link: https://lkml.kernel.org/r/20210521030156.2612074-3-nao.horiguchi@gmail.com Signed-off-by: Aili Yao <yaoaili@kingsoft.com> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Jue Wang <juew@google.com> Cc: Tony Luck <tony.luck@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-24mm/memory-failure: use a mutex to avoid memory_failure() racesTony Luck
Patch series "mm,hwpoison: fix sending SIGBUS for Action Required MCE", v5. I wrote this patchset to materialize what I think is the current allowable solution mentioned by the previous discussion [1]. I simply borrowed Tony's mutex patch and Aili's return code patch, then I queued another one to find error virtual address in the best effort manner. I know that this is not a perfect solution, but should work for some typical case. [1]: https://lore.kernel.org/linux-mm/20210331192540.2141052f@alex-virtual-machine/ This patch (of 2): There can be races when multiple CPUs consume poison from the same page. The first into memory_failure() atomically sets the HWPoison page flag and begins hunting for tasks that map this page. Eventually it invalidates those mappings and may send a SIGBUS to the affected tasks. But while all that work is going on, other CPUs see a "success" return code from memory_failure() and so they believe the error has been handled and continue executing. Fix by wrapping most of the internal parts of memory_failure() in a mutex. [akpm@linux-foundation.org: make mf_mutex local to memory_failure()] Link: https://lkml.kernel.org/r/20210521030156.2612074-1-nao.horiguchi@gmail.com Link: https://lkml.kernel.org/r/20210521030156.2612074-2-nao.horiguchi@gmail.com Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Aili Yao <yaoaili@kingsoft.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: David Hildenbrand <david@redhat.com> Cc: Jue Wang <juew@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-24mm, futex: fix shared futex pgoff on shmem huge pageHugh Dickins
If more than one futex is placed on a shmem huge page, it can happen that waking the second wakes the first instead, and leaves the second waiting: the key's shared.pgoff is wrong. When 3.11 commit 13d60f4b6ab5 ("futex: Take hugepages into account when generating futex_key"), the only shared huge pages came from hugetlbfs, and the code added to deal with its exceptional page->index was put into hugetlb source. Then that was missed when 4.8 added shmem huge pages. page_to_pgoff() is what others use for this nowadays: except that, as currently written, it gives the right answer on hugetlbfs head, but nonsense on hugetlbfs tails. Fix that by calling hugetlbfs-specific hugetlb_basepage_index() on PageHuge tails as well as on head. Yes, it's unconventional to declare hugetlb_basepage_index() there in pagemap.h, rather than in hugetlb.h; but I do not expect anything but page_to_pgoff() ever to need it. [akpm@linux-foundation.org: give hugetlb_basepage_index() prototype the correct scope] Link: https://lkml.kernel.org/r/b17d946b-d09-326e-b42a-52884c36df32@google.com Fixes: 800d8c63b2e9 ("shmem: add huge pages support") Reported-by: Neel Natu <neelnatu@google.com> Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Zhang Yi <wetpzy@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-24kthread: prevent deadlock when kthread_mod_delayed_work() races with ↵Petr Mladek
kthread_cancel_delayed_work_sync() The system might hang with the following backtrace: schedule+0x80/0x100 schedule_timeout+0x48/0x138 wait_for_common+0xa4/0x134 wait_for_completion+0x1c/0x2c kthread_flush_work+0x114/0x1cc kthread_cancel_work_sync.llvm.16514401384283632983+0xe8/0x144 kthread_cancel_delayed_work_sync+0x18/0x2c xxxx_pm_notify+0xb0/0xd8 blocking_notifier_call_chain_robust+0x80/0x194 pm_notifier_call_chain_robust+0x28/0x4c suspend_prepare+0x40/0x260 enter_state+0x80/0x3f4 pm_suspend+0x60/0xdc state_store+0x108/0x144 kobj_attr_store+0x38/0x88 sysfs_kf_write+0x64/0xc0 kernfs_fop_write_iter+0x108/0x1d0 vfs_write+0x2f4/0x368 ksys_write+0x7c/0xec It is caused by the following race between kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync(): CPU0 CPU1 Context: Thread A Context: Thread B kthread_mod_delayed_work() spin_lock() __kthread_cancel_work() spin_unlock() del_timer_sync() kthread_cancel_delayed_work_sync() spin_lock() __kthread_cancel_work() spin_unlock() del_timer_sync() spin_lock() work->canceling++ spin_unlock spin_lock() queue_delayed_work() // dwork is put into the worker->delayed_work_list spin_unlock() kthread_flush_work() // flush_work is put at the tail of the dwork wait_for_completion() Context: IRQ kthread_delayed_work_timer_fn() spin_lock() list_del_init(&work->node); spin_unlock() BANG: flush_work is not longer linked and will never get proceed. The problem is that kthread_mod_delayed_work() checks work->canceling flag before canceling the timer. A simple solution is to (re)check work->canceling after __kthread_cancel_work(). But then it is not clear what should be returned when __kthread_cancel_work() removed the work from the queue (list) and it can't queue it again with the new @delay. The return value might be used for reference counting. The caller has to know whether a new work has been queued or an existing one was replaced. The proper solution is that kthread_mod_delayed_work() will remove the work from the queue (list) _only_ when work->canceling is not set. The flag must be checked after the timer is stopped and the remaining operations can be done under worker->lock. Note that kthread_mod_delayed_work() could remove the timer and then bail out. It is fine. The other canceling caller needs to cancel the timer as well. The important thing is that the queue (list) manipulation is done atomically under worker->lock. Link: https://lkml.kernel.org/r/20210610133051.15337-3-pmladek@suse.com Fixes: 9a6b06c8d9a220860468a ("kthread: allow to modify delayed kthread work") Signed-off-by: Petr Mladek <pmladek@suse.com> Reported-by: Martin Liu <liumartin@google.com> Cc: <jenhaochen@google.com> Cc: Minchan Kim <minchan@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-24kthread_worker: split code for canceling the delayed work timerPetr Mladek
Patch series "kthread_worker: Fix race between kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync()". This patchset fixes the race between kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync() including proper return value handling. This patch (of 2): Simple code refactoring as a preparation step for fixing a race between kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync(). It does not modify the existing behavior. Link: https://lkml.kernel.org/r/20210610133051.15337-2-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Cc: <jenhaochen@google.com> Cc: Martin Liu <liumartin@google.com> Cc: Minchan Kim <minchan@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-24mm/vmalloc: unbreak kasan vmalloc supportDaniel Axtens
In commit 121e6f3258fe ("mm/vmalloc: hugepage vmalloc mappings"), __vmalloc_node_range was changed such that __get_vm_area_node was no longer called with the requested/real size of the vmalloc allocation, but rather with a rounded-up size. This means that __get_vm_area_node called kasan_unpoision_vmalloc() with a rounded up size rather than the real size. This led to it allowing access to too much memory and so missing vmalloc OOBs and failing the kasan kunit tests. Pass the real size and the desired shift into __get_vm_area_node. This allows it to round up the size for the underlying allocators while still unpoisioning the correct quantity of shadow memory. Adjust the other call-sites to pass in PAGE_SHIFT for the shift value. Link: https://lkml.kernel.org/r/20210617081330.98629-1-dja@axtens.net Link: https://bugzilla.kernel.org/show_bug.cgi?id=213335 Fixes: 121e6f3258fe ("mm/vmalloc: hugepage vmalloc mappings") Signed-off-by: Daniel Axtens <dja@axtens.net> Tested-by: David Gow <davidgow@google.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Tested-by: Andrey Konovalov <andreyknvl@gmail.com> Acked-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-24KVM: s390: prepare for hugepage vmallocClaudio Imbrenda
The Create Secure Configuration Ultravisor Call does not support using large pages for the virtual memory area. This is a hardware limitation. This patch replaces the vzalloc call with an almost equivalent call to the newly introduced vmalloc_no_huge function, which guarantees that only small pages will be used for the backing. The new call will not clear the allocated memory, but that has never been an actual requirement. Link: https://lkml.kernel.org/r/20210614132357.10202-3-imbrenda@linux.ibm.com Fixes: 121e6f3258fe3 ("mm/vmalloc: hugepage vmalloc mappings") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-24mm/vmalloc: add vmalloc_no_hugeClaudio Imbrenda
Patch series "mm: add vmalloc_no_huge and use it", v4. Add vmalloc_no_huge() and export it, so modules can allocate memory with small pages. Use the newly added vmalloc_no_huge() in KVM on s390 to get around a hardware limitation. This patch (of 2): Commit 121e6f3258fe3 ("mm/vmalloc: hugepage vmalloc mappings") added support for hugepage vmalloc mappings, it also added the flag VM_NO_HUGE_VMAP for __vmalloc_node_range to request the allocation to be performed with 0-order non-huge pages. This flag is not accessible when calling vmalloc, the only option is to call directly __vmalloc_node_range, which is not exported. This means that a module can't vmalloc memory with small pages. Case in point: KVM on s390x needs to vmalloc a large area, and it needs to be mapped with non-huge pages, because of a hardware limitation. This patch adds the function vmalloc_no_huge, which works like vmalloc, but it is guaranteed to always back the mapping using small pages. This new function is exported, therefore it is usable by modules. [akpm@linux-foundation.org: whitespace fixes, per Christoph] Link: https://lkml.kernel.org/r/20210614132357.10202-1-imbrenda@linux.ibm.com Link: https://lkml.kernel.org/r/20210614132357.10202-2-imbrenda@linux.ibm.com Fixes: 121e6f3258fe3 ("mm/vmalloc: hugepage vmalloc mappings") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Acked-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-24nilfs2: fix memory leak in nilfs_sysfs_delete_device_groupPavel Skripkin
My local syzbot instance hit memory leak in nilfs2. The problem was in missing kobject_put() in nilfs_sysfs_delete_device_group(). kobject_del() does not call kobject_cleanup() for passed kobject and it leads to leaking duped kobject name if kobject_put() was not called. Fail log: BUG: memory leak unreferenced object 0xffff8880596171e0 (size 8): comm "syz-executor379", pid 8381, jiffies 4294980258 (age 21.100s) hex dump (first 8 bytes): 6c 6f 6f 70 30 00 00 00 loop0... backtrace: kstrdup+0x36/0x70 mm/util.c:60 kstrdup_const+0x53/0x80 mm/util.c:83 kvasprintf_const+0x108/0x190 lib/kasprintf.c:48 kobject_set_name_vargs+0x56/0x150 lib/kobject.c:289 kobject_add_varg lib/kobject.c:384 [inline] kobject_init_and_add+0xc9/0x160 lib/kobject.c:473 nilfs_sysfs_create_device_group+0x150/0x800 fs/nilfs2/sysfs.c:999 init_nilfs+0xe26/0x12b0 fs/nilfs2/the_nilfs.c:637 Link: https://lkml.kernel.org/r/20210612140559.20022-1-paskripkin@gmail.com Fixes: da7141fb78db ("nilfs2: add /sys/fs/nilfs2/<device> group") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Michael L. Semon <mlsemon35@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-24mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk()Hugh Dickins
Aha! Shouldn't that quick scan over pte_none()s make sure that it holds ptlock in the PVMW_SYNC case? That too might have been responsible for BUGs or WARNs in split_huge_page_to_list() or its unmap_page(), though I've never seen any. Link: https://lkml.kernel.org/r/1bdf384c-8137-a149-2a1e-475a4791c3c@google.com Link: https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/ Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()") Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Wang Yugui <wangyugui@e16-tech.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-24mm/thp: fix page_vma_mapped_walk() if THP mapped by ptesHugh Dickins
Running certain tests with a DEBUG_VM kernel would crash within hours, on the total_mapcount BUG() in split_huge_page_to_list(), while trying to free up some memory by punching a hole in a shmem huge page: split's try_to_unmap() was unable to find all the mappings of the page (which, on a !DEBUG_VM kernel, would then keep the huge page pinned in memory). Crash dumps showed two tail pages of a shmem huge page remained mapped by pte: ptes in a non-huge-aligned vma of a gVisor process, at the end of a long unmapped range; and no page table had yet been allocated for the head of the huge page to be mapped into. Although designed to handle these odd misaligned huge-page-mapped-by-pte cases, page_vma_mapped_walk() falls short by returning false prematurely when !pmd_present or !pud_present or !p4d_present or !pgd_present: there are cases when a huge page may span the boundary, with ptes present in the next. Restructure page_vma_mapped_walk() as a loop to continue in these cases, while keeping its layout much as before. Add a step_forward() helper to advance pvmw->address across those boundaries: originally I tried to use mm's standard p?d_addr_end() macros, but hit the same crash 512 times less often: because of the way redundant levels are folded together, but folded differently in different configurations, it was just too difficult to use them correctly; and step_forward() is simpler anyway. Link: https://lkml.kernel.org/r/fedb8632-1798-de42-f39e-873551d5bc81@google.com Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()") Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Wang Yugui <wangyugui@e16-tech.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-24mm: page_vma_mapped_walk(): get vma_address_end() earlierHugh Dickins
page_vma_mapped_walk() cleanup: get THP's vma_address_end() at the start, rather than later at next_pte. It's a little unnecessary overhead on the first call, but makes for a simpler loop in the following commit. Link: https://lkml.kernel.org/r/4542b34d-862f-7cb4-bb22-e0df6ce830a2@google.com Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Wang Yugui <wangyugui@e16-tech.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>