linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2023-04-17	arm64: dts: rockchip: Add vdd_cpu_big regulators to rk3588-rock-5b	Cristian Ciocaltea
	The RK8602 and RK8603 voltage regulators on the Rock 5B board provide the power lines vdd_cpu_big0 and vdd_cpu_big1, respectively. Add the necessary device tree nodes and bind them to the corresponding CPU big core nodes. Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://lore.kernel.org/r/20230414125425.124994-4-cristian.ciocaltea@collabora.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2023-04-17	arm64: dts: rockchip: Use generic name for es8316 on Pinebook Pro and Rock 5B	Cristian Ciocaltea
	Use generic 'audio-codec' name for es8316 node on Pinebook Pro and Rock 5B boards. Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://lore.kernel.org/r/20230414125425.124994-3-cristian.ciocaltea@collabora.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2023-04-17	arm64: dts: rockchip: Drop RTC clock-frequency on rk3588-rock-5b	Cristian Ciocaltea
	The hym8563 RTC driver doesn't handle the 'clock-frequency' property, which is also indicated by the following dtbs_check warning: rk3588-rock-5b.dtb: rtc@51: Unevaluated properties are not allowed ('clock-frequency' was unexpected) From schema: Documentation/devicetree/bindings/rtc/haoyu,hym8563.yaml Drop the unsupported property. Fixes: 1e9c2404d887 ("arm64: dts: rockchip: Enable RTC support for Rock 5B") Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://lore.kernel.org/r/20230414125425.124994-2-cristian.ciocaltea@collabora.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2023-04-17	arm64: dts: apple: t8112: Add PWM controller	Sasha Finkelstein
	This patch adds the device tree entries for the PWM controller present on M2 macbooks that is connected to the keyboard backlight. Signed-off-by: Sasha Finkelstein <fnkl.kernel@gmail.com> Signed-off-by: Hector Martin <marcan@marcan.st>
2023-04-17	platform/x86/amd: pmc: Fix memory leak in amd_pmc_stb_debugfs_open_v2()	Feng Jiang
	Function amd_pmc_stb_debugfs_open_v2() may be called when the STB debug mechanism enabled. When amd_pmc_send_cmd() fails, the 'buf' needs to be released. Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn> Link: https://lore.kernel.org/r/20230412093734.1126410-1-jiangfeng@kylinos.cn Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-04-17	arm64: dts: apple: t600x: Add PWM controller	Sasha Finkelstein
	Adds PWM controller and keyboard backlight bindings for M1 Pro/Max MacBook Pros Signed-off-by: Sasha Finkelstein <fnkl.kernel@gmail.com> Acked-by: Hector Martin <marcan@marcan.st> Signed-off-by: Hector Martin <marcan@marcan.st>
2023-04-17	arm64: dts: apple: t8103: Add PWM controller	Sasha Finkelstein
	Adds PWM controller and keyboard backlight bindings for M1 MacBooks Signed-off-by: Sasha Finkelstein <fnkl.kernel@gmail.com> Acked-by: Sven Peter <sven@svenpeter.dev> Signed-off-by: Hector Martin <marcan@marcan.st>
2023-04-17	arm64: dts: rockchip: Add pinctrl gpio-ranges for rk356x	John Clark
	Add gpio-range properties to the pinctrl gpio nodes in rk356x.dtsi Signed-off-by: John Clark <inindev@gmail.com> Link: https://lore.kernel.org/r/20230413170337.6815-1-inindev@gmail.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2023-04-17	mlxbf-bootctl: Add sysfs file for BlueField boot fifo	Liming Sun
	This commit adds sysfs file for BlueField boot fifo. The boot fifo is usually used to push boot stream via USB or PCIe. Once OS is up, it can be reused by applications to read data or configuration from external host. Signed-off-by: Liming Sun <limings@nvidia.com> Reviewed-by: David Thompson <davthompson@nvidia.com> Link: https://lore.kernel.org/r/52b0b00dacbc4aad3169dd3667d79c85e334783b.1680657571.git.limings@nvidia.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-04-17	drm/exynos: Implement fbdev emulation as in-kernel client	Thomas Zimmermann
	Move code from ad-hoc fbdev callbacks into DRM client functions and remove the old callbacks. The functions instruct the client to poll for changed output or restore the display. The DRM core calls both, the old callbacks and the new client helpers, from the same places. The new functions perform the same operation as before, so there's no change in functionality. Replace all code that initializes or releases fbdev emulation throughout the driver. Instead initialize the fbdev client by a single call to exynos_fbdev_setup() after exynos has registered its DRM device. As in most drivers, exynos' fbdev emulation now acts like a regular DRM client. The fbdev client setup consists of the initial preparation and the hot-plugging of the display. The latter creates the fbdev device and sets up the fbdev framebuffer. The setup performs display hot-plugging once. If no display can be detected, DRM probe helpers re-run the detection on each hotplug event. A call to drm_dev_unregister() releases the client automatically. No further action is required within exynos. If the fbdev framebuffer has been fully set up, struct fb_ops.fb_destroy implements the release. For partially initialized emulation, the fbdev client reverts the initial setup. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2023-04-17	drm/exynos: Initialize fbdev DRM client	Thomas Zimmermann
	Initialize the fbdev client in the fbdev code with empty helper functions. Also clean up the client. The helpers will later implement various functionality of the DRM client. No functional changes. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2023-04-17	drm/exynos: Remove fb_helper from struct exynos_drm_private	Thomas Zimmermann
	The DRM device stores a pointer to the fbdev helper. Remove struct exynos_drm_private.fb_helper, which contains the same value. No functional changes. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2023-04-17	drm/exynos: Remove struct exynos_drm_fbdev	Thomas Zimmermann
	Remove struct exynos_drm_fbdev, which is an empty wrapper around struct drm_fb_helper. Use the latter directly. No functional changes. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2023-04-17	drm/exynos: Remove exynos_gem from struct exynos_drm_fbdev	Thomas Zimmermann
	Fbdev's framebuffer stores a pointer to the GEM object. Remove struct exynos_drm_fbdev.exynos_gem, which contains the same value. No functional changes. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2023-04-17	mmc: core: Remove unused macro mmc_req_rel_wr	Bean Huo
	There is no user for macro mmc_req_rel_wr, so delete it. Signed-off-by: Bean Huo <beanhuo@micron.com> Link: https://lore.kernel.org/r/20230403221754.16168-1-beanhuo@iokpp.de Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2023-04-17	mmc: sdhci-of-arasan: Skip setting clock delay for 400KHz	Sai Krishna Potthuri
	Clock delay settings are not defined for 400KHz, so add frequency check to skip calling the clock delay settings when frequency is <=400KHz. Signed-off-by: Sai Krishna Potthuri <sai.krishna.potthuri@amd.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20230403102551.3763054-4-sai.krishna.potthuri@amd.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2023-04-17	mmc: sdhci-of-arasan: Add support for eMMC5.1 on Xilinx Versal Net platform	Swati Agarwal
	Add support for eMMC5.1 on Xilinx Versal Net platform - Add new compatible string(xlnx,versal-net-emmc). - Add support for PHY which is part of Host Controller register space. - Add DLL and Delay Chain mode support and corresponding tap delays for all eMMC modes. - Add Strobe select tap for HS400 mode. Signed-off-by: Swati Agarwal <swati.agarwal@amd.com> Co-developed-by: Sai Krishna Potthuri <sai.krishna.potthuri@amd.com> Signed-off-by: Sai Krishna Potthuri <sai.krishna.potthuri@amd.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20230403102551.3763054-3-sai.krishna.potthuri@amd.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2023-04-17	dt-bindings: mmc: arasan,sdci: Add Xilinx Versal Net compatible	Sai Krishna Potthuri
	Add Xilinx Versal Net compatible to support eMMC 5.1 PHY. Signed-off-by: Sai Krishna Potthuri <sai.krishna.potthuri@amd.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20230403102551.3763054-2-sai.krishna.potthuri@amd.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2023-04-17	drm/i915: Fix fast wake AUX sync len	Ville Syrjälä
	Fast wake should use 8 SYNC pulses for the preamble and 10-16 SYNC pulses for the precharge. Reduce our fast wake SYNC count to match the maximum value. We also use the maximum precharge length for normal AUX transactions. Cc: stable@vger.kernel.org Cc: Jouni Högander <jouni.hogander@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230329172434.18744-1-ville.syrjala@linux.intel.com Reviewed-by: Jouni Högander <jouni.hogander@intel.com> (cherry picked from commit 605f7c73133341d4b762cbd9a22174cc22d4c38b) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2023-04-17	sfc: Fix use-after-free due to selftest_work	Ding Hui
	There is a use-after-free scenario that is: When the NIC is down, user set mac address or vlan tag to VF, the xxx_set_vf_mac() or xxx_set_vf_vlan() will invoke efx_net_stop() and efx_net_open(), since netif_running() is false, the port will not start and keep port_enabled false, but selftest_work is scheduled in efx_net_open(). If we remove the device before selftest_work run, the efx_stop_port() will not be called since the NIC is down, and then efx is freed, we will soon get a UAF in run_timer_softirq() like this: [ 1178.907941] ================================================================== [ 1178.907948] BUG: KASAN: use-after-free in run_timer_softirq+0xdea/0xe90 [ 1178.907950] Write of size 8 at addr ff11001f449cdc80 by task swapper/47/0 [ 1178.907950] [ 1178.907953] CPU: 47 PID: 0 Comm: swapper/47 Kdump: loaded Tainted: G O --------- -t - 4.18.0 #1 [ 1178.907954] Hardware name: SANGFOR X620G40/WI2HG-208T1061A, BIOS SPYH051032-U01 04/01/2022 [ 1178.907955] Call Trace: [ 1178.907956] <IRQ> [ 1178.907960] dump_stack+0x71/0xab [ 1178.907963] print_address_description+0x6b/0x290 [ 1178.907965] ? run_timer_softirq+0xdea/0xe90 [ 1178.907967] kasan_report+0x14a/0x2b0 [ 1178.907968] run_timer_softirq+0xdea/0xe90 [ 1178.907971] ? init_timer_key+0x170/0x170 [ 1178.907973] ? hrtimer_cancel+0x20/0x20 [ 1178.907976] ? sched_clock+0x5/0x10 [ 1178.907978] ? sched_clock_cpu+0x18/0x170 [ 1178.907981] __do_softirq+0x1c8/0x5fa [ 1178.907985] irq_exit+0x213/0x240 [ 1178.907987] smp_apic_timer_interrupt+0xd0/0x330 [ 1178.907989] apic_timer_interrupt+0xf/0x20 [ 1178.907990] </IRQ> [ 1178.907991] RIP: 0010:mwait_idle+0xae/0x370 If the NIC is not actually brought up, there is no need to schedule selftest_work, so let's move invoking efx_selftest_async_start() into efx_start_all(), and it will be canceled by broughting down. Fixes: dd40781e3a4e ("sfc: Run event/IRQ self-test asynchronously when interface is brought up") Fixes: e340be923012 ("sfc: add ndo_set_vf_mac() function for EF10") Debugged-by: Huang Cun <huangcun@sangfor.com.cn> Cc: Donglin Peng <pengdonglin@sangfor.com.cn> Suggested-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Ding Hui <dinghui@sangfor.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-17	virtio_net: bugfix overflow inside xdp_linearize_page()	Xuan Zhuo
	Here we copy the data from the original buf to the new page. But we not check that it may be overflow. As long as the size received(including vnethdr) is greater than 3840 (PAGE_SIZE -VIRTIO_XDP_HEADROOM). Then the memcpy will overflow. And this is completely possible, as long as the MTU is large, such as 4096. In our test environment, this will cause crash. Since crash is caused by the written memory, it is meaningless, so I do not include it. Fixes: 72979a6c3590 ("virtio_net: xdp, add slowpath case for non contiguous buffers") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-17	drm/i915: Fix memory leaks in i915 selftests	Cong Liu
	This patch fixes memory leaks on error escapes in function fake_get_pages Fixes: c3bfba9a2225 ("drm/i915: Check for integer truncation on scatterlist creation") Signed-off-by: Cong Liu <liucong2@kylinos.cn> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230414224109.1051922-1-andi.shyti@linux.intel.com (cherry picked from commit 8bfbdadce85c4c51689da10f39c805a7106d4567) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2023-04-17	drm/i915: Make intel_get_crtc_new_encoder() less oopsy	Ville Syrjälä
	The point of the WARN was to print something, not oops straight up. Currently that is precisely what happens if we can't find the connector for the crtc in the atomic state. Get the dev pointer from the atomic state instead of the potentially NULL encoder to avoid that. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230413200602.6037-2-ville.syrjala@linux.intel.com Fixes: 3a47ae201e07 ("drm/i915/display: Make WARN* drm specific where encoder ptr is available") Reviewed-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit 3b6692357f70498f617ea1b31a0378070a0acf1c) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2023-04-17	drm/i915/gt: Avoid out-of-bounds access when loading HuC	Lucas De Marchi
	When HuC is loaded by GSC, there is no header definition for the kernel to look at and firmware is just handed to GSC. However when reading the version, it should still check the size of the blob to guarantee it's not incurring into out-of-bounds array access. If firmware is smaller than expected, the following message is now printed: # echo boom > /lib/firmware/i915/dg2_huc_gsc.bin # dmesg \| grep -i huc [drm] GT0: HuC firmware i915/dg2_huc_gsc.bin: invalid size: 5 < 184 [drm] ERROR GT0: HuC firmware i915/dg2_huc_gsc.bin: fetch failed -ENODATA ... Even without this change the size, header and signature are still checked by GSC when loading, so this only avoids the out-of-bounds array access. Fixes: a7b516bd981f ("drm/i915/huc: Add fetch support for gsc-loaded HuC binary") Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230413200349.3492571-1-lucas.demarchi@intel.com (cherry picked from commit adfbae9ffe339eed08d54a4eb87c93f4b35f214b) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2023-04-17	Merge tag 'amd-drm-next-6.4-2023-04-14' of ↵	Dave Airlie
	https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.4-2023-04-14: amdgpu: - S4 fixes for APUs - GFX11 fixes - Misc code cleanups - DCN 3.2 fixes - DCN 3.1.4 fixes - FPO/FAMS work to improve display power savings - DP fixes - UMC 8.10 code cleanup - SDMA v4 fix - GPU clock counter fixes - SMU 13 fixes - Sdma v6 invalidation fix for preemption - RAS fixes - S0ix fix - GC 9.4.3 updates amdkfd: - Fix user pointers with IOMMU - Fix coherency flag handling Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230414204609.7942-1-alexander.deucher@amd.com
2023-04-16	cifs: avoid dup prefix path in dfs_get_automount_devname()	Paulo Alcantara
	@server->origin_fullpath already contains the tree name + optional prefix, so avoid calling __build_path_from_dentry_optional_prefix() as it might end up duplicating prefix path from @cifs_sb->prepath into final full path. Instead, generate DFS full path by simply merging @server->origin_fullpath with dentry's path. This fixes the following case mount.cifs //root/dfs/dir /mnt/ -o ... ls /mnt/link where cifs_dfs_do_automount() will call smb3_parse_devname() with @devname set to "//root/dfs/dir/link" instead of "//root/dfs/dir/dir/link". Fixes: 7ad54b98fc1f ("cifs: use origin fullpath for automounts") Cc: <stable@vger.kernel.org> # 6.2+ Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-04-16	Linux 6.3-rc7v6.3-rc7	Linus Torvalds

2023-04-16	Revert "userfaultfd: don't fail on unrecognized features"	Peter Xu
	This is a proposal to revert commit 914eedcb9ba0ff53c33808. I found this when writing a simple UFFDIO_API test to be the first unit test in this set. Two things breaks with the commit: - UFFDIO_API check was lost and missing. According to man page, the kernel should reject ioctl(UFFDIO_API) if uffdio_api.api != 0xaa. This check is needed if the api version will be extended in the future, or user app won't be able to identify which is a new kernel. - Feature flags checks were removed, which means UFFDIO_API with a feature that does not exist will also succeed. According to the man page, we should (and it makes sense) to reject ioctl(UFFDIO_API) if unknown features passed in. Link: https://lore.kernel.org/r/20220722201513.1624158-1-axelrasmussen@google.com Link: https://lkml.kernel.org/r/20230412163922.327282-2-peterx@redhat.com Fixes: 914eedcb9ba0 ("userfaultfd: don't fail on unrecognized features") Signed-off-by: Peter Xu <peterx@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-16	writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs	Baokun Li
	KASAN report null-ptr-deref: ================================================================== BUG: KASAN: null-ptr-deref in bdi_split_work_to_wbs+0x5c5/0x7b0 Write of size 8 at addr 0000000000000000 by task sync/943 CPU: 5 PID: 943 Comm: sync Tainted: 6.3.0-rc5-next-20230406-dirty #461 Call Trace: <TASK> dump_stack_lvl+0x7f/0xc0 print_report+0x2ba/0x340 kasan_report+0xc4/0x120 kasan_check_range+0x1b7/0x2e0 __kasan_check_write+0x24/0x40 bdi_split_work_to_wbs+0x5c5/0x7b0 sync_inodes_sb+0x195/0x630 sync_inodes_one_sb+0x3a/0x50 iterate_supers+0x106/0x1b0 ksys_sync+0x98/0x160 [...] ================================================================== The race that causes the above issue is as follows: cpu1 cpu2 -------------------------\|------------------------- inode_switch_wbs INIT_WORK(&isw->work, inode_switch_wbs_work_fn) queue_rcu_work(isw_wq, &isw->work) // queue_work async inode_switch_wbs_work_fn wb_put_many(old_wb, nr_switched) percpu_ref_put_many ref->data->release(ref) cgwb_release queue_work(cgwb_release_wq, &wb->release_work) // queue_work async &wb->release_work cgwb_release_workfn ksys_sync iterate_supers sync_inodes_one_sb sync_inodes_sb bdi_split_work_to_wbs kmalloc(sizeof(*work), GFP_ATOMIC) // alloc memory failed percpu_ref_exit ref->data = NULL kfree(data) wb_get(wb) percpu_ref_get(&wb->refcnt) percpu_ref_get_many(ref, 1) atomic_long_add(nr, &ref->data->count) atomic64_add(i, v) // trigger null-ptr-deref bdi_split_work_to_wbs() traverses &bdi->wb_list to split work into all wbs. If the allocation of new work fails, the on-stack fallback will be used and the reference count of the current wb is increased afterwards. If cgroup writeback membership switches occur before getting the reference count and the current wb is released as old_wd, then calling wb_get() or wb_put() will trigger the null pointer dereference above. This issue was introduced in v4.3-rc7 (see fix tag1). Both sync_inodes_sb() and __writeback_inodes_sb_nr() calls to bdi_split_work_to_wbs() can trigger this issue. For scenarios called via sync_inodes_sb(), originally commit 7fc5854f8c6e ("writeback: synchronize sync(2) against cgroup writeback membership switches") reduced the possibility of the issue by adding wb_switch_rwsem, but in v5.14-rc1 (see fix tag2) removed the "inode_io_list_del_locked(inode, old_wb)" from inode_switch_wbs_work_fn() so that wb->state contains WB_has_dirty_io, thus old_wb is not skipped when traversing wbs in bdi_split_work_to_wbs(), and the issue becomes easily reproducible again. To solve this problem, percpu_ref_exit() is called under RCU protection to avoid race between cgwb_release_workfn() and bdi_split_work_to_wbs(). Moreover, replace wb_get() with wb_tryget() in bdi_split_work_to_wbs(), and skip the current wb if wb_tryget() fails because the wb has already been shutdown. Link: https://lkml.kernel.org/r/20230410130826.1492525-1-libaokun1@huawei.com Fixes: b817525a4a80 ("writeback: bdi_writeback iteration must not skip dying ones") Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Tejun Heo <tj@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Christian Brauner <brauner@kernel.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Hou Tao <houtao1@huawei.com> Cc: yangerkun <yangerkun@huawei.com> Cc: Zhang Yi <yi.zhang@huawei.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-16	maple_tree: fix a potential memory leak, OOB access, or other unpredictable bug	Peng Zhang
	In mas_alloc_nodes(), "node->node_count = 0" means to initialize the node_count field of the new node, but the node may not be a new node. It may be a node that existed before and node_count has a value, setting it to 0 will cause a memory leak. At this time, mas->alloc->total will be greater than the actual number of nodes in the linked list, which may cause many other errors. For example, out-of-bounds access in mas_pop_node(), and mas_pop_node() may return addresses that should not be used. Fix it by initializing node_count only for new nodes. Also, by the way, an if-else statement was removed to simplify the code. Link: https://lkml.kernel.org/r/20230411041005.26205-1-zhangpeng.00@bytedance.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-16	tools/mm/page_owner_sort.c: fix TGID output when cull=tg is used	Steve Chou
	When using cull option with 'tg' flag, the fprintf is using pid instead of tgid. It should use tgid instead. Link: https://lkml.kernel.org/r/20230411034929.2071501-1-steve_chou@pesi.com.tw Fixes: 9c8a0a8e599f4a ("tools/vm/page_owner_sort.c: support for user-defined culling rules") Signed-off-by: Steve Chou <steve_chou@pesi.com.tw> Cc: Jiajian Ye <yejiajian2018@email.szu.edu.cn> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-16	mailmap: update jtoppins' entry to reference correct email	Jonathan Toppins
	Link: https://lkml.kernel.org/r/d79bc6eaf65e68bd1c2a1e1510ab6291ce5926a6.1681162487.git.jtoppins@redhat.com Signed-off-by: Jonathan Toppins <jtoppins@redhat.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Kirill Tkhai <tkhai@ya.ru> Cc: Konrad Dybcio <konrad.dybcio@linaro.org> Cc: Qais Yousef <qyousef@layalina.io> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-16	mm/mempolicy: fix use-after-free of VMA iterator	Liam R. Howlett
	set_mempolicy_home_node() iterates over a list of VMAs and calls mbind_range() on each VMA, which also iterates over the singular list of the VMA passed in and potentially splits the VMA. Since the VMA iterator is not passed through, set_mempolicy_home_node() may now point to a stale node in the VMA tree. This can result in a UAF as reported by syzbot. Avoid the stale maple tree node by passing the VMA iterator through to the underlying call to split_vma(). mbind_range() is also overly complicated, since there are two calling functions and one already handles iterating over the VMAs. Simplify mbind_range() to only handle merging and splitting of the VMAs. Align the new loop in do_mbind() and existing loop in set_mempolicy_home_node() to use the reduced mbind_range() function. This allows for a single location of the range calculation and avoids constantly looking up the previous VMA (since this is a loop over the VMAs). Link: https://lore.kernel.org/linux-mm/000000000000c93feb05f87e24ad@google.com/ Fixes: 66850be55e8e ("mm/mempolicy: use vma iterator & maple state instead of vma linked list") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reported-by: syzbot+a7c1ec5b1d71ceaa5186@syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/20230410152205.2294819-1-Liam.Howlett@oracle.com Tested-by: syzbot+a7c1ec5b1d71ceaa5186@syzkaller.appspotmail.com Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-16	mm/huge_memory.c: warn with pr_warn_ratelimited instead of VM_WARN_ON_ONCE_FOLIO	Naoya Horiguchi
	split_huge_page_to_list() WARNs when called for huge zero pages, which sounds to me too harsh because it does not imply a kernel bug, but just notifies the event to admins. On the other hand, this is considered as critical by syzkaller and makes its testing less efficient, which seems to me harmful. So replace the VM_WARN_ON_ONCE_FOLIO with pr_warn_ratelimited. Link: https://lkml.kernel.org/r/20230406082004.2185420-1-naoya.horiguchi@linux.dev Fixes: 478d134e9506 ("mm/huge_memory: do not overkill when splitting huge_zero_page") Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reported-by: syzbot+07a218429c8d19b1fb25@syzkaller.appspotmail.com Link: https://lore.kernel.org/lkml/000000000000a6f34a05e6efcd01@google.com/ Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Xu Yu <xuyu@linux.alibaba.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-16	mm/mprotect: fix do_mprotect_pkey() return on error	Liam R. Howlett
	When the loop over the VMA is terminated early due to an error, the return code could be overwritten with ENOMEM. Fix the return code by only setting the error on early loop termination when the error is not set. User-visible effects include: attempts to run mprotect() against a special mapping or with a poorly-aligned hugetlb address should return -EINVAL, but they presently return -ENOMEM. In other cases an -EACCESS should be returned. Link: https://lkml.kernel.org/r/20230406193050.1363476-1-Liam.Howlett@oracle.com Fixes: 2286a6914c77 ("mm: change mprotect_fixup to vma iterator") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-16	mm/khugepaged: check again on anon uffd-wp during isolation	Peter Xu
	Khugepaged collapse an anonymous thp in two rounds of scans. The 2nd round done in __collapse_huge_page_isolate() after hpage_collapse_scan_pmd(), during which all the locks will be released temporarily. It means the pgtable can change during this phase before 2nd round starts. It's logically possible some ptes got wr-protected during this phase, and we can errornously collapse a thp without noticing some ptes are wr-protected by userfault. e1e267c7928f wanted to avoid it but it only did that for the 1st phase, not the 2nd phase. Since __collapse_huge_page_isolate() happens after a round of small page swapins, we don't need to worry on any !present ptes - if it existed khugepaged will already bail out. So we only need to check present ptes with uffd-wp bit set there. This is something I found only but never had a reproducer, I thought it was one caused a bug in Muhammad's recent pagemap new ioctl work, but it turns out it's not the cause of that but an userspace bug. However this seems to still be a real bug even with a very small race window, still worth to have it fixed and copy stable. Link: https://lkml.kernel.org/r/20230405155120.3608140-1-peterx@redhat.com Fixes: e1e267c7928f ("khugepaged: skip collapse if uffd-wp detected") Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-16	mm/userfaultfd: fix uffd-wp handling for THP migration entries	David Hildenbrand
	Looks like what we fixed for hugetlb in commit 44f86392bdd1 ("mm/hugetlb: fix uffd-wp handling for migration entries in hugetlb_change_protection()") similarly applies to THP. Setting/clearing uffd-wp on THP migration entries is not implemented properly. Further, while removing migration PMDs considers the uffd-wp bit, inserting migration PMDs does not consider the uffd-wp bit. We have to set/clear independently of the migration entry type in change_huge_pmd() and properly copy the uffd-wp bit in set_pmd_migration_entry(). Verified using a simple reproducer that triggers migration of a THP, that the set_pmd_migration_entry() no longer loses the uffd-wp bit. Link: https://lkml.kernel.org/r/20230405160236.587705-2-david@redhat.com Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration") Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: <stable@vger.kernel.org> Cc: Muhammad Usama Anjum <usama.anjum@collabora.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-16	mm: swap: fix performance regression on sparsetruncate-tiny	Qi Zheng
	The ->percpu_pvec_drained was originally introduced by commit d9ed0d08b6c6 ("mm: only drain per-cpu pagevecs once per pagevec usage") to drain per-cpu pagevecs only once per pagevec usage. But after converting the swap code to be more folio-based, the commit c2bc16817aa0 ("mm/swap: add folio_batch_move_lru()") breaks this logic, which would cause ->percpu_pvec_drained to be reset to false, that means per-cpu pagevecs will be drained multiple times per pagevec usage. In theory, there should be no functional changes when converting code to be more folio-based. We should call folio_batch_reinit() in folio_batch_move_lru() instead of folio_batch_init(). And to verify that we still need ->percpu_pvec_drained, I ran mmtests/sparsetruncate-tiny and got the following data: baseline with baseline/ patch/ Min Time 326.00 ( 0.00%) 328.00 ( -0.61%) 1st-qrtle Time 334.00 ( 0.00%) 336.00 ( -0.60%) 2nd-qrtle Time 338.00 ( 0.00%) 341.00 ( -0.89%) 3rd-qrtle Time 343.00 ( 0.00%) 347.00 ( -1.17%) Max-1 Time 326.00 ( 0.00%) 328.00 ( -0.61%) Max-5 Time 327.00 ( 0.00%) 330.00 ( -0.92%) Max-10 Time 328.00 ( 0.00%) 331.00 ( -0.91%) Max-90 Time 350.00 ( 0.00%) 357.00 ( -2.00%) Max-95 Time 395.00 ( 0.00%) 390.00 ( 1.27%) Max-99 Time 508.00 ( 0.00%) 434.00 ( 14.57%) Max Time 547.00 ( 0.00%) 476.00 ( 12.98%) Amean Time 344.61 ( 0.00%) 345.56 * -0.28%* Stddev Time 30.34 ( 0.00%) 19.51 ( 35.69%) CoeffVar Time 8.81 ( 0.00%) 5.65 ( 35.87%) BAmean-99 Time 342.38 ( 0.00%) 344.27 ( -0.55%) BAmean-95 Time 338.58 ( 0.00%) 341.87 ( -0.97%) BAmean-90 Time 336.89 ( 0.00%) 340.26 ( -1.00%) BAmean-75 Time 335.18 ( 0.00%) 338.40 ( -0.96%) BAmean-50 Time 332.54 ( 0.00%) 335.42 ( -0.87%) BAmean-25 Time 329.30 ( 0.00%) 332.00 ( -0.82%) From the above it can be seen that we get similar data to when ->percpu_pvec_drained was introduced, so we still need it. Let's call folio_batch_reinit() in folio_batch_move_lru() to restore the original logic. Link: https://lkml.kernel.org/r/20230405161854.6931-1-zhengqi.arch@bytedance.com Fixes: c2bc16817aa0 ("mm/swap: add folio_batch_move_lru()") Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-16	Merge tag 'sched_urgent_for_v6.3_rc7' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Borislav Petkov: - Do not pull tasks to the local scheduling group if its average load is higher than the average system load * tag 'sched_urgent_for_v6.3_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Fix imbalance overflow
2023-04-16	Merge tag 'x86_urgent_for_v6.3_rc7' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Borislav Petkov: - Drop __init annotation from two rtc functions which get called after boot is done, in order to prevent a crash * tag 'x86_urgent_for_v6.3_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/rtc: Remove __init for runtime functions
2023-04-17	erofs: cleanup i_format-related stuffs	Gao Xiang
	Switch EROFS_I_{VERSION,DATALAYOUT}_BITS into EROFS_I_{VERSION,DATALAYOUT}_MASK. Also avoid erofs_bitrange() since its functionality is simple enough. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230414083027.12307-2-hsiangkao@linux.alibaba.com
2023-04-17	erofs: sunset erofs_dbg()	Gao Xiang
	Such debug messages are rarely used now. Let's get rid of these, and revert locally if they are needed for debugging. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230414083027.12307-1-hsiangkao@linux.alibaba.com
2023-04-17	erofs: fix potential overflow calculating xattr_isize	Jingbo Xu
	Given on-disk i_xattr_icount is 16 bits and xattr_isize is calculated from i_xattr_icount multiplying 4, xattr_isize has a theoretical maximum of 256K (64K * 4). Thus declare xattr_isize as unsigned int to avoid the potential overflow. Fixes: bfb8674dc044 ("staging: erofs: add erofs in-memory stuffs") Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230414061810.6479-1-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-04-17	erofs: get rid of z_erofs_fill_inode()	Gao Xiang
	Prior to big pclusters, non-compact compression indexes could have empty headers. Let's just avoid the legacy path since it can be handled properly as a specific compression header with z_erofs_fill_inode_lazy() too. Tested with erofs-utils exist versions. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230413092241.73829-1-hsiangkao@linux.alibaba.com
2023-04-17	erofs: enable long extended attribute name prefixes	Jingbo Xu
	Let's enable long xattr name prefix feature. Old kernels will just ignore / skip such extended attributes. In addition, in case you don't want to mount such images, add another incompatible feature as an option for this. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Acked-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230407222808.19670-1-jefflexu@linux.alibaba.com [ Gao Xiang: minor commit message fix. ] Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-04-17	erofs: handle long xattr name prefixes properly	Jingbo Xu
	Make .{list,get}xattr routines adapted to long xattr name prefixes. When the bit 7 of erofs_xattr_entry.e_name_index is set, it indicates that it refers to a long xattr name prefix. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Acked-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230411093537.127286-1-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-04-17	erofs: add helpers to load long xattr name prefixes	Jingbo Xu
	Long xattr name prefixes will be scanned upon mounting and the in-memory long xattr name prefix array will be initialized accordingly. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Acked-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230407141710.113882-6-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-04-17	erofs: introduce on-disk format for long xattr name prefixes	Jingbo Xu
	Besides the predefined xattr name prefixes, introduces long xattr name prefixes, which work similarly as the predefined name prefixes, except that they are user specified. It is especially useful for use cases together with overlayfs like Composefs model, which introduces diverse xattr values with only a few common xattr names (trusted.overlay.redirect, trusted.overlay.digest, and maybe more in the future). That makes the existing predefined prefixes ineffective in both image size and runtime performance. When a user specified long xattr name prefix is used, only the trailing part of the xattr name apart from the long xattr name prefix will be stored in erofs_xattr_entry.e_name. e_name is empty if the xattr name matches exactly as the long xattr name prefix. All long xattr prefixes are stored in the packed or meta inode, which depends if fragments feature is enabled or not. For each long xattr name prefix, the on-disk format is kept as the same as the unique metadata format: ALIGN({__le16 len, data}, 4), where len represents the total size of struct erofs_xattr_long_prefix, followed by data of struct erofs_xattr_long_prefix itself. Each erofs_xattr_long_prefix keeps predefined prefixes (base_index) and the remaining prefix string without the trailing '\0'. Two fields are introduced to the on-disk superblock, where xattr_prefix_count represents the total number of the long xattr name prefixes recorded, and xattr_prefix_start represents the start offset of recorded name prefixes in the packed/meta inode divided by 4. When referring to a long xattr name prefix, the highest bit (bit 7) of erofs_xattr_entry.e_name_index is set, while the lower bits (bit 0-6) as a whole represents the index of the referred long name prefix among all long xattr name prefixes. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Acked-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230407141710.113882-5-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-04-17	erofs: move packed inode out of the compression part	Jingbo Xu
	packed inode could be used in more scenarios which are independent of compression in the future. For example, packed inode could be used to keep extra long xattr prefixes with the help of following patches. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Acked-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230407141710.113882-4-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-04-17	erofs: keep meta inode into erofs_buf	Gao Xiang
	So that erofs_read_metadata() can read metadata from other inodes (e.g. packed inode) as well. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Acked-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230407141710.113882-2-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>