summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-03-30nvme-tcp: fix a possible UAF when failing to allocate an io queueSagi Grimberg
When we allocate a nvme-tcp queue, we set the data_ready callback before we actually need to use it. This creates the potential that if a stray controller sends us data on the socket before we connect, we can trigger the io_work and start consuming the socket. In this case reported: we failed to allocate one of the io queues, and as we start releasing the queues that we already allocated, we get a UAF [1] from the io_work which is running before it should really. Fix this by setting the socket ops callbacks only before we start the queue, so that we can't accidentally schedule the io_work in the initialization phase before the queue started. While we are at it, rename nvme_tcp_restore_sock_calls to pair with nvme_tcp_setup_sock_ops. [1]: [16802.107284] nvme nvme4: starting error recovery [16802.109166] nvme nvme4: Reconnecting in 10 seconds... [16812.173535] nvme nvme4: failed to connect socket: -111 [16812.173745] nvme nvme4: Failed reconnect attempt 1 [16812.173747] nvme nvme4: Reconnecting in 10 seconds... [16822.413555] nvme nvme4: failed to connect socket: -111 [16822.413762] nvme nvme4: Failed reconnect attempt 2 [16822.413765] nvme nvme4: Reconnecting in 10 seconds... [16832.661274] nvme nvme4: creating 32 I/O queues. [16833.919887] BUG: kernel NULL pointer dereference, address: 0000000000000088 [16833.920068] nvme nvme4: Failed reconnect attempt 3 [16833.920094] #PF: supervisor write access in kernel mode [16833.920261] nvme nvme4: Reconnecting in 10 seconds... [16833.920368] #PF: error_code(0x0002) - not-present page [16833.921086] Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp] [16833.921191] RIP: 0010:_raw_spin_lock_bh+0x17/0x30 ... [16833.923138] Call Trace: [16833.923271] <TASK> [16833.923402] lock_sock_nested+0x1e/0x50 [16833.923545] nvme_tcp_try_recv+0x40/0xa0 [nvme_tcp] [16833.923685] nvme_tcp_io_work+0x68/0xa0 [nvme_tcp] [16833.923824] process_one_work+0x1e8/0x390 [16833.923969] worker_thread+0x53/0x3d0 [16833.924104] ? process_one_work+0x390/0x390 [16833.924240] kthread+0x124/0x150 [16833.924376] ? set_kthread_struct+0x50/0x50 [16833.924518] ret_from_fork+0x1f/0x30 [16833.924655] </TASK> Reported-by: Yanjun Zhang <zhangyanjun@cestc.cn> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Yanjun Zhang <zhangyanjun@cestc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-03-29drm/amd/display: Take FEC Overhead into Timeslot CalculationFangzhi Zuo
8b/10b encoding needs to add 3% fec overhead into the pbn. In the Synapcis Cascaded MST hub, the first stage MST branch device needs the information to determine the timeslot count for the second stage MST branch device. Missing this overhead will leads to insufficient timeslot allocation. Cc: stable@vger.kernel.org Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Hersen Wu <hersenxs.wu@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-29drm/amd/display: Add DSC Support for Synaptics Cascaded MST HubFangzhi Zuo
Traditional synaptics hub has one MST branch device without virtual dpcd. Synaptics cascaded hub has two chained MST branch devices. DSC decoding is performed via root MST branch device, instead of the second MST branch device. Reviewed-by: Hersen Wu <hersenxs.wu@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-03-29Merge patch series "RISC-V: Fixes for riscv_has_extension[un]likely()'s ↵Palmer Dabbelt
alternative dependency" Conor Dooley <conor.dooley@microchip.com> says: Here's my attempt at fixing both the use of an FPU on XIP kernels and the issue that Jason ran into where CONFIG_FPU, which needs the alternatives frame work for has_fpu() checks, could be enabled without the alternatives actually being present. For the former, a "slow" fallback that does not use alternatives is added to riscv_has_extension_[un]likely() that can be used with XIP. Obviously, we want to make use of Jisheng's alternatives based approach where possible, so any users of riscv_has_extension_[un]likely() will want to make sure that they select RISCV_ALTERNATIVE. If they don't however, they'll hit the fallback path which (should, sparing a silly mistake from me!) behave in the same way, thus succeeding silently. Sounds like a To prevent "depends on !XIP_KERNEL; select RISCV_ALTERNATIVE" spreading like the plague through the various places that want to check for the presence of extensions, and sidestep the potential silent "success" mentioned above, all users RISCV_ALTERNATIVE are converted from selects to dependencies, with the option being selected for all !XIP_KERNEL builds. I know that the VDSO was a key place that Jisheng wanted to use the new helper rather than static branches, and I think the fallback path should not cause issues there. See the thread at [1] for the prior discussion. 1 - https://lore.kernel.org/linux-riscv/20230128172856.3814-1-jszhang@kernel.org/T/#m21390d570997145d31dd8bb95002fd61f99c6573 [Palmer: merging in the fixes as a branch as there's some features that depend on it.] * b4-shazam-merge: RISC-V: always select RISCV_ALTERNATIVE for non-xip kernels RISC-V: add non-alternative fallback for riscv_has_extension_[un]likely() Link: https://lore.kernel.org/r/20230324100538.3514663-1-conor.dooley@microchip.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-29RISC-V: always select RISCV_ALTERNATIVE for non-xip kernelsConor Dooley
When moving switch_to's has_fpu() over to using riscv_has_extension_likely() rather than static branches, the FPU code gained a dependency on the alternatives framework. That dependency has now been removed, as riscv_has_extension_ikely() now contains a fallback path, using __riscv_isa_extension_available(), but if CONFIG_RISCV_ALTERNATIVE isn't selected when CONFIG_FPU is, has_fpu() checks will not benefit from the "fast path" that the alternatives framework provides. We want to ensure that alternatives are available whenever riscv_has_extension_[un]likely() is used, rather than silently falling back to the slow path, but rather than rely on selecting RISCV_ALTERNATIVE in the myriad of locations that may use riscv_has_extension_[un]likely(), select it (almost) always instead by adding it to the main RISCV config entry. xip kernels cannot make use of the alternatives framework, so it is not enabled for those configurations, although this is the status quo. All current sites that select RISCV_ALTERNATIVE are converted to dependencies on the option instead. The explicit dependencies on !XIP_KERNEL can be dropped, as RISCV_ALTERNATIVE is not user selectable. Fixes: 702e64550b12 ("riscv: fpu: switch has_fpu() to riscv_has_extension_likely()") Link: https://lore.kernel.org/all/ZBruFRwt3rUVngPu@zx2c4.com/ Reported-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20230324100538.3514663-3-conor.dooley@microchip.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-29RISC-V: add non-alternative fallback for riscv_has_extension_[un]likely()Conor Dooley
The has_fpu() check, which in turn calls riscv_has_extension_likely(), relies on alternatives to figure out whether the system has an FPU. As a result, it will malfunction on XIP kernels, as they do not support the alternatives mechanism. When alternatives support is not present, fall back to using __riscv_isa_extension_available() in riscv_has_extension_[un]likely() instead stead, which handily takes the same argument, so that kernels that do not support alternatives can accurately report the presence of FPU support. Fixes: 702e64550b12 ("riscv: fpu: switch has_fpu() to riscv_has_extension_likely()") Link: https://lore.kernel.org/all/ad445951-3d13-4644-94d9-e0989cda39c3@spud/ Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20230324100538.3514663-2-conor.dooley@microchip.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-29thermal: intel: int340x: processor_thermal: Fix additional deadlockSrinivas Pandruvada
Commit 52f04f10b900 ("thermal: intel: int340x: processor_thermal: Fix deadlock") addressed deadlock issue during user space trip update. But it missed a case when thermal zone device is disabled when user writes 0. Call to thermal_zone_device_disable() also causes deadlock as it also tries to lock tz->lock, which is already claimed by trip_point_temp_store() in the thermal core code. Remove call to thermal_zone_device_disable() in the function sys_set_trip_temp(), which is called from trip_point_temp_store(). Fixes: 52f04f10b900 ("thermal: intel: int340x: processor_thermal: Fix deadlock") Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: 6.2+ <stable@vger.kernel.org> # 6.2+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-03-29md: fix regression for null-ptr-deference in __md_stop()Yu Kuai
Commit 3e453522593d ("md: Free resources in __md_stop") tried to fix null-ptr-deference for 'active_io' by moving percpu_ref_exit() to __md_stop(), however, the commit also moving 'writes_pending' to __md_stop(), and this will cause mdadm tests broken: BUG: kernel NULL pointer dereference, address: 0000000000000038 Oops: 0000 [#1] PREEMPT SMP CPU: 15 PID: 17830 Comm: mdadm Not tainted 6.3.0-rc3-next-20230324-00009-g520d37 RIP: 0010:free_percpu+0x465/0x670 Call Trace: <TASK> __percpu_ref_exit+0x48/0x70 percpu_ref_exit+0x1a/0x90 __md_stop+0xe9/0x170 do_md_stop+0x1e1/0x7b0 md_ioctl+0x90c/0x1aa0 blkdev_ioctl+0x19b/0x400 vfs_ioctl+0x20/0x50 __x64_sys_ioctl+0xba/0xe0 do_syscall_64+0x6c/0xe0 entry_SYSCALL_64_after_hwframe+0x63/0xcd And the problem can be reporduced 100% by following test: mdadm -CR /dev/md0 -l1 -n1 /dev/sda --force echo inactive > /sys/block/md0/md/array_state echo read-auto > /sys/block/md0/md/array_state echo inactive > /sys/block/md0/md/array_state Root cause: // start raid raid1_run mddev_init_writes_pending percpu_ref_init // inactive raid array_state_store do_md_stop __md_stop percpu_ref_exit // start raid again array_state_store do_md_run raid1_run mddev_init_writes_pending if (mddev->writes_pending.percpu_count_ptr) // won't reinit // inactive raid again ... percpu_ref_exit -> null-ptr-deference Before the commit, 'writes_pending' is exited when mddev is freed, and it's safe to restart raid because mddev_init_writes_pending() already make sure that 'writes_pending' will only be initialized once. Fix the prblem by moving 'writes_pending' back, it's a litter hard to find the relationship between alloc memory and free memory, however, code changes is much less and we lived with this for a long time already. Fixes: 3e453522593d ("md: Free resources in __md_stop") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230328094400.1448955-1-yukuai1@huaweicloud.com
2023-03-29Merge tag 'xtensa-20230327' of https://github.com/jcmvbkbc/linux-xtensaLinus Torvalds
Pull xtensa fixes from Max Filippov: - fix KASAN report in show_stack - drop linux-xtensa mailing list from the MAINTAINERS file * tag 'xtensa-20230327' of https://github.com/jcmvbkbc/linux-xtensa: MAINTAINERS: xtensa: drop linux-xtensa@linux-xtensa.org mailing list xtensa: fix KASAN report for show_stack
2023-03-29Merge tag 'f2fs-fix-6.3-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs fix from Jaegeuk Kim: "This fixes a tracepoint field size in f2fs in preparation for stricter rules for tracing fields" * tag 'f2fs-fix-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: Fix f2fs_truncate_partial_nodes ftrace event
2023-03-29io_uring/rsrc: fix rogue rsrc node grabbingPavel Begunkov
We should not be looking at ctx->rsrc_node and anyhow modifying the node without holding uring_lock, grabbing references in such a way is not safe either. Cc: stable@vger.kernel.org Fixes: 5106dd6e74ab6 ("io_uring: propagate issue_flags state down to file assignment") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1202ede2d7bb90136e3482b2b84aad9ed483e5d6.1680098433.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-29drm: test: Fix 32-bit issue in drm_buddy_testDavid Gow
The drm_buddy_test KUnit tests verify that returned blocks have sizes which are powers of two using is_power_of_2(). However, is_power_of_2() operations on a 'long', but the block size is a u64. So on systems where long is 32-bit, this can sometimes fail even on correctly sized blocks. This only reproduces randomly, as the parameters passed to the buddy allocator in this test are random. The seed 0xb2e06022 reproduced it fine here. For now, just hardcode an is_power_of_2() implementation using x & (x - 1). Signed-off-by: David Gow <davidgow@google.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Arunpravin Paneer Selvam <arunpravin.paneerselvam@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230329065532.2122295-2-davidgow@google.com Signed-off-by: Christian König <christian.koenig@amd.com>
2023-03-29drm: buddy_allocator: Fix buddy allocator init on 32-bit systemsDavid Gow
The drm buddy allocator tests were broken on 32-bit systems, as rounddown_pow_of_two() takes a long, and the buddy allocator handles 64-bit sizes even on 32-bit systems. This can be reproduced with the drm_buddy_allocator KUnit tests on i386: ./tools/testing/kunit/kunit.py run --arch i386 \ --kunitconfig ./drivers/gpu/drm/tests drm_buddy (It results in kernel BUG_ON() when too many blocks are created, due to the block size being too small.) This was independently uncovered (and fixed) by Luís Mendes, whose patch added a new u64 variant of rounddown_pow_of_two(). This version instead recalculates the size based on the order. Reported-by: Luís Mendes <luis.p.mendes@gmail.com> Link: https://lore.kernel.org/lkml/CAEzXK1oghXAB_KpKpm=-CviDQbNaH0qfgYTSSjZgvvyj4U78AA@mail.gmail.com/T/ Signed-off-by: David Gow <davidgow@google.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Arunpravin Paneer Selvam <arunpravin.paneerselvam@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230329065532.2122295-1-davidgow@google.com Signed-off-by: Christian König <christian.koenig@amd.com>
2023-03-29ALSA: hda/realtek: Add quirk for Lenovo ZhaoYang CF4620Zhuangwenhui
Fix headset microphone detection on Lenovo ZhaoYang CF4620Z. [ adjusted to be applicable to the latest tree -- tiwai ] Signed-off-by: huangwenhui <huangwenhuia@uniontech.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20230328074644.30142-1-huangwenhuia@uniontech.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-03-29powerpc/pseries/vas: Ignore VAS update for DLPAR if copy/paste is not enabledHaren Myneni
The hypervisor supports user-mode NX from Power10. pseries_vas_dlpar_cpu() is called from lparcfg_write() to update VAS windows for DLPAR event in shared processor mode and the kernel gets -ENOTSUPP for HCALLs if the user-mode NX is not supported. The current VAS implementation also supports only with Radix page tables. Whereas in dedicated processor mode, pseries_vas_notifier() is registered only if the copy/paste feature is enabled. So instead of displaying HCALL error messages, update VAS capabilities if the copy/paste feature is available. This patch ignores updating VAS capabilities in pseries_vas_dlpar_cpu() and returns success if the copy/paste feature is not enabled. Then lparcfg_write() completes the processor DLPAR operations without any failures. Fixes: 2147783d6bf0 ("powerpc/pseries: Use lparcfg to reconfig VAS windows for DLPAR CPU") Cc: stable@vger.kernel.org # v6.1+ Signed-off-by: Haren Myneni <haren@linux.ibm.com> Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/1d0e727e7dbd9a28627ef08ca9df9c86a50175e2.camel@linux.ibm.com
2023-03-29cacheinfo: Fix LLC is not exported through sysfsYicong Yang
After entering 6.3-rc1 the LLC cacheinfo is not exported on our ACPI based arm64 server. This is because the LLC cacheinfo is partly reset when secondary CPUs boot up. On arm64 the primary cpu will allocate and setup cacheinfo: init_cpu_topology() for_each_possible_cpu() fetch_cache_info() // Allocate cacheinfo and init levels detect_cache_attributes() cache_shared_cpu_map_setup() if (!last_level_cache_is_valid()) // not valid, setup LLC cache_setup_properties() // setup LLC On secondary CPU boot up: detect_cache_attributes() populate_cache_leaves() get_cache_type() // Get cache type from clidr_el1, // for LLC type=CACHE_TYPE_NOCACHE cache_shared_cpu_map_setup() if (!last_level_cache_is_valid()) // Valid and won't go to this branch, // leave LLC's type=CACHE_TYPE_NOCACHE The last_level_cache_is_valid() use cacheinfo->{attributes, fw_token} to test it's valid or not, but populate_cache_leaves() will only reset LLC's type, so we won't try to re-setup LLC's type and leave it CACHE_TYPE_NOCACHE and won't export it through sysfs. This patch tries to fix this by not re-populating the cache leaves if the LLC is valid. Fixes: 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU") Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Reviewed-by: Pierre Gondois <pierre.gondois@arm.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/20230328114915.33340-1-yangyicong@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-29drm/nouveau/kms: Fix backlight registrationHans de Goede
The nouveau code used to call drm_fb_helper_initial_config() from nouveau_fbcon_init() before calling drm_dev_register(). This would probe all connectors so that drm_connector->status could be used during backlight registration which runs from nouveau_connector_late_register(). After commit 4a16dd9d18a0 ("drm/nouveau/kms: switch to drm fbdev helpers") the fbdev emulation code, which now is a drm-client, can only run after drm_dev_register(). So during backlight registration the connectors are not probed yet and the drm_connector->status == connected check in nv50_backlight_init() would now always fail. Replace the drm_connector->status == connected check with a drm_helper_probe_detect() == connected check to fix nv_backlight no longer getting registered because of this. Fixes: 4a16dd9d18a0 ("drm/nouveau/kms: switch to drm fbdev helpers") Link: https://gitlab.freedesktop.org/drm/nouveau/-/issues/202 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2181941 Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230326205433.36485-1-hdegoede@redhat.com
2023-03-29dt-bindings: pinctrl: qcom,sm8550-lpass-lpi: allow input-enabled and ↵Krzysztof Kozlowski
bias-bus-hold Add missing common pin configuration properties: input-enabled and bias-bus-hold. Fixes: 268e97ccc311 ("dt-bindings: pinctrl: qcom,sm8550-lpass-lpi-pinctrl: add SM8550 LPASS") Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20230324084127.29362-1-krzysztof.kozlowski@linaro.org Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2023-03-29net: wwan: iosm: fixes 7560 modem crashM Chetan Kumar
ModemManger/Apps probing the wwan0xmmrpc0 port for 7560 Modem results in modem crash. 7560 Modem FW uses the MBIM interface for control command communication whereas 7360 uses Intel RPC interface so disable wwan0xmmrpc0 port for 7560. Fixes: d08b0f8f46e4 ("net: wwan: iosm: add rpc interface for xmm modems") Reported-and-tested-by: Martin <mwolf@adiumentum.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217200 Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com> Signed-off-by: Shane Parslow <shaneparslow808@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-29ALSA: ymfpci: Fix BUG_ON in probe functionTasos Sahanidis
The snd_dma_buffer.bytes field now contains the aligned size, which this snd_BUG_ON() did not account for, resulting in the following: [ 9.625915] ------------[ cut here ]------------ [ 9.633440] WARNING: CPU: 0 PID: 126 at sound/pci/ymfpci/ymfpci_main.c:2168 snd_ymfpci_create+0x681/0x698 [snd_ymfpci] [ 9.648926] Modules linked in: snd_ymfpci(+) snd_intel_dspcfg kvm(+) snd_intel_sdw_acpi snd_ac97_codec snd_mpu401_uart snd_opl3_lib irqbypass snd_hda_codec gameport snd_rawmidi crct10dif_pclmul crc32_pclmul cfg80211 snd_hda_core polyval_clmulni polyval_generic gf128mul snd_seq_device ghash_clmulni_intel snd_hwdep ac97_bus sha512_ssse3 rfkill snd_pcm aesni_intel tg3 snd_timer crypto_simd snd mxm_wmi libphy cryptd k10temp fam15h_power pcspkr soundcore sp5100_tco wmi acpi_cpufreq mac_hid dm_multipath sg loop fuse dm_mod bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 sr_mod cdrom ata_generic pata_acpi firewire_ohci crc32c_intel firewire_core xhci_pci crc_itu_t pata_via xhci_pci_renesas floppy [ 9.711849] CPU: 0 PID: 126 Comm: kworker/0:2 Not tainted 6.1.21-1-lts #1 08d2e5ece03136efa7c6aeea9a9c40916b1bd8da [ 9.722200] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./990FX Extreme4, BIOS P2.70 06/05/2014 [ 9.732204] Workqueue: events work_for_cpu_fn [ 9.736580] RIP: 0010:snd_ymfpci_create+0x681/0x698 [snd_ymfpci] [ 9.742594] Code: 8c c0 4c 89 e2 48 89 df 48 c7 c6 92 c6 8c c0 e8 15 d0 e9 ff 48 83 c4 08 44 89 e8 5b 5d 41 5c 41 5d 41 5e 41 5f e9 d3 7a 33 e3 <0f> 0b e9 cb fd ff ff 41 bd fb ff ff ff eb db 41 bd f4 ff ff ff eb [ 9.761358] RSP: 0018:ffffab64804e7da0 EFLAGS: 00010287 [ 9.766594] RAX: ffff8fa2df06c400 RBX: ffff8fa3073a8000 RCX: ffff8fa303fbc4a8 [ 9.773734] RDX: ffff8fa2df06d000 RSI: 0000000000000010 RDI: 0000000000000020 [ 9.780876] RBP: ffff8fa300b5d0d0 R08: ffff8fa3073a8e50 R09: 00000000df06bf00 [ 9.788018] R10: ffff8fa2df06bf00 R11: 00000000df068200 R12: ffff8fa3073a8918 [ 9.795159] R13: 0000000000000000 R14: 0000000000000080 R15: ffff8fa2df068200 [ 9.802317] FS: 0000000000000000(0000) GS:ffff8fa9fec00000(0000) knlGS:0000000000000000 [ 9.810414] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 9.816158] CR2: 000055febaf66500 CR3: 0000000101a2e000 CR4: 00000000000406f0 [ 9.823301] Call Trace: [ 9.825747] <TASK> [ 9.827889] snd_card_ymfpci_probe+0x194/0x950 [snd_ymfpci b78a5fe64b5663a6390a909c67808567e3e73615] [ 9.837030] ? finish_task_switch.isra.0+0x90/0x2d0 [ 9.841918] local_pci_probe+0x45/0x80 [ 9.845680] work_for_cpu_fn+0x1a/0x30 [ 9.849431] process_one_work+0x1c7/0x380 [ 9.853464] worker_thread+0x1af/0x390 [ 9.857225] ? rescuer_thread+0x3b0/0x3b0 [ 9.861254] kthread+0xde/0x110 [ 9.864414] ? kthread_complete_and_exit+0x20/0x20 [ 9.869210] ret_from_fork+0x22/0x30 [ 9.872792] </TASK> [ 9.874985] ---[ end trace 0000000000000000 ]--- Fixes: 5c1733e33c88 ("ALSA: memalloc: Align buffer allocations in page size") Signed-off-by: Tasos Sahanidis <tasos@tasossah.com> Link: https://lore.kernel.org/r/20230329032808.170403-1-tasos@tasossah.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-03-29ALSA: ymfpci: Create card with device-managed snd_devm_card_new()Tasos Sahanidis
snd_card_ymfpci_remove() was removed in commit c6e6bb5eab74 ("ALSA: ymfpci: Allocate resources with device-managed APIs"), but the call to snd_card_new() was not replaced with snd_devm_card_new(). Since there was no longer a call to snd_card_free, unloading the module would eventually result in Oops: [697561.532887] BUG: unable to handle page fault for address: ffffffffc0924480 [697561.532893] #PF: supervisor read access in kernel mode [697561.532896] #PF: error_code(0x0000) - not-present page [697561.532899] PGD ae1e15067 P4D ae1e15067 PUD ae1e17067 PMD 11a8f5067 PTE 0 [697561.532905] Oops: 0000 [#1] PREEMPT SMP NOPTI [697561.532909] CPU: 21 PID: 5080 Comm: wireplumber Tainted: G W OE 6.2.7 #1 [697561.532914] Hardware name: System manufacturer System Product Name/TUF GAMING X570-PLUS, BIOS 4408 10/28/2022 [697561.532916] RIP: 0010:try_module_get.part.0+0x1a/0xe0 [697561.532924] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 49 89 fc bf 01 00 00 00 e8 56 3c f8 ff <41> 83 3c 24 02 0f 84 96 00 00 00 41 8b 84 24 30 03 00 00 85 c0 0f [697561.532927] RSP: 0018:ffffbe9b858c3bd8 EFLAGS: 00010246 [697561.532930] RAX: ffff9815d14f1900 RBX: ffff9815c14e6000 RCX: 0000000000000000 [697561.532933] RDX: 0000000000000000 RSI: ffffffffc055092c RDI: ffffffffb3778c1a [697561.532935] RBP: ffffbe9b858c3be8 R08: 0000000000000040 R09: ffff981a1a741380 [697561.532937] R10: ffffbe9b858c3c80 R11: 00000009d56533a6 R12: ffffffffc0924480 [697561.532939] R13: ffff9823439d8500 R14: 0000000000000025 R15: ffff9815cd109f80 [697561.532942] FS: 00007f13084f1f80(0000) GS:ffff9824aef40000(0000) knlGS:0000000000000000 [697561.532945] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [697561.532947] CR2: ffffffffc0924480 CR3: 0000000145344000 CR4: 0000000000350ee0 [697561.532949] Call Trace: [697561.532951] <TASK> [697561.532955] try_module_get+0x13/0x30 [697561.532960] snd_ctl_open+0x61/0x1c0 [snd] [697561.532976] snd_open+0xb4/0x1e0 [snd] [697561.532989] chrdev_open+0xc7/0x240 [697561.532995] ? fsnotify_perm.part.0+0x6e/0x160 [697561.533000] ? __pfx_chrdev_open+0x10/0x10 [697561.533005] do_dentry_open+0x169/0x440 [697561.533009] vfs_open+0x2d/0x40 [697561.533012] path_openat+0xa9d/0x10d0 [697561.533017] ? debug_smp_processor_id+0x17/0x20 [697561.533022] ? trigger_load_balance+0x65/0x370 [697561.533026] do_filp_open+0xb2/0x160 [697561.533032] ? _raw_spin_unlock+0x19/0x40 [697561.533036] ? alloc_fd+0xa9/0x190 [697561.533040] do_sys_openat2+0x9f/0x160 [697561.533044] __x64_sys_openat+0x55/0x90 [697561.533048] do_syscall_64+0x3b/0x90 [697561.533052] entry_SYSCALL_64_after_hwframe+0x72/0xdc [697561.533056] RIP: 0033:0x7f1308a40db4 [697561.533059] Code: 24 20 eb 8f 66 90 44 89 54 24 0c e8 46 68 f8 ff 44 8b 54 24 0c 44 89 e2 48 89 ee 41 89 c0 bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 32 44 89 c7 89 44 24 0c e8 78 68 f8 ff 8b 44 [697561.533062] RSP: 002b:00007ffcce664450 EFLAGS: 00000293 ORIG_RAX: 0000000000000101 [697561.533066] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f1308a40db4 [697561.533068] RDX: 0000000000080000 RSI: 00007ffcce664690 RDI: 00000000ffffff9c [697561.533070] RBP: 00007ffcce664690 R08: 0000000000000000 R09: 0000000000000012 [697561.533072] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000080000 [697561.533074] R13: 00007f13054b069b R14: 0000565209f83200 R15: 0000000000000000 [697561.533078] </TASK> Fixes: c6e6bb5eab74 ("ALSA: ymfpci: Allocate resources with device-managed APIs") Signed-off-by: Tasos Sahanidis <tasos@tasossah.com> Link: https://lore.kernel.org/r/20230329032422.170024-1-tasos@tasossah.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-03-28net: ethernet: mtk_eth_soc: fix tx throughput regression with direct 1G linksFelix Fietkau
Using the QDMA tx scheduler to throttle tx to line speed works fine for switch ports, but apparently caused a regression on non-switch ports. Based on a number of tests, it seems that this throttling can be safely dropped without re-introducing the issues on switch ports that the tx scheduling changes resolved. Link: https://lore.kernel.org/netdev/trinity-92c3826f-c2c8-40af-8339-bc6d0d3ffea4-1678213958520@3c-app-gmx-bs16/ Fixes: f63959c7eec3 ("net: ethernet: mtk_eth_soc: implement multi-queue support for per-port queues") Reported-by: Frank Wunderlich <frank-w@public-files.de> Reported-by: Daniel Golle <daniel@makrotopia.org> Tested-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20230324140404.95745-1-nbd@nbd.name Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-29btrfs: ignore fiemap path cache when there are multiple paths for a nodeFilipe Manana
During fiemap, when walking backreferences to determine if a b+tree node/leaf is shared, we may find a tree block (leaf or node) for which two parents were added to the references ulist. This happens if we get for example one direct ref (shared tree block ref) and one indirect ref (non-shared tree block ref) for the tree block at the current level, which can happen during relocation. In that case the fiemap path cache can not be used since it's meant for a single path, with one tree block at each possible level, so having multiple references for a tree block at any level may result in getting the level counter exceed BTRFS_MAX_LEVEL and eventually trigger the warning: WARN_ON_ONCE(level >= BTRFS_MAX_LEVEL) at lookup_backref_shared_cache() and at store_backref_shared_cache(). This is harmless since the code ignores any level >= BTRFS_MAX_LEVEL, the warning is there just to catch any unexpected case like the one described above. However if a user finds this it may be scary and get reported. So just ignore the path cache once we find a tree block for which there are more than one reference, which is the less common case, and update the cache with the sharedness check result for all levels below the level for which we found multiple references. Reported-by: Jarno Pelkonen <jarno.pelkonen@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CAKv8qLmDNAGJGCtsevxx_VZ_YOvvs1L83iEJkTgyA4joJertng@mail.gmail.com/ Fixes: 12a824dc67a6 ("btrfs: speedup checking for extent sharedness during fiemap") CC: stable@vger.kernel.org # 6.1+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-28Merge tag 'urgent-rcu.2023.03.28a' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU fix from Paul McKenney: "This brings the rcu_torture_read event trace into line with the new trace tools by replacing this event trace's __field() with the corresponding __array(). Without this, the new trace tools will fail when presented wtih an rcu_torture_read event trace, which is a regression from the viewpoint of trace tools users" Link: https://lore.kernel.org/all/20230320133650.5388a05e@gandalf.local.home/ * tag 'urgent-rcu.2023.03.28a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: rcu: Fix rcu_torture_read ftrace event
2023-03-28Merge tag 'linux-kselftest-fixes-6.3-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest fixes from Shuah Khan: "One single fix for sigaltstack test -Wuninitialized warning found when building with clang" * tag 'linux-kselftest-fixes-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests: sigaltstack: fix -Wuninitialized
2023-03-28thermal: core: Drop excessive lockdep_assert_held() callsRafael J. Wysocki
The lockdep_assert_held() calls added to cooling_device_stats_setup() and cooling_device_stats_destroy() by commit 790930f44289 ("thermal: core: Introduce thermal_cooling_device_update()") trigger false-positive lockdep reports in code paths that are not subject to race conditions (before cooling device registration and after cooling device removal). For this reason, remove the lockdep_assert_held() calls from both cooling_device_stats_setup() and cooling_device_stats_destroy() and add one to thermal_cooling_device_stats_reinit() that has to be called under the cdev lock. Fixes: 790930f44289 ("thermal: core: Introduce thermal_cooling_device_update()") Link: https://lore.kernel.org/linux-acpi/ZCIDTLFt27Ei7+V6@ideak-desk.fi.intel.com Reported-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-03-28Merge tag 's390-6.3-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Vasily Gorbik: - Fix an error handling issue with PTRACE_GET_LAST_BREAK request so that -EFAULT is returned if put_user() fails, instead of ignoring it - Fix a build race for the modules_prepare target when CONFIG_EXPOLINE_EXTERN is enabled by reintroducing the dependence on scripts - Fix a memory leak in vfio_ap device driver - Add missing earlyclobber annotations to __clear_user() inline assembly to prevent incorrect register allocation * tag 's390-6.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/ptrace: fix PTRACE_GET_LAST_BREAK error handling s390: reintroduce expoline dependence to scripts s390/vfio-ap: fix memory leak in vfio_ap device driver s390/uaccess: add missing earlyclobber annotations to __clear_user()
2023-03-28ice: fix invalid check for empty list in ice_sched_assoc_vsi_to_agg()Jakob Koschel
The code implicitly assumes that the list iterator finds a correct handle. If 'vsi_handle' is not found the 'old_agg_vsi_info' was pointing to an bogus memory location. For safety a separate list iterator variable should be used to make the != NULL check on 'old_agg_vsi_info' correct under any circumstances. Additionally Linus proposed to avoid any use of the list iterator variable after the loop, in the attempt to move the list iterator variable declaration into the macro to avoid any potential misuse after the loop. Using it in a pointer comparison after the loop is undefined behavior and should be omitted if possible [1]. Fixes: 37c592062b16 ("ice: remove the VSI info from previous agg") Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1] Signed-off-by: Jakob Koschel <jkl820.git@gmail.com> Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-28ice: add profile conflict check for AVF FDIRJunfeng Guo
Add profile conflict check while adding some FDIR rules to avoid unexpected flow behavior, rules may have conflict including: IPv4 <---> {IPv4_UDP, IPv4_TCP, IPv4_SCTP} IPv6 <---> {IPv6_UDP, IPv6_TCP, IPv6_SCTP} For example, when we create an FDIR rule for IPv4, this rule will work on packets including IPv4, IPv4_UDP, IPv4_TCP and IPv4_SCTP. But if we then create an FDIR rule for IPv4_UDP and then destroy it, the first FDIR rule for IPv4 cannot work on pkt IPv4_UDP then. To prevent this unexpected behavior, we add restriction in software when creating FDIR rules by adding necessary profile conflict check. Fixes: 1f7ea1cd6a37 ("ice: Enable FDIR Configure for AVF") Signed-off-by: Junfeng Guo <junfeng.guo@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-28ice: Fix ice_cfg_rdma_fltr() to only update relevant fieldsBrett Creeley
The current implementation causes ice_vsi_update() to update all VSI fields based on the cached VSI context. This also assumes that the ICE_AQ_VSI_PROP_Q_OPT_VALID bit is set. This can cause problems if the VSI context is not correctly synced by the driver. Fix this by only updating the fields that correspond to ICE_AQ_VSI_PROP_Q_OPT_VALID. Also, make sure to save the updated result in the cached VSI context on success. Fixes: 348048e724a0 ("ice: Implement iidc operations") Co-developed-by: Robert Malz <robertx.malz@intel.com> Signed-off-by: Robert Malz <robertx.malz@intel.com> Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com> Tested-by: Jakub Andrysiak <jakub.andrysiak@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-28ice: fix W=1 headers mismatchJesse Brandeburg
make modules W=1 returns: .../ice/ice_txrx_lib.c:448: warning: Function parameter or member 'first_idx' not described in 'ice_finalize_xdp_rx' .../ice/ice_txrx.c:948: warning: Function parameter or member 'ntc' not described in 'ice_get_rx_buf' .../ice/ice_txrx.c:1038: warning: Excess function parameter 'rx_buf' description in 'ice_construct_skb' Fix these warnings by adding and deleting the deviant arguments. Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side") Fixes: d7956d81f150 ("ice: Pull out next_to_clean bump out of ice_put_rx_buf()") CC: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-28iommu/exynos: Fix set_platform_dma_ops() callbackMarek Szyprowski
There are some subtle differences between release_device() and set_platform_dma_ops() callbacks, so separate those two callbacks. Device links should be removed only in release_device(), because they were created in probe_device() on purpose and they are needed for proper Exynos IOMMU driver operation. While fixing this, remove the conditional code as it is not really needed. Reported-by: Jason Gunthorpe <jgg@ziepe.ca> Fixes: 189d496b48b1 ("iommu/exynos: Add missing set_platform_dma_ops callback") Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Sam Protsenko <semen.protsenko@linaro.org> Link: https://lore.kernel.org/r/20230315232514.1046589-1-m.szyprowski@samsung.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-28pinctrl: amd: Disable and mask interrupts on resumeKornel Dulęba
This fixes a similar problem to the one observed in: commit 4e5a04be88fe ("pinctrl: amd: disable and mask interrupts on probe"). On some systems, during suspend/resume cycle firmware leaves an interrupt enabled on a pin that is not used by the kernel. This confuses the AMD pinctrl driver and causes spurious interrupts. The driver already has logic to detect if a pin is used by the kernel. Leverage it to re-initialize interrupt fields of a pin only if it's not used by us. Cc: stable@vger.kernel.org Fixes: dbad75dd1f25 ("pinctrl: add AMD GPIO driver support.") Signed-off-by: Kornel Dulęba <korneld@chromium.org> Link: https://lore.kernel.org/r/20230320093259.845178-1-korneld@chromium.org Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2023-03-28io_uring/poll: clear single/double poll flags on poll armingJens Axboe
Unless we have at least one entry queued, then don't call into io_poll_remove_entries(). Normally this isn't possible, but if we retry poll then we can have ->nr_entries cleared again as we're setting it up. If this happens for a poll retry, then we'll still have at least REQ_F_SINGLE_POLL set. io_poll_remove_entries() then thinks it has entries to remove. Clear REQ_F_SINGLE_POLL and REQ_F_DOUBLE_POLL unconditionally when arming a poll request. Fixes: c16bda37594f ("io_uring/poll: allow some retries for poll triggering spuriously") Cc: stable@vger.kernel.org Reported-by: Pengfei Xu <pengfei.xu@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-28Merge branch 'xen-netback-fix-issue-introduced-recently'Paolo Abeni
Juergen Gross says: ==================== xen/netback: fix issue introduced recently The fix for XSA-423 introduced a bug which resulted in loss of network connection in some configurations. The first patch is fixing the issue, while the second one is removing a test which isn't needed. ==================== Link: https://lore.kernel.org/r/20230327083646.18690-1-jgross@suse.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-28xen/netback: remove not needed test in xenvif_tx_build_gops()Juergen Gross
The tests for the number of grant mapping or copy operations reaching the array size of the operations buffer at the end of the main loop in xenvif_tx_build_gops() isn't needed. The loop can handle at maximum MAX_PENDING_REQS transfer requests, as XEN_RING_NR_UNCONSUMED_REQUESTS() is taking unsent responses into consideration, too. Remove the tests. Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Paul Durrant <paul@xen.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-28xen/netback: don't do grant copy across page boundaryJuergen Gross
Fix xenvif_get_requests() not to do grant copy operations across local page boundaries. This requires to double the maximum number of copy operations per queue, as each copy could now be split into 2. Make sure that struct xenvif_tx_cb doesn't grow too large. Cc: stable@vger.kernel.org Fixes: ad7f402ae4f4 ("xen/netback: Ensure protocol headers don't fall in the non-linear area") Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Paul Durrant <paul@xen.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-28smsc911x: avoid PHY being resumed when interface is not upWolfram Sang
SMSC911x doesn't need mdiobus suspend/resume, that's why it sets 'mac_managed_pm'. However, setting it needs to be moved from init to probe, so mdiobus PM functions will really never be called (e.g. when the interface is not up yet during suspend/resume). Fixes: 3ce9f2bef755 ("net: smsc911x: Stop and start PHY during suspend and resume") Suggested-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230327083138.6044-1-wsa+renesas@sang-engineering.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-28powerpc: Don't try to copy PPR for task with NULL pt_regsJens Axboe
powerpc sets up PF_KTHREAD and PF_IO_WORKER with a NULL pt_regs, which from my (arguably very short) checking is not commonly done for other archs. This is fine, except when PF_IO_WORKER's have been created and the task does something that causes a coredump to be generated. Then we get this crash: Kernel attempted to read user page (160) - exploit attempt? (uid: 1000) BUG: Kernel NULL pointer dereference on read at 0x00000160 Faulting instruction address: 0xc0000000000c3a60 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=32 NUMA pSeries Modules linked in: bochs drm_vram_helper drm_kms_helper xts binfmt_misc ecb ctr syscopyarea sysfillrect cbc sysimgblt drm_ttm_helper aes_generic ttm sg libaes evdev joydev virtio_balloon vmx_crypto gf128mul drm dm_mod fuse loop configfs drm_panel_orientation_quirks ip_tables x_tables autofs4 hid_generic usbhid hid xhci_pci xhci_hcd usbcore usb_common sd_mod CPU: 1 PID: 1982 Comm: ppc-crash Not tainted 6.3.0-rc2+ #88 Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,HEAD hv:linux,kvm pSeries NIP: c0000000000c3a60 LR: c000000000039944 CTR: c0000000000398e0 REGS: c0000000041833b0 TRAP: 0300 Not tainted (6.3.0-rc2+) MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 88082828 XER: 200400f8 ... NIP memcpy_power7+0x200/0x7d0 LR ppr_get+0x64/0xb0 Call Trace: ppr_get+0x40/0xb0 (unreliable) __regset_get+0x180/0x1f0 regset_get_alloc+0x64/0x90 elf_core_dump+0xb98/0x1b60 do_coredump+0x1c34/0x24a0 get_signal+0x71c/0x1410 do_notify_resume+0x140/0x6f0 interrupt_exit_user_prepare_main+0x29c/0x320 interrupt_exit_user_prepare+0x6c/0xa0 interrupt_return_srr_user+0x8/0x138 Because ppr_get() is trying to copy from a PF_IO_WORKER with a NULL pt_regs. Check for a valid pt_regs in both ppc_get/ppr_set, and return an error if not set. The actual error value doesn't seem to be important here, so just pick -EINVAL. Fixes: fa439810cc1b ("powerpc/ptrace: Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR") Cc: stable@vger.kernel.org # v4.8+ Signed-off-by: Jens Axboe <axboe@kernel.dk> [mpe: Trim oops in change log, add Fixes & Cc stable] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/d9f63344-fe7c-56ae-b420-4a1a04a2ae4c@kernel.dk
2023-03-28powerpc/64s: Fix __pte_needs_flush() false positive warningBenjamin Gray
Userspace PROT_NONE ptes set _PAGE_PRIVILEGED, triggering a false positive debug assertion that __pte_flags_need_flush() is not called on a kernel mapping. Detect when it is a userspace PROT_NONE page by checking the required bits of PAGE_NONE are set, and none of the RWX bits are set. pte_protnone() is insufficient here because it always returns 0 when CONFIG_NUMA_BALANCING=n. Fixes: b11931e9adc1 ("powerpc/64s: add pte_needs_flush and huge_pmd_needs_flush") Cc: stable@vger.kernel.org # v6.1+ Reported-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230302225947.81083-1-bgray@linux.ibm.com
2023-03-28Merge branch 'net-mvpp2-rss-fixes'Paolo Abeni
Sven Auhagen says: ==================== net: mvpp2: rss fixes This patch series fixes up some rss problems in the mvpp2 driver. The classifier is missing some fragmentation flags, the parser has the QinQ headers switched and the PPPoE Layer 4 detecion is not working correctly. This is leading to no or bad rss for the default settings. ==================== Link: https://lore.kernel.org/r/20230325163903.ofefgus43x66as7i@Svens-MacBookPro.local Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-28net: mvpp2: parser fix PPPoESven Auhagen
In PPPoE add all IPv4 header option length to the parser and adjust the L3 and L4 offset accordingly. Currently the L4 match does not work with PPPoE and all packets are matched as L3 IP4 OPT. Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit") Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-28net: mvpp2: parser fix QinQSven Auhagen
The mvpp2 parser entry for QinQ has the inner and outer VLAN in the wrong order. Fix the problem by swapping them. Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit") Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de> Reviewed-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-28net: mvpp2: classifier flow fix fragmentation flagsSven Auhagen
Add missing IP Fragmentation Flag. Fixes: f9358e12a0af ("net: mvpp2: split ingress traffic into multiple flows") Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de> Reviewed-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-27Merge tag 'linux-can-fixes-for-6.3-20230327' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2023-03-27 Oleksij Rempel and Hillf Danton contribute a patch for the CAN J1939 protocol that prevents a potential deadlock in j1939_sk_errqueue(). Ivan Orlov fixes an uninit-value in the CAN BCM protocol in the bcm_tx_setup() function. * tag 'linux-can-fixes-for-6.3-20230327' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: bcm: bcm_tx_setup(): fix KMSAN uninit-value in vfs_write can: j1939: prevent deadlock by moving j1939_sk_errqueue() ==================== Link: https://lore.kernel.org/r/20230327124807.1157134-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-27MAINTAINERS: remove the linux-nfc@lists.01.org listLukas Bulwahn
Some MAINTAINERS sections mention to mail patches to the list linux-nfc@lists.01.org. Probably due to changes on Intel's 01.org website and servers, the list server lists.01.org/ml01.01.org is simply gone. Considering emails recorded on lore.kernel.org, only a handful of emails where sent to the linux-nfc@lists.01.org list, and they are usually also sent to the netdev mailing list as well, where they are then picked up. So, there is no big benefit in restoring the linux-nfc elsewhere. Remove all occurrences of the linux-nfc@lists.01.org list in MAINTAINERS. Suggested-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/all/CAKXUXMzggxQ43DUZZRkPMGdo5WkzgA=i14ySJUFw4kZfE5ZaZA@mail.gmail.com/ Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20230324081613.32000-1-lukas.bulwahn@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-27net: fman: Add myself as a reviewerSean Anderson
I've read through or reworked a good portion of this driver. Add myself as a reviewer. Signed-off-by: Sean Anderson <sean.anderson@seco.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Link: https://lore.kernel.org/r/20230323145957.2999211-1-sean.anderson@seco.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-28nvme-pci: mark Lexar NM760 as IGNORE_DEV_SUBNQNJuraj Pecigos
A system with more than one of these SSDs will only have one usable. The kernel fails to detect more than one nvme device due to duplicate cntlids. before: [ 9.395229] nvme 0000:01:00.0: platform quirk: setting simple suspend [ 9.395262] nvme nvme0: pci function 0000:01:00.0 [ 9.395282] nvme 0000:03:00.0: platform quirk: setting simple suspend [ 9.395305] nvme nvme1: pci function 0000:03:00.0 [ 9.409873] nvme nvme0: Duplicate cntlid 1 with nvme1, subsys nqn.2022-07.com.siliconmotion:nvm-subsystem-sn- , rejecting [ 9.409982] nvme nvme0: Removing after probe failure status: -22 [ 9.427487] nvme nvme1: allocated 64 MiB host memory buffer. [ 9.445088] nvme nvme1: 16/0/0 default/read/poll queues [ 9.449898] nvme nvme1: Ignoring bogus Namespace Identifiers after: [ 1.161890] nvme 0000:01:00.0: platform quirk: setting simple suspend [ 1.162660] nvme nvme0: pci function 0000:01:00.0 [ 1.162684] nvme 0000:03:00.0: platform quirk: setting simple suspend [ 1.162707] nvme nvme1: pci function 0000:03:00.0 [ 1.191354] nvme nvme0: allocated 64 MiB host memory buffer. [ 1.193378] nvme nvme1: allocated 64 MiB host memory buffer. [ 1.211044] nvme nvme1: 16/0/0 default/read/poll queues [ 1.211080] nvme nvme0: 16/0/0 default/read/poll queues [ 1.216145] nvme nvme0: Ignoring bogus Namespace Identifiers [ 1.216261] nvme nvme1: Ignoring bogus Namespace Identifiers Adding the NVME_QUIRK_IGNORE_DEV_SUBNQN quirk to resolves the issue. Signed-off-by: Juraj Pecigos <kernel@juraj.dev> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-03-28btrfs: fix deadlock when aborting transaction during relocation with scrubFilipe Manana
Before relocating a block group we pause scrub, then do the relocation and then unpause scrub. The relocation process requires starting and committing a transaction, and if we have a failure in the critical section of the transaction commit path (transaction state >= TRANS_STATE_COMMIT_START), we will deadlock if there is a paused scrub. That results in stack traces like the following: [42.479] BTRFS info (device sdc): relocating block group 53876686848 flags metadata|raid6 [42.936] BTRFS warning (device sdc): Skipping commit of aborted transaction. [42.936] ------------[ cut here ]------------ [42.936] BTRFS: Transaction aborted (error -28) [42.936] WARNING: CPU: 11 PID: 346822 at fs/btrfs/transaction.c:1977 btrfs_commit_transaction+0xcc8/0xeb0 [btrfs] [42.936] Modules linked in: dm_flakey dm_mod loop btrfs (...) [42.936] CPU: 11 PID: 346822 Comm: btrfs Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1 [42.936] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [42.936] RIP: 0010:btrfs_commit_transaction+0xcc8/0xeb0 [btrfs] [42.936] Code: ff ff 45 8b (...) [42.936] RSP: 0018:ffffb58649633b48 EFLAGS: 00010282 [42.936] RAX: 0000000000000000 RBX: ffff8be6ef4d5bd8 RCX: 0000000000000000 [42.936] RDX: 0000000000000002 RSI: ffffffffb35e7782 RDI: 00000000ffffffff [42.936] RBP: ffff8be6ef4d5c98 R08: 0000000000000000 R09: ffffb586496339e8 [42.936] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8be6d38c7c00 [42.936] R13: 00000000ffffffe4 R14: ffff8be6c268c000 R15: ffff8be6ef4d5cf0 [42.936] FS: 00007f381a82b340(0000) GS:ffff8beddfcc0000(0000) knlGS:0000000000000000 [42.936] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [42.936] CR2: 00007f1e35fb7638 CR3: 0000000117680006 CR4: 0000000000370ee0 [42.936] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [42.936] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [42.936] Call Trace: [42.936] <TASK> [42.936] ? start_transaction+0xcb/0x610 [btrfs] [42.936] prepare_to_relocate+0x111/0x1a0 [btrfs] [42.936] relocate_block_group+0x57/0x5d0 [btrfs] [42.936] ? btrfs_wait_nocow_writers+0x25/0xb0 [btrfs] [42.936] btrfs_relocate_block_group+0x248/0x3c0 [btrfs] [42.936] ? __pfx_autoremove_wake_function+0x10/0x10 [42.936] btrfs_relocate_chunk+0x3b/0x150 [btrfs] [42.936] btrfs_balance+0x8ff/0x11d0 [btrfs] [42.936] ? __kmem_cache_alloc_node+0x14a/0x410 [42.936] btrfs_ioctl+0x2334/0x32c0 [btrfs] [42.937] ? mod_objcg_state+0xd2/0x360 [42.937] ? refill_obj_stock+0xb0/0x160 [42.937] ? seq_release+0x25/0x30 [42.937] ? __rseq_handle_notify_resume+0x3b5/0x4b0 [42.937] ? percpu_counter_add_batch+0x2e/0xa0 [42.937] ? __x64_sys_ioctl+0x88/0xc0 [42.937] __x64_sys_ioctl+0x88/0xc0 [42.937] do_syscall_64+0x38/0x90 [42.937] entry_SYSCALL_64_after_hwframe+0x72/0xdc [42.937] RIP: 0033:0x7f381a6ffe9b [42.937] Code: 00 48 89 44 24 (...) [42.937] RSP: 002b:00007ffd45ecf060 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [42.937] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f381a6ffe9b [42.937] RDX: 00007ffd45ecf150 RSI: 00000000c4009420 RDI: 0000000000000003 [42.937] RBP: 0000000000000003 R08: 0000000000000013 R09: 0000000000000000 [42.937] R10: 00007f381a60c878 R11: 0000000000000246 R12: 00007ffd45ed0423 [42.937] R13: 00007ffd45ecf150 R14: 0000000000000000 R15: 00007ffd45ecf148 [42.937] </TASK> [42.937] ---[ end trace 0000000000000000 ]--- [42.937] BTRFS: error (device sdc: state A) in cleanup_transaction:1977: errno=-28 No space left [59.196] INFO: task btrfs:346772 blocked for more than 120 seconds. [59.196] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1 [59.196] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59.196] task:btrfs state:D stack:0 pid:346772 ppid:1 flags:0x00004002 [59.196] Call Trace: [59.196] <TASK> [59.196] __schedule+0x392/0xa70 [59.196] ? __pv_queued_spin_lock_slowpath+0x165/0x370 [59.196] schedule+0x5d/0xd0 [59.196] __scrub_blocked_if_needed+0x74/0xc0 [btrfs] [59.197] ? __pfx_autoremove_wake_function+0x10/0x10 [59.197] scrub_pause_off+0x21/0x50 [btrfs] [59.197] scrub_simple_mirror+0x1c7/0x950 [btrfs] [59.197] ? scrub_parity_put+0x1a5/0x1d0 [btrfs] [59.198] ? __pfx_autoremove_wake_function+0x10/0x10 [59.198] scrub_stripe+0x20d/0x740 [btrfs] [59.198] scrub_chunk+0xc4/0x130 [btrfs] [59.198] scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs] [59.198] ? __pfx_autoremove_wake_function+0x10/0x10 [59.198] btrfs_scrub_dev+0x236/0x6a0 [btrfs] [59.199] ? btrfs_ioctl+0xd97/0x32c0 [btrfs] [59.199] ? _copy_from_user+0x7b/0x80 [59.199] btrfs_ioctl+0xde1/0x32c0 [btrfs] [59.199] ? refill_stock+0x33/0x50 [59.199] ? should_failslab+0xa/0x20 [59.199] ? kmem_cache_alloc_node+0x151/0x460 [59.199] ? alloc_io_context+0x1b/0x80 [59.199] ? preempt_count_add+0x70/0xa0 [59.199] ? __x64_sys_ioctl+0x88/0xc0 [59.199] __x64_sys_ioctl+0x88/0xc0 [59.199] do_syscall_64+0x38/0x90 [59.199] entry_SYSCALL_64_after_hwframe+0x72/0xdc [59.199] RIP: 0033:0x7f82ffaffe9b [59.199] RSP: 002b:00007f82ff9fcc50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [59.199] RAX: ffffffffffffffda RBX: 000055b191e36310 RCX: 00007f82ffaffe9b [59.199] RDX: 000055b191e36310 RSI: 00000000c400941b RDI: 0000000000000003 [59.199] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000 [59.199] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82ff9fd640 [59.199] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000 [59.199] </TASK> [59.199] INFO: task btrfs:346773 blocked for more than 120 seconds. [59.200] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1 [59.200] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59.201] task:btrfs state:D stack:0 pid:346773 ppid:1 flags:0x00004002 [59.201] Call Trace: [59.201] <TASK> [59.201] __schedule+0x392/0xa70 [59.201] ? __pv_queued_spin_lock_slowpath+0x165/0x370 [59.201] schedule+0x5d/0xd0 [59.201] __scrub_blocked_if_needed+0x74/0xc0 [btrfs] [59.201] ? __pfx_autoremove_wake_function+0x10/0x10 [59.201] scrub_pause_off+0x21/0x50 [btrfs] [59.202] scrub_simple_mirror+0x1c7/0x950 [btrfs] [59.202] ? scrub_parity_put+0x1a5/0x1d0 [btrfs] [59.202] ? __pfx_autoremove_wake_function+0x10/0x10 [59.202] scrub_stripe+0x20d/0x740 [btrfs] [59.202] scrub_chunk+0xc4/0x130 [btrfs] [59.203] scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs] [59.203] ? __pfx_autoremove_wake_function+0x10/0x10 [59.203] btrfs_scrub_dev+0x236/0x6a0 [btrfs] [59.203] ? btrfs_ioctl+0xd97/0x32c0 [btrfs] [59.203] ? _copy_from_user+0x7b/0x80 [59.203] btrfs_ioctl+0xde1/0x32c0 [btrfs] [59.204] ? should_failslab+0xa/0x20 [59.204] ? kmem_cache_alloc_node+0x151/0x460 [59.204] ? alloc_io_context+0x1b/0x80 [59.204] ? preempt_count_add+0x70/0xa0 [59.204] ? __x64_sys_ioctl+0x88/0xc0 [59.204] __x64_sys_ioctl+0x88/0xc0 [59.204] do_syscall_64+0x38/0x90 [59.204] entry_SYSCALL_64_after_hwframe+0x72/0xdc [59.204] RIP: 0033:0x7f82ffaffe9b [59.204] RSP: 002b:00007f82ff1fbc50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [59.204] RAX: ffffffffffffffda RBX: 000055b191e36790 RCX: 00007f82ffaffe9b [59.204] RDX: 000055b191e36790 RSI: 00000000c400941b RDI: 0000000000000003 [59.204] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000 [59.204] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82ff1fc640 [59.204] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000 [59.204] </TASK> [59.204] INFO: task btrfs:346774 blocked for more than 120 seconds. [59.205] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1 [59.205] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59.206] task:btrfs state:D stack:0 pid:346774 ppid:1 flags:0x00004002 [59.206] Call Trace: [59.206] <TASK> [59.206] __schedule+0x392/0xa70 [59.206] schedule+0x5d/0xd0 [59.206] __scrub_blocked_if_needed+0x74/0xc0 [btrfs] [59.206] ? __pfx_autoremove_wake_function+0x10/0x10 [59.206] scrub_pause_off+0x21/0x50 [btrfs] [59.207] scrub_simple_mirror+0x1c7/0x950 [btrfs] [59.207] ? scrub_parity_put+0x1a5/0x1d0 [btrfs] [59.207] ? __pfx_autoremove_wake_function+0x10/0x10 [59.207] scrub_stripe+0x20d/0x740 [btrfs] [59.208] scrub_chunk+0xc4/0x130 [btrfs] [59.208] scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs] [59.208] ? __mutex_unlock_slowpath.isra.0+0x9a/0x120 [59.208] btrfs_scrub_dev+0x236/0x6a0 [btrfs] [59.208] ? btrfs_ioctl+0xd97/0x32c0 [btrfs] [59.209] ? _copy_from_user+0x7b/0x80 [59.209] btrfs_ioctl+0xde1/0x32c0 [btrfs] [59.209] ? should_failslab+0xa/0x20 [59.209] ? kmem_cache_alloc_node+0x151/0x460 [59.209] ? alloc_io_context+0x1b/0x80 [59.209] ? preempt_count_add+0x70/0xa0 [59.209] ? __x64_sys_ioctl+0x88/0xc0 [59.209] __x64_sys_ioctl+0x88/0xc0 [59.209] do_syscall_64+0x38/0x90 [59.209] entry_SYSCALL_64_after_hwframe+0x72/0xdc [59.209] RIP: 0033:0x7f82ffaffe9b [59.209] RSP: 002b:00007f82fe9fac50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [59.209] RAX: ffffffffffffffda RBX: 000055b191e36c10 RCX: 00007f82ffaffe9b [59.209] RDX: 000055b191e36c10 RSI: 00000000c400941b RDI: 0000000000000003 [59.209] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000 [59.209] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82fe9fb640 [59.209] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000 [59.209] </TASK> [59.209] INFO: task btrfs:346775 blocked for more than 120 seconds. [59.210] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1 [59.210] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59.211] task:btrfs state:D stack:0 pid:346775 ppid:1 flags:0x00004002 [59.211] Call Trace: [59.211] <TASK> [59.211] __schedule+0x392/0xa70 [59.211] schedule+0x5d/0xd0 [59.211] __scrub_blocked_if_needed+0x74/0xc0 [btrfs] [59.211] ? __pfx_autoremove_wake_function+0x10/0x10 [59.211] scrub_pause_off+0x21/0x50 [btrfs] [59.212] scrub_simple_mirror+0x1c7/0x950 [btrfs] [59.212] ? scrub_parity_put+0x1a5/0x1d0 [btrfs] [59.212] ? __pfx_autoremove_wake_function+0x10/0x10 [59.212] scrub_stripe+0x20d/0x740 [btrfs] [59.213] scrub_chunk+0xc4/0x130 [btrfs] [59.213] scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs] [59.213] ? __mutex_unlock_slowpath.isra.0+0x9a/0x120 [59.213] btrfs_scrub_dev+0x236/0x6a0 [btrfs] [59.213] ? btrfs_ioctl+0xd97/0x32c0 [btrfs] [59.214] ? _copy_from_user+0x7b/0x80 [59.214] btrfs_ioctl+0xde1/0x32c0 [btrfs] [59.214] ? should_failslab+0xa/0x20 [59.214] ? kmem_cache_alloc_node+0x151/0x460 [59.214] ? alloc_io_context+0x1b/0x80 [59.214] ? preempt_count_add+0x70/0xa0 [59.214] ? __x64_sys_ioctl+0x88/0xc0 [59.214] __x64_sys_ioctl+0x88/0xc0 [59.214] do_syscall_64+0x38/0x90 [59.214] entry_SYSCALL_64_after_hwframe+0x72/0xdc [59.214] RIP: 0033:0x7f82ffaffe9b [59.214] RSP: 002b:00007f82fe1f9c50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [59.214] RAX: ffffffffffffffda RBX: 000055b191e37090 RCX: 00007f82ffaffe9b [59.214] RDX: 000055b191e37090 RSI: 00000000c400941b RDI: 0000000000000003 [59.214] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000 [59.214] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82fe1fa640 [59.214] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000 [59.214] </TASK> [59.214] INFO: task btrfs:346776 blocked for more than 120 seconds. [59.215] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1 [59.216] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59.217] task:btrfs state:D stack:0 pid:346776 ppid:1 flags:0x00004002 [59.217] Call Trace: [59.217] <TASK> [59.217] __schedule+0x392/0xa70 [59.217] ? __pv_queued_spin_lock_slowpath+0x165/0x370 [59.217] schedule+0x5d/0xd0 [59.217] __scrub_blocked_if_needed+0x74/0xc0 [btrfs] [59.217] ? __pfx_autoremove_wake_function+0x10/0x10 [59.217] scrub_pause_off+0x21/0x50 [btrfs] [59.217] scrub_simple_mirror+0x1c7/0x950 [btrfs] [59.217] ? scrub_parity_put+0x1a5/0x1d0 [btrfs] [59.218] ? __pfx_autoremove_wake_function+0x10/0x10 [59.218] scrub_stripe+0x20d/0x740 [btrfs] [59.218] scrub_chunk+0xc4/0x130 [btrfs] [59.218] scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs] [59.219] ? __pfx_autoremove_wake_function+0x10/0x10 [59.219] btrfs_scrub_dev+0x236/0x6a0 [btrfs] [59.219] ? btrfs_ioctl+0xd97/0x32c0 [btrfs] [59.219] ? _copy_from_user+0x7b/0x80 [59.219] btrfs_ioctl+0xde1/0x32c0 [btrfs] [59.219] ? should_failslab+0xa/0x20 [59.219] ? kmem_cache_alloc_node+0x151/0x460 [59.219] ? alloc_io_context+0x1b/0x80 [59.219] ? preempt_count_add+0x70/0xa0 [59.219] ? __x64_sys_ioctl+0x88/0xc0 [59.219] __x64_sys_ioctl+0x88/0xc0 [59.219] do_syscall_64+0x38/0x90 [59.219] entry_SYSCALL_64_after_hwframe+0x72/0xdc [59.219] RIP: 0033:0x7f82ffaffe9b [59.219] RSP: 002b:00007f82fd9f8c50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [59.219] RAX: ffffffffffffffda RBX: 000055b191e37510 RCX: 00007f82ffaffe9b [59.219] RDX: 000055b191e37510 RSI: 00000000c400941b RDI: 0000000000000003 [59.219] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000 [59.219] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82fd9f9640 [59.219] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000 [59.219] </TASK> [59.219] INFO: task btrfs:346822 blocked for more than 120 seconds. [59.220] Tainted: G W 6.3.0-rc2-btrfs-next-127+ #1 [59.221] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [59.222] task:btrfs state:D stack:0 pid:346822 ppid:1 flags:0x00004002 [59.222] Call Trace: [59.222] <TASK> [59.222] __schedule+0x392/0xa70 [59.222] schedule+0x5d/0xd0 [59.222] btrfs_scrub_cancel+0x91/0x100 [btrfs] [59.222] ? __pfx_autoremove_wake_function+0x10/0x10 [59.222] btrfs_commit_transaction+0x572/0xeb0 [btrfs] [59.223] ? start_transaction+0xcb/0x610 [btrfs] [59.223] prepare_to_relocate+0x111/0x1a0 [btrfs] [59.223] relocate_block_group+0x57/0x5d0 [btrfs] [59.223] ? btrfs_wait_nocow_writers+0x25/0xb0 [btrfs] [59.223] btrfs_relocate_block_group+0x248/0x3c0 [btrfs] [59.224] ? __pfx_autoremove_wake_function+0x10/0x10 [59.224] btrfs_relocate_chunk+0x3b/0x150 [btrfs] [59.224] btrfs_balance+0x8ff/0x11d0 [btrfs] [59.224] ? __kmem_cache_alloc_node+0x14a/0x410 [59.224] btrfs_ioctl+0x2334/0x32c0 [btrfs] [59.225] ? mod_objcg_state+0xd2/0x360 [59.225] ? refill_obj_stock+0xb0/0x160 [59.225] ? seq_release+0x25/0x30 [59.225] ? __rseq_handle_notify_resume+0x3b5/0x4b0 [59.225] ? percpu_counter_add_batch+0x2e/0xa0 [59.225] ? __x64_sys_ioctl+0x88/0xc0 [59.225] __x64_sys_ioctl+0x88/0xc0 [59.225] do_syscall_64+0x38/0x90 [59.225] entry_SYSCALL_64_after_hwframe+0x72/0xdc [59.225] RIP: 0033:0x7f381a6ffe9b [59.225] RSP: 002b:00007ffd45ecf060 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [59.225] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f381a6ffe9b [59.225] RDX: 00007ffd45ecf150 RSI: 00000000c4009420 RDI: 0000000000000003 [59.225] RBP: 0000000000000003 R08: 0000000000000013 R09: 0000000000000000 [59.225] R10: 00007f381a60c878 R11: 0000000000000246 R12: 00007ffd45ed0423 [59.225] R13: 00007ffd45ecf150 R14: 0000000000000000 R15: 00007ffd45ecf148 [59.225] </TASK> What happens is the following: 1) A scrub is running, so fs_info->scrubs_running is 1; 2) Task A starts block group relocation, and at btrfs_relocate_chunk() it pauses scrub by calling btrfs_scrub_pause(). That increments fs_info->scrub_pause_req from 0 to 1 and waits for the scrub task to pause (for fs_info->scrubs_paused to be == to fs_info->scrubs_running); 3) The scrub task pauses at scrub_pause_off(), waiting for fs_info->scrub_pause_req to decrease to 0; 4) Task A then enters btrfs_relocate_block_group(), and down that call chain we start a transaction and then attempt to commit it; 5) When task A calls btrfs_commit_transaction(), it either will do the commit itself or wait for some other task that already started the commit of the transaction - it doesn't matter which case; 6) The transaction commit enters state TRANS_STATE_COMMIT_START; 7) An error happens during the transaction commit, like -ENOSPC when running delayed refs or delayed items for example; 8) This results in calling transaction.c:cleanup_transaction(), where we call btrfs_scrub_cancel(), incrementing fs_info->scrub_cancel_req from 0 to 1, and blocking this task waiting for fs_info->scrubs_running to decrease to 0; 9) From this point on, both the transaction commit and the scrub task hang forever: 1) The transaction commit is waiting for fs_info->scrubs_running to be decreased to 0; 2) The scrub task is at scrub_pause_off() waiting for fs_info->scrub_pause_req to decrease to 0 - so it can not proceed to stop the scrub and decrement fs_info->scrubs_running from 0 to 1. Therefore resulting in a deadlock. Fix this by having cleanup_transaction(), called if a transaction commit fails, not call btrfs_scrub_cancel() if relocation is in progress, and having btrfs_relocate_block_group() call btrfs_scrub_cancel() instead if the relocation failed and a transaction abort happened. This was triggered with btrfs/061 from fstests. Fixes: 55e3a601c81c ("btrfs: Fix data checksum error cause by replace with io-load.") CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-28btrfs: scan device in non-exclusive modeAnand Jain
This fixes mkfs/mount/check failures due to race with systemd-udevd scan. During the device scan initiated by systemd-udevd, other user space EXCL operations such as mkfs, mount, or check may get blocked and result in a "Device or resource busy" error. This is because the device scan process opens the device with the EXCL flag in the kernel. Two reports were received: - btrfs/179 test case, where the fsck command failed with the -EBUSY error - LTP pwritev03 test case, where mkfs.vfs failed with the -EBUSY error, when mkfs.vfs tried to overwrite old btrfs filesystem on the device. In both cases, fsck and mkfs (respectively) were racing with a systemd-udevd device scan, and systemd-udevd won, resulting in the -EBUSY error for fsck and mkfs. Reproducing the problem has been difficult because there is a very small window during which these userspace threads can race to acquire the exclusive device open. Even on the system where the problem was observed, the problem occurrences were anywhere between 10 to 400 iterations and chances of reproducing decreases with debug printk()s. However, an exclusive device open is unnecessary for the scan process, as there are no write operations on the device during scan. Furthermore, during the mount process, the superblock is re-read in the below function call chain: btrfs_mount_root btrfs_open_devices open_fs_devices btrfs_open_one_device btrfs_get_bdev_and_sb So, to fix this issue, removes the FMODE_EXCL flag from the scan operation, and add a comment. The case where mkfs may still write to the device and a scan is running, the btrfs signature is not written at that time so scan will not recognize such device. Reported-by: Sherry Yang <sherry.yang@oracle.com> Reported-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/oe-lkp/202303170839.fdf23068-oliver.sang@intel.com CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>