summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-05-02net/mlx5e: flower: check for unsupported control flagsAsbjørn Sloth Tønnesen
Use flow_rule_is_supp_control_flags() to reject filters with unsupported control flags. In case any unsupported control flags are masked, flow_rule_is_supp_control_flags() sets a NL extended error message, and we return -EOPNOTSUPP. Remove FLOW_DIS_FIRST_FRAG specific error message, and treat it as any other unsupported control flag. Only compile-tested. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240422152728.175677-1-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-03Merge tag 'drm-misc-fixes-2024-05-02' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: imagination: - fix page-count macro nouveau: - avoid page-table allocation failures - fix firmware memory allocation panel: - ili9341: avoid OF for device properties; respect deferred probe; fix usage of errno codes ttm: - fix status output vmwgfx: - fix legacy display unit - fix read length in fence signalling Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20240502192117.GA12158@linux.fritz.box
2024-05-03Merge tag 'drm-xe-fixes-2024-05-02' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes - Fix UAF on rebind worker - Fix ADL-N display integration Signed-off-by: Dave Airlie <airlied@redhat.com> From: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/6bontwst3mbxozs6u3ad5n3g5zmaucrngbfwv4hkfhpscnwlym@wlwjgjx6pwue
2024-05-03Merge tag 'drm-xe-next-fixes-2024-05-02' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-next Driver Changes: - Fix for a backmerge going slightly wrong. - An UAF fix - Avoid a WA error on LNL. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ZjOijQA43zhu3SZ4@fedora
2024-05-03Merge tag 'amd-drm-fixes-6.9-2024-05-01' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.9-2024-05-01: amdgpu: - Fix VRAM memory accounting - DCN 3.1 fixes - DCN 2.0 fix - DCN 3.1.5 fix - DCN 3.5 fix - DCN 3.2.1 fix - DP fixes - Seamless boot fix - Fix call order in amdgpu_ttm_move() - Fix doorbell regression - Disable panel replay temporarily amdkfd: - Flush wq before creating kfd process Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240501135054.1919108-1-alexander.deucher@amd.com
2024-05-02bdev: move ->bd_make_it_fail to ->__bd_flagsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02bdev: move ->bd_ro_warned to ->__bd_flagsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02bdev: move ->bd_has_subit_bio to ->__bd_flagsAl Viro
In bdev_alloc() we have all flags initialized to false, so assignment to ->bh_has_submit_bio n there is a no-op unless we have partno != 0 and flag already set on entire device. In device_add_disk() we have just allocated the block_device in question and it had been a full-device one, so the flag is guaranteed to be still clear when we get to assignment. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02bdev: move ->bd_write_holder into ->__bd_flagsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02bdev: move ->bd_read_only to ->__bd_flagsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02bdev: infrastructure for flagsAl Viro
Replace bd_partno with a 32bit field (__bd_flags). The lower 8 bits contain the partition number, the upper 24 are for flags. Helpers: bdev_{test,set,clear}_flag(bdev, flag), with atomic_or() and atomic_andnot() used to set/clear. NOTE: this commit does not actually move any flags over there - they are still bool fields. As the result, it shifts the fields wrt cacheline boundaries; that's going to be restored once the first 3 flags are dealt with. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02libbpf: fix ring_buffer__consume_n() return result logicAndrii Nakryiko
Add INT_MAX check to ring_buffer__consume_n(). We do the similar check to handle int return result of all these ring buffer APIs in other APIs and ring_buffer__consume_n() is missing one. This patch fixes this omission. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20240430201952.888293-2-andrii@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-02libbpf: fix potential overflow in ring__consume_n()Andrii Nakryiko
ringbuf_process_ring() return int64_t, while ring__consume_n() assigns it to int. It's highly unlikely, but possible for ringbuf_process_ring() to return value larger than INT_MAX, so use int64_t. ring__consume_n() does check INT_MAX before returning int result to the user. Fixes: 4d22ea94ea33 ("libbpf: Add ring__consume_n / ring_buffer__consume_n") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20240430201952.888293-1-andrii@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-02Merge branch 'Add new args into tcp_congestion_ops' cong_control'Martin KaFai Lau
Miao Xu says: ==================== This patchset attempts to add two new arguments into the hookpoint cong_control in tcp_congestion_ops. The new arguments are inherited from the caller tcp_cong_control and can be used by any bpf cc prog that implements its own logic inside this hookpoint. Please review. Thanks a lot! Changelog ===== v2->v3: - Fixed the broken selftest caused by the new arguments. - Renamed the selftest file name and bpf prog name. v1->v2: - Split the patchset into 3 separate patches. - Added highlights in the selftest prog. - Removed the dependency on bpf_tcp_helpers.h. ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-02selftests/bpf: Add test for the use of new args in cong_controlMiao Xu
This patch adds a selftest to show the usage of the new arguments in cong_control. For simplicity's sake, the testing example reuses cubic's kernel functions. Signed-off-by: Miao Xu <miaxu@meta.com> Link: https://lore.kernel.org/r/20240502042318.801932-4-miaxu@meta.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-02bpf: tcp: Allow to write tp->snd_cwnd_stamp in bpf_tcp_caMiao Xu
This patch allows the write of tp->snd_cwnd_stamp in a bpf tcp ca program. An use case of writing this field is to keep track of the time whenever tp->snd_cwnd is raised or reduced inside the `cong_control` callback. Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Miao Xu <miaxu@meta.com> Link: https://lore.kernel.org/r/20240502042318.801932-3-miaxu@meta.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-02tcp: Add new args for cong_control in tcp_congestion_opsMiao Xu
This patch adds two new arguments for cong_control of struct tcp_congestion_ops: - ack - flag These two arguments are inherited from the caller tcp_cong_control in tcp_intput.c. One use case of them is to update cwnd and pacing rate inside cong_control based on the info they provide. For example, the flag can be used to decide if it is the right time to raise or reduce a sender's cwnd. Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Miao Xu <miaxu@meta.com> Link: https://lore.kernel.org/r/20240502042318.801932-2-miaxu@meta.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-02KVM: selftests: Require KVM_CAP_USER_MEMORY2 for tests that create memslotsSean Christopherson
Explicitly require KVM_CAP_USER_MEMORY2 for selftests that create memslots, i.e. skip selftests that need memslots instead of letting them fail on KVM_SET_USER_MEMORY_REGION2. While it's ok to take a dependency on new kernel features, selftests should skip gracefully instead of failing hard when run on older kernels. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/69ae0694-8ca3-402c-b864-99b500b24f5d@moroto.mountain Suggested-by: Shuah Khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/r/20240430162133.337541-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-05-02KVM: selftests: Allow skipping the KVM_RUN sanity check in rseq_testZide Chen
The rseq test's migration worker delays 1-10 us, assuming that one KVM_RUN iteration only takes a few microseconds. But if the CPU low power wakeup latency is large enough, for example, hundreds or even thousands of microseconds for deep C-state exit latencies on x86 server CPUs, it may happen that the target CPU is unable to wakeup and run the vCPU before the migration worker starts to migrate the vCPU thread to the _next_ CPU. If the system workload is light, most CPUs could be at a certain low power state, which may result in less successful migrations and fail the migration/KVM_RUN ratio sanity check. But this is not supposed to be deemed a test failure. Add a command line option to skip the sanity check, along with a comment and a verbose assert message to try to help the user resolve the potential source of failures without having to resort to disabling the check. Co-developed-by: Dongsheng Zhang <dongsheng.x.zhang@intel.com> Signed-off-by: Dongsheng Zhang <dongsheng.x.zhang@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Link: https://lore.kernel.org/r/20240502213936.27619-1-zide.chen@intel.com [sean: massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-05-02Merge branch 'selftests/bpf: Add sockaddr tests for kernel networking'Martin KaFai Lau
Jordan Rife says: ==================== This patch series adds test coverage for BPF sockaddr hooks and their interactions with kernel socket functions (i.e. kernel_bind(), kernel_connect(), kernel_sendmsg(), sock_sendmsg(), kernel_getpeername(), and kernel_getsockname()) while also rounding out IPv4 and IPv6 sockaddr hook coverage in prog_tests/sock_addr.c. As with v1 of this patch series, we add regression coverage for the issues addressed by these patches, - commit 0bdf399342c5("net: Avoid address overwrite in kernel_connect") - commit 86a7e0b69bd5("net: prevent rewrite of msg_name in sock_sendmsg()") - commit c889a99a21bf("net: prevent address rewrite in kernel_bind()") - commit 01b2885d9415("net: Save and restore msg_namelen in sock_sendmsg") but broaden the focus a bit. In order to extend prog_tests/sock_addr.c to test these kernel functions, we add a set of new kfuncs that wrap individual socket operations to bpf_testmod and invoke them through set of corresponding SYSCALL programs (progs/sock_addr_kern.c). Each test case can be configured to use a different set of "sock_ops" depending on whether it is testing kernel calls (kernel_bind(), kernel_connect(), etc.) or system calls (bind(), connect(), etc.). ======= Patches ======= * Patch 1 fixes the sock_addr bind test program to work for big endian architectures such as s390x. * Patch 2 introduces the new kfuncs to bpf_testmod. * Patch 3 introduces the BPF program which allows us to invoke these kfuncs invividually from the test program. * Patch 4 lays the groundwork for IPv4 and IPv6 sockaddr hook coverage by migrating much of the environment setup logic from bpf/test_sock_addr.sh into prog_tests/sock_addr.c and moves test cases to cover bind4/6, connect4/6, sendmsg4/6 and recvmsg4/6 hooks. * Patch 5 makes the set of socket operations for each test case configurable, laying the groundwork for Patch 6. * Patch 6 introduces two sets of sock_ops that invoke the kernel equivalents of connect(), bind(), etc. and uses these to add coverage for the kernel socket functions. ======= Changes ======= v2->v3 ------ * Renamed bind helpers. Dropped "_ntoh" suffix. * Added guards to kfuncs to make sure addrlen and msglen do not exceed the buffer capacity. * Added KF_SLEEPABLE flag to kfuncs. * Added a mutex (sock_lock) to kfuncs to serialize access to sock. * Added NULL check for sock to each kfunc. * Use the "sock_addr" networking namespace for all network interface setup and testing. * Use "nodad" when calling "ip -6 addr add" during interface setup to avoid delays and remove ping loop. * Removed test cases from test_sock_addr.c to make it clear what remains to be migrated. * Removed unused parameter (expect_change) from sock_addr_op(). Link: https://lore.kernel.org/bpf/20240412165230.2009746-1-jrife@google.com/T/#u v1->v2 ------ * Dropped test_progs/sock_addr_kern.c and the sock_addr_kern test module in favor of simply expanding bpf_testmod and test_progs/sock_addr.c. * Migrated environment setup logic from bpf/test_sock_addr.sh into prog_tests/sock_addr.c rather than invoking the script from the test program. * Added kfuncs to bpf_testmod as well as the sock_addr_kern BPF program to enable us to invoke kernel socket functions from test_progs/sock_addr.c. * Added test coverage for kernel socket functions to test_progs/sock_addr.c. Link: https://lore.kernel.org/bpf/20240329191907.1808635-1-jrife@google.com/T/#u ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-02Revert "ext4: apply umask if ACL support is disabled"Max Kellermann
This reverts commit 484fd6c1de13b336806a967908a927cc0356e312. The commit caused a regression because now the umask was applied to symlinks and the fix is unnecessary because the umask/O_TMPFILE bug has been fixed somewhere else already. Fixes: https://lore.kernel.org/lkml/28DSITL9912E1.2LSZUVTGTO52Q@mforney.org/ Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Tested-by: Michael Forney <mforney@mforney.org> Link: https://lore.kernel.org/r/20240315142956.2420360-1-max.kellermann@ionos.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-05-02selftests/bpf: Add kernel socket operation testsJordan Rife
This patch creates two sets of sock_ops that call out to the SYSCALL hooks in the sock_addr_kern BPF program and uses them to construct test cases for the range of supported operations (kernel_connect(), kernel_bind(), kernel_sendms(), sock_sendmsg(), kernel_getsockname(), kenel_getpeername()). This ensures that these interact with BPF sockaddr hooks as intended. Beyond this it also ensures that these operations do not modify their address parameter, providing regression coverage for the issues addressed by this set of patches: - commit 0bdf399342c5("net: Avoid address overwrite in kernel_connect") - commit 86a7e0b69bd5("net: prevent rewrite of msg_name in sock_sendmsg()") - commit c889a99a21bf("net: prevent address rewrite in kernel_bind()") - commit 01b2885d9415("net: Save and restore msg_namelen in sock_sendmsg") Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240429214529.2644801-7-jrife@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-02selftests/bpf: Make sock configurable for each test caseJordan Rife
In order to reuse the same test code for both socket system calls (e.g. connect(), bind(), etc.) and kernel socket functions (e.g. kernel_connect(), kernel_bind(), etc.), this patch introduces the "ops" field to sock_addr_test. This field allows each test cases to configure the set of functions used in the test case to create, manipulate, and tear down a socket. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240429214529.2644801-6-jrife@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-02selftests/bpf: Move IPv4 and IPv6 sockaddr test casesJordan Rife
This patch lays the groundwork for testing IPv4 and IPv6 sockaddr hooks and their interaction with both socket syscalls and kernel functions (e.g. kernel_connect, kernel_bind, etc.). It moves some of the test cases from the old-style bpf/test_sock_addr.c self test into the sock_addr prog_test in a step towards fully retiring bpf/test_sock_addr.c. We will expand the test dimensions in the sock_addr prog_test in a later patch series in order to migrate the remaining test cases. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20240429214529.2644801-5-jrife@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-05-02Merge tag 'md-6.10-20240502' of ↵Jens Axboe
https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.10/block Pull MD fix from Song: "This fixes an issue observed with dm-raid." * tag 'md-6.10-20240502' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md: fix resync softlockup when bitmap size is less than array size
2024-05-02PCI: of_property: Return error for int_map allocation failureDuoming Zhou
Return -ENOMEM from of_pci_prop_intr_map() if kcalloc() fails to prevent a NULL pointer dereference in this case. Fixes: 407d1a51921e ("PCI: Create device tree node for bridge") Link: https://lore.kernel.org/r/20240303105729.78624-1-duoming@zju.edu.cn Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-05-02PCI/ASPM: Clean up ASPM disable/enable mask calculationIlpo Järvinen
With only one set of defines remaining, state can be almost used as-is to set ->aspm_disable/default. Only CLKPM and L1 PM substates need special handling. Remove unnecessary if conditions that can use the state variable bits directly. Move ASPM mask calculation into pci_calc_aspm_enable_mask() and pci_calc_aspm_disable_mask() helpers which makes it easier to alter state variable directly. Link: https://lore.kernel.org/r/20240322123952.6384-3-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-05-02PCI/ASPM: Consolidate link state definesIlpo Järvinen
The linux/pci.h and aspm.c files define their own sets of link state related defines which are almost the same. Consolidate the use of defines into those defined by linux/pci.h and expand PCIE_LINK_STATE_L0S to match earlier ASPM_STATE_L0S that includes both upstream and downstream bits. Rename also the defines that are internal to aspm.c to start with PCIE_LINK_STATE for consistency. While the PCIE_LINK_STATE_L0S BIT(0) -> (BIT(0) | BIT(1)) transformation is not 1:1, in practice aspm.c already used ASPM_STATE_L0S that has both bits enabled except during mapping. While at it, place the PCIE_LINK_STATE_CLKPM define last to have more logical grouping. Use static_assert() to ensure PCIE_LINK_STATE_L0S is strictly equal to the combination of PCIE_LINK_STATE_L0S_UP/DW. Link: https://lore.kernel.org/r/20240322123952.6384-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-05-02wrapper for access to ->bd_partnoAl Viro
On the next step it's going to get folded into a field where flags will go. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02Use bdev_is_paritition() instead of open-coding itAl Viro
Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02md: fix resync softlockup when bitmap size is less than array sizeYu Kuai
Is is reported that for dm-raid10, lvextend + lvchange --syncaction will trigger following softlockup: kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [mdX_resync:6976] CPU: 7 PID: 3588 Comm: mdX_resync Kdump: loaded Not tainted 6.9.0-rc4-next-20240419 #1 RIP: 0010:_raw_spin_unlock_irq+0x13/0x30 Call Trace: <TASK> md_bitmap_start_sync+0x6b/0xf0 raid10_sync_request+0x25c/0x1b40 [raid10] md_do_sync+0x64b/0x1020 md_thread+0xa7/0x170 kthread+0xcf/0x100 ret_from_fork+0x30/0x50 ret_from_fork_asm+0x1a/0x30 And the detailed process is as follows: md_do_sync j = mddev->resync_min while (j < max_sectors) sectors = raid10_sync_request(mddev, j, &skipped) if (!md_bitmap_start_sync(..., &sync_blocks)) // md_bitmap_start_sync set sync_blocks to 0 return sync_blocks + sectors_skippe; // sectors = 0; j += sectors; // j never change Root cause is that commit 301867b1c168 ("md/raid10: check slab-out-of-bounds in md_bitmap_get_counter") return early from md_bitmap_get_counter(), without setting returned blocks. Fix this problem by always set returned blocks from md_bitmap_get_counter"(), as it used to be. Noted that this patch just fix the softlockup problem in kernel, the case that bitmap size doesn't match array size still need to be fixed. Fixes: 301867b1c168 ("md/raid10: check slab-out-of-bounds in md_bitmap_get_counter") Reported-and-tested-by: Nigel Croxon <ncroxon@redhat.com> Closes: https://lore.kernel.org/all/71ba5272-ab07-43ba-8232-d2da642acb4e@redhat.com/ Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240422065824.2516-1-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org>
2024-05-02drm: zynqmp_dpsub: Always register bridgeSean Anderson
We must always register the DRM bridge, since zynqmp_dp_hpd_work_func calls drm_bridge_hpd_notify, which in turn expects hpd_mutex to be initialized. We do this before zynqmp_dpsub_drm_init since that calls drm_bridge_attach. This fixes the following lockdep warning: [ 19.217084] ------------[ cut here ]------------ [ 19.227530] DEBUG_LOCKS_WARN_ON(lock->magic != lock) [ 19.227768] WARNING: CPU: 0 PID: 140 at kernel/locking/mutex.c:582 __mutex_lock+0x4bc/0x550 [ 19.241696] Modules linked in: [ 19.244937] CPU: 0 PID: 140 Comm: kworker/0:4 Not tainted 6.6.20+ #96 [ 19.252046] Hardware name: xlnx,zynqmp (DT) [ 19.256421] Workqueue: events zynqmp_dp_hpd_work_func [ 19.261795] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 19.269104] pc : __mutex_lock+0x4bc/0x550 [ 19.273364] lr : __mutex_lock+0x4bc/0x550 [ 19.277592] sp : ffffffc085c5bbe0 [ 19.281066] x29: ffffffc085c5bbe0 x28: 0000000000000000 x27: ffffff88009417f8 [ 19.288624] x26: ffffff8800941788 x25: ffffff8800020008 x24: ffffffc082aa3000 [ 19.296227] x23: ffffffc080d90e3c x22: 0000000000000002 x21: 0000000000000000 [ 19.303744] x20: 0000000000000000 x19: ffffff88002f5210 x18: 0000000000000000 [ 19.311295] x17: 6c707369642e3030 x16: 3030613464662072 x15: 0720072007200720 [ 19.318922] x14: 0000000000000000 x13: 284e4f5f4e524157 x12: 0000000000000001 [ 19.326442] x11: 0001ffc085c5b940 x10: 0001ff88003f388b x9 : 0001ff88003f3888 [ 19.334003] x8 : 0001ff88003f3888 x7 : 0000000000000000 x6 : 0000000000000000 [ 19.341537] x5 : 0000000000000000 x4 : 0000000000001668 x3 : 0000000000000000 [ 19.349054] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffff88003f3880 [ 19.356581] Call trace: [ 19.359160] __mutex_lock+0x4bc/0x550 [ 19.363032] mutex_lock_nested+0x24/0x30 [ 19.367187] drm_bridge_hpd_notify+0x2c/0x6c [ 19.371698] zynqmp_dp_hpd_work_func+0x44/0x54 [ 19.376364] process_one_work+0x3ac/0x988 [ 19.380660] worker_thread+0x398/0x694 [ 19.384736] kthread+0x1bc/0x1c0 [ 19.388241] ret_from_fork+0x10/0x20 [ 19.392031] irq event stamp: 183 [ 19.395450] hardirqs last enabled at (183): [<ffffffc0800b9278>] finish_task_switch.isra.0+0xa8/0x2d4 [ 19.405140] hardirqs last disabled at (182): [<ffffffc081ad3754>] __schedule+0x714/0xd04 [ 19.413612] softirqs last enabled at (114): [<ffffffc080133de8>] srcu_invoke_callbacks+0x158/0x23c [ 19.423128] softirqs last disabled at (110): [<ffffffc080133de8>] srcu_invoke_callbacks+0x158/0x23c [ 19.432614] ---[ end trace 0000000000000000 ]--- Fixes: eb2d64bfcc17 ("drm: xlnx: zynqmp_dpsub: Report HPD through the bridge") Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Reviewed-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240308204741.3631919-1-sean.anderson@linux.dev (cherry picked from commit 61ba791c4a7a09a370c45b70a81b8c7d4cf6b2ae) Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2024-05-02Revert "drm/bridge: ti-sn65dsi83: Fix enable error path"Luca Ceresoli
This reverts commit 8a91b29f1f50ce7742cdbe5cf11d17f128511f3f. The regulator_disable() added by the original commit solves one kind of regulator imbalance but adds another one as it allows the regulator to be disabled one more time than it is enabled in the following scenario: 1. Start video pipeline -> sn65dsi83_atomic_pre_enable -> regulator_enable 2. PLL lock fails -> regulator_disable 3. Stop video pipeline -> sn65dsi83_atomic_disable -> regulator_disable The reason is clear from the code flow, which looks like this (after removing unrelated code): static void sn65dsi83_atomic_pre_enable() { regulator_enable(ctx->vcc); if (PLL failed locking) { regulator_disable(ctx->vcc); <---- added by patch being reverted return; } } static void sn65dsi83_atomic_disable() { regulator_disable(ctx->vcc); } The use case for introducing the additional regulator_disable() was removing the module for debugging (see link below for the discussion). If the module is removed after a .atomic_pre_enable, i.e. with an active pipeline from the DRM point of view, .atomic_disable is not called and thus the regulator would not be disabled. According to the discussion however there is no actual use case for removing the module with an active pipeline, except for debugging/development. On the other hand, the occurrence of a PLL lock failure is possible due to any physical reason (e.g. a temporary hardware failure for electrical reasons) so handling it gracefully should be supported. As there is no way for .atomic[_pre]_enable to report an error to the core, the only clean way to support it is calling regulator_disabled() only in .atomic_disable, unconditionally, as it was before. Link: https://lore.kernel.org/all/15244220.uLZWGnKmhe@steina-w/ Fixes: 8a91b29f1f50 ("drm/bridge: ti-sn65dsi83: Fix enable error path") Reviewed-by: Alexander Stein <alexander.stein@ew.tq-group.com> Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Signed-off-by: Robert Foss <rfoss@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240426122259.46808-1-luca.ceresoli@bootlin.com (cherry picked from commit 2940ee03b23281071620dda1d790cd644dabd394) Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2024-05-02make set_blocksize() fail unless block device is opened exclusiveAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02set_blocksize(): switch to passing struct file *Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02btrfs_get_bdev_and_sb(): call set_blocksize() only for exclusive opensAl Viro
btrfs_get_bdev_and_sb() has two callers - btrfs_open_one_device(), which asks for open to be exclusive and btrfs_get_dev_args_from_path(), which doesn't. Currently it does set_blocksize() in all cases. I'm rather dubious about the need to do set_blocksize() anywhere in btrfs, to be honest - there's some access to page cache of underlying block devices in there, but it's nowhere near the hot paths, AFAICT. In any case, btrfs_get_dev_args_from_path() only needs to read the on-disk superblock and copy several fields out of it; all callers are only interested in devices that are already opened and brought into per-filesystem set, so setting the block size is redundant for those and actively harmful if we are given a pathname of unrelated device. So we only need btrfs_get_bdev_and_sb() to call set_blocksize() when it's asked to open exclusive. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02swsusp: don't bother with setting block sizeAl Viro
same as with the swap... Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02zram: don't bother with reopening - just use O_EXCL for openAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02drm/fb_dma: Add checks in drm_fb_dma_get_scanout_buffer()Jocelyn Falempe
plane->state and plane->state->fb can be NULL, so add a check before dereferencing them. Found by testing with the imx driver. Fixes: 879b3b6511fe ("drm/fb_dma: Add generic get_scanout_buffer() for drm_panic") Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20240426121121.241366-1-jfalempe@redhat.com (cherry picked from commit 986c12d8c9a677c094c37bd6aa636b4d4c5ccd46) Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2024-05-02drm/fbdev-generic: Do not set physical framebuffer addressThomas Zimmermann
Framebuffer memory is allocated via vzalloc() from non-contiguous physical pages. The physical framebuffer start address is therefore meaningless. Do not set it. The value is not used within the kernel and only exported to userspace on dedicated ARM configs. No functional change is expected. v2: - refer to vzalloc() in commit message (Javier) Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Fixes: a5b44c4adb16 ("drm/fbdev-generic: Always use shadow buffering") Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Javier Martinez Canillas <javierm@redhat.com> Cc: Zack Rusin <zackr@vmware.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: <stable@vger.kernel.org> # v6.4+ Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Reviewed-by: Zack Rusin <zack.rusin@broadcom.com> Reviewed-by: Sui Jingfeng <sui.jingfeng@linux.dev> Tested-by: Sui Jingfeng <sui.jingfeng@linux.dev> Acked-by: Maxime Ripard <mripard@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240419083331.7761-2-tzimmermann@suse.de (cherry picked from commit 73ef0aecba78aa9ebd309b10b6cd17d94e632892) Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2024-05-02swapon(2): open swap with O_EXCLAl Viro
... eliminating the need to reopen block devices so they could be exclusively held. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02swapon(2)/swapoff(2): don't bother with block sizeAl Viro
once upon a time that used to matter; these days we do swap IO for swap devices at the level that doesn't give a damn about block size, buffer_head or anything of that sort - just attach the page to bio, set the location and size (the latter to PAGE_SIZE) and feed into queue. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02pktcdvd: sort set_blocksize() calls outAl Viro
1) it doesn't make any sense to have ->open() call set_blocksize() on the device being opened - the caller will override that anyway. 2) setting block size on underlying device, OTOH, ought to be done when we are opening it exclusive - i.e. as part of pkt_open_dev(). Having it done at setup time doesn't guarantee us anything about the state at the time we start talking to it. Worse, if you happen to have the underlying device containing e.g. ext2 with 4Kb blocks that is currently mounted r/o, that set_blocksize() will confuse the hell out of filesystem. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02bcache_register(): don't bother with set_blocksize()Al Viro
We are not using __bread() anymore and read_cache_page_gfp() doesn't care about block size. Moreover, we should *not* change block size on a device that is currently held exclusive - filesystems that use buffer cache expect the block numbers to be interpreted in units set by filesystem. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> ACKed-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02btrfs: make sure that WRITTEN is set on all metadata blocksJosef Bacik
We previously would call btrfs_check_leaf() if we had the check integrity code enabled, which meant that we could only run the extended leaf checks if we had WRITTEN set on the header flags. This leaves a gap in our checking, because we could end up with corruption on disk where WRITTEN isn't set on the leaf, and then the extended leaf checks don't get run which we rely on to validate all of the item pointers to make sure we don't access memory outside of the extent buffer. However, since 732fab95abe2 ("btrfs: check-integrity: remove CONFIG_BTRFS_FS_CHECK_INTEGRITY option") we no longer call btrfs_check_leaf() from btrfs_mark_buffer_dirty(), which means we only ever call it on blocks that are being written out, and thus have WRITTEN set, or that are being read in, which should have WRITTEN set. Add checks to make sure we have WRITTEN set appropriately, and then make sure __btrfs_check_leaf() always does the item checking. This will protect us from file systems that have been corrupted and no longer have WRITTEN set on some of the blocks. This was hit on a crafted image tweaking the WRITTEN bit and reported by KASAN as out-of-bound access in the eb accessors. The example is a dir item at the end of an eb. [2.042] BTRFS warning (device loop1): bad eb member start: ptr 0x3fff start 30572544 member offset 16410 size 2 [2.040] general protection fault, probably for non-canonical address 0xe0009d1000000003: 0000 [#1] PREEMPT SMP KASAN NOPTI [2.537] KASAN: maybe wild-memory-access in range [0x0005088000000018-0x000508800000001f] [2.729] CPU: 0 PID: 2587 Comm: mount Not tainted 6.8.2 #1 [2.729] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 [2.621] RIP: 0010:btrfs_get_16+0x34b/0x6d0 [2.621] RSP: 0018:ffff88810871fab8 EFLAGS: 00000206 [2.621] RAX: 0000a11000000003 RBX: ffff888104ff8720 RCX: ffff88811b2288c0 [2.621] RDX: dffffc0000000000 RSI: ffffffff81dd8aca RDI: ffff88810871f748 [2.621] RBP: 000000000000401a R08: 0000000000000001 R09: ffffed10210e3ee9 [2.621] R10: ffff88810871f74f R11: 205d323430333737 R12: 000000000000001a [2.621] R13: 000508800000001a R14: 1ffff110210e3f5d R15: ffffffff850011e8 [2.621] FS: 00007f56ea275840(0000) GS:ffff88811b200000(0000) knlGS:0000000000000000 [2.621] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [2.621] CR2: 00007febd13b75c0 CR3: 000000010bb50000 CR4: 00000000000006f0 [2.621] Call Trace: [2.621] <TASK> [2.621] ? show_regs+0x74/0x80 [2.621] ? die_addr+0x46/0xc0 [2.621] ? exc_general_protection+0x161/0x2a0 [2.621] ? asm_exc_general_protection+0x26/0x30 [2.621] ? btrfs_get_16+0x33a/0x6d0 [2.621] ? btrfs_get_16+0x34b/0x6d0 [2.621] ? btrfs_get_16+0x33a/0x6d0 [2.621] ? __pfx_btrfs_get_16+0x10/0x10 [2.621] ? __pfx_mutex_unlock+0x10/0x10 [2.621] btrfs_match_dir_item_name+0x101/0x1a0 [2.621] btrfs_lookup_dir_item+0x1f3/0x280 [2.621] ? __pfx_btrfs_lookup_dir_item+0x10/0x10 [2.621] btrfs_get_tree+0xd25/0x1910 Reported-by: lei lu <llfamsec@gmail.com> CC: stable@vger.kernel.org # 6.7+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ copy more details from report ] Signed-off-by: David Sterba <dsterba@suse.com>
2024-05-02Merge branch 'shared-zeropage' into featuresAlexander Gordeev
David Hildenbrand says: =================== This series fixes one issue with uffd + shared zeropages on s390x and fixes that "ordinary" KVM guests can make use of shared zeropages again. userfaultfd could currently end up mapping shared zeropages into processes that forbid shared zeropages. This only apples to s390x, relevant for handling PV guests and guests that use storage kets correctly. Fix it by placing a zeroed folio instead of the shared zeropage during UFFDIO_ZEROPAGE instead. I stumbled over this issue while looking into a customer scenario that is using: (1) Memory ballooning for dynamic resizing. Start a VM with, say, 100 GiB and inflate the balloon during boot to 60 GiB. The VM has ~40 GiB available and additional memory can be "fake hotplugged" to the VM later on demand by deflating the balloon. Actual memory overcommit is not desired, so physical memory would only be moved between VMs. (2) Live migration of VMs between sites to evacuate servers in case of emergency. Without the shared zeropage, during (2), the VM would suddenly consume 100 GiB on the migration source and destination. On the migration source, where we don't excpect memory overcommit, we could easilt end up crashing the VM during migration. Independent of that, memory handed back to the hypervisor using "free page reporting" would end up consuming actual memory after the migration on the destination, not getting freed up until reused+freed again. While there might be ways to optimize parts of this in QEMU, we really should just support the shared zeropage again for ordinary VMs. We only expect legcy guests to make use of storage keys, so let's handle zeropages again when enabling storage keys or when enabling PV. To not break userfaultfd like we did in the past, don't zap the shared zeropages, but instead trigger unsharing faults, just like we do for unsharing KSM pages in break_ksm(). Unsharing faults will simply replace the shared zeropage by a zeroed anonymous folio. We can already trigger the same fault path using GUP, when trying to long-term pin a shared zeropage, but also when unmerging a KSM-placed zeropages, so this is nothing new. Patch #1 tested on 86-64 by forcing mm_forbids_zeropage() to be 1, and running the uffd selftests. Patch #2 tested on s390x: the live migration scenario now works as expected, and kvm-unit-tests that trigger usage of skeys work well, whereby I can see detection and unsharing of shared zeropages. Further (as broken in v2), I tested that the shared zeropage is no longer populated after skeys are used -- that mm_forbids_zeropage() works as expected: ./s390x-run s390x/skey.elf \ -no-shutdown \ -chardev socket,id=monitor,path=/var/tmp/mon,server,nowait \ -mon chardev=monitor,mode=readline Then, in another shell: # cat /proc/`pgrep qemu`/smaps_rollup | grep Rss Rss: 31484 kB # echo "dump-guest-memory tmp" | sudo nc -U /var/tmp/mon ... # cat /proc/`pgrep qemu`/smaps_rollup | grep Rss Rss: 160452 kB -> Reading guest memory does not populate the shared zeropage Doing the same with selftest.elf (no skeys) # cat /proc/`pgrep qemu`/smaps_rollup | grep Rss Rss: 30900 kB # echo "dump-guest-memory tmp" | sudo nc -U /var/tmp/mon ... # cat /proc/`pgrep qemu`/smaps_rollup | grep Rsstmp/mon Rss: 30924 kB -> Reading guest memory does populate the shared zeropage =================== Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-05-02perf maps: Remove check_invariants() from maps__lock()Namhyung Kim
I found that the debug build was a slowed down a lot by the maps lock code since it checks the invariants whenever it gets the pointer to the lock. This means it checks twice the invariants before and after the access. Instead, let's move the checking code within the lock area but after any modification and remove it from the read paths. This would remove (more than) half of the maps lock overhead. The time for perf report with a huge data file (200k+ of MMAP2 events). Non-debug Before After --------- -------- -------- 2m 43s 6m 45s 4m 21s Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240429225738.1491791-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-02btrfs: qgroup: do not check qgroup inherit if qgroup is disabledQu Wenruo
[BUG] After kernel commit 86211eea8ae1 ("btrfs: qgroup: validate btrfs_qgroup_inherit parameter"), user space tool snapper will fail to create snapshot using its timeline feature. [CAUSE] It turns out that, if using timeline snapper would unconditionally pass btrfs_qgroup_inherit parameter (assigning the new snapshot to qgroup 1/0) for snapshot creation. In that case, since qgroup is disabled there would be no qgroup 1/0, and btrfs_qgroup_check_inherit() would return -ENOENT and fail the whole snapshot creation. [FIX] Just skip the check if qgroup is not enabled. This is to keep the older behavior for user space tools, as if the kernel behavior changed for user space, it is a regression of kernel. Thankfully snapper is also fixing the behavior by detecting if qgroup is running in the first place, so the effect should not be that huge. Link: https://github.com/openSUSE/snapper/issues/894 Fixes: 86211eea8ae1 ("btrfs: qgroup: validate btrfs_qgroup_inherit parameter") CC: stable@vger.kernel.org # 6.8+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-05-02ext4: add support for FS_IOC_GETFSSYSFSPATHKent Overstreet
The new sysfs path ioctl lets us get the /sys/fs/ path for a given filesystem in a fs agnostic way, potentially nudging us towards standarizing some of our reporting. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: linux-ext4@vger.kernel.org Link: https://lore.kernel.org/r/20240315035308.3563511-4-kent.overstreet@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-05-02cxl/cper: Remove duplicated GUID definesIra Weiny
Commit 54ce1927eb78 ("cxl/cper: Fix errant CPER prints for CXL events") moved the CXL CPER section defines to include/linux/cper.h from ghes.c When the latest cxl/cper series was reworked those defines were kept in ghes.c by accident. Thus they were duplicated. Delete the duplicate defines keeping them in the header to be shared between efi and apei. Suggested-by: Fabio M. De Francesco <fabio.maria.de.francesco@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20240502-cper-fix-dup-guid-v1-1-283cc447c7bf@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>