summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-08-25bpf/scripts: Assert helper enum value is aligned with comment orderEyal Birger
The helper value is ABI as defined by enum bpf_func_id. As bpf_helper_defs.h is used for the userpace part, it must be consistent with this enum. Before this change the comments order was used by the bpf_doc script in order to set the helper values defined in the helpers file. When adding new helpers it is very puzzling when the userspace application breaks in weird places if the comment is inserted instead of appended - because the generated helper ABI is incorrect and shifted. This commit sets the helper value to the enum value. In addition it is currently the practice to have the comments appended and kept in the same order as the enum. As such, add an assertion validating the comment order is consistent with enum value. In case a different comments ordering is desired, this assertion can be lifted. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20220824181043.1601429-1-eyal.birger@gmail.com
2022-08-25docs: Update version number from 5.x to 6.x in README.rstLukas Bulwahn
A quick 'grep "5\.x" . -R' on Documentation shows that README.rst, 2.Process.rst and applying-patches.rst all mention the version number "5.x" for kernel releases. As the next release will be version 6.0, updating the version number to 6.x in README.rst seems reasonable. The description in 2.Process.rst is just a description of recent kernel releases, it was last updated in the beginning of 2020, and can be revisited at any time on a regular basis, independent of changing the version number from 5 to 6. So, there is no need to update this document now when transitioning from 5.x to 6.x numbering. The document applying-patches.rst is probably obsolete for most users anyway, a reader will sufficiently well understand the steps, even it mentions version 5 rather than version 6. So, do not update that to a version 6.x numbering scheme. Update version number from 5.x to 6.x in README.rst only. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Link: https://lore.kernel.org/r/20220824080836.23087-1-lukas.bulwahn@gmail.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-08-25bpftool: Fix a wrong type cast in btf_dumper_intLam Thai
When `data` points to a boolean value, casting it to `int *` is problematic and could lead to a wrong value being passed to `jsonw_bool`. Change the cast to `bool *` instead. Fixes: b12d6ec09730 ("bpf: btf: add btf print functionality") Signed-off-by: Lam Thai <lamthai@arista.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220824225859.9038-1-lamthai@arista.com
2022-08-25Merge branch 'bpf: rstat: cgroup hierarchical'Alexei Starovoitov
Hao Luo says: ==================== This patch series allows for using bpf to collect hierarchical cgroup stats efficiently by integrating with the rstat framework. The rstat framework provides an efficient way to collect cgroup stats percpu and propagate them through the cgroup hierarchy. The stats are exposed to userspace in textual form by reading files in bpffs, similar to cgroupfs stats by using a cgroup_iter program. cgroup_iter is a type of bpf_iter. It walks over cgroups in four modes: - walking a cgroup's descendants in pre-order. - walking a cgroup's descendants in post-order. - walking a cgroup's ancestors. - process only a single object. When attaching cgroup_iter, one needs to set a cgroup to the iter_link created from attaching. This cgroup can be passed either as a file descriptor or a cgroup id. That cgroup serves as the starting point of the walk. One can also terminate the walk early by returning 1 from the iter program. Note that because walking cgroup hierarchy holds cgroup_mutex, the iter program is called with cgroup_mutex held. ** Background on rstat for stats collection ** (I am using a subscriber analogy that is not commonly used) The rstat framework maintains a tree of cgroups that have updates and which cpus have updates. A subscriber to the rstat framework maintains their own stats. The framework is used to tell the subscriber when and what to flush, for the most efficient stats propagation. The workflow is as follows: - When a subscriber updates a cgroup on a cpu, it informs the rstat framework by calling cgroup_rstat_updated(cgrp, cpu). - When a subscriber wants to read some stats for a cgroup, it asks the rstat framework to initiate a stats flush (propagation) by calling cgroup_rstat_flush(cgrp). - When the rstat framework initiates a flush, it makes callbacks to subscribers to aggregate stats on cpus that have updates, and propagate updates to their parent. Currently, the main subscribers to the rstat framework are cgroup subsystems (e.g. memory, block). This patch series allow bpf programs to become subscribers as well. Patches in this series are organized as follows: * Patches 1-2 introduce cgroup_iter prog, and a selftest. * Patches 3-5 allow bpf programs to integrate with rstat by adding the necessary hook points and kfunc. A comprehensive selftest that demonstrates the entire workflow for using bpf and rstat to efficiently collect and output cgroup stats is added. --- Changelog: v8 -> v9: - Make UNSPEC (an invalid option) as the default order for cgroup_iter. - Use enum for specifying cgroup_iter order, instead of u32. - Add BPF_ITER_RESHCED to cgroup_iter. - Add cgroup_hierarchical_stats to s390x denylist. v7 -> v8: - Removed the confusing BPF_ITER_DEFAULT (Andrii) - s/SELF/SELF_ONLY/g - Fixed typo (e.g. outputing) (Andrii) - Use "descendants_pre", "descendants_post" etc. instead of "pre", "post" (Andrii) v6 -> v7: - Updated commit/comments in cgroup_iter for read() behavior (Yonghong) - Extracted BPF_ITER_SELF and other options out of cgroup_iter, so that they can be used in other iters. Also renamed them. (Andrii) - Supports both cgroup_fd and cgroup_id when specifying target cgroup. (Andrii) - Avoided using macro for formatting expected output in cgroup_iter selftest. (Andrii) - Applied 'static' on all vars and functions in cgroup_iter selftest. (Andrii) - Fixed broken buf reading in cgroup_iter selftest. (Andrii) - Switched to use bpf_link__destroy() unconditionally. (Andrii) - Removed 'volatile' for non-const global vars in selftests. (Andrii) - Started using bpf_core_enum_value() to get memory_cgrp_id. (Andrii) v5 -> v6: - Rebased on bpf-next - Tidy up cgroup_hierarchical_stats test (Andrii) * 'static' and 'inline' * avoid using libbpf_get_error() * string literals of cgroup paths. - Rename patch 8/8 to 'selftests/bpf' (Yonghong) - Fix cgroup_iter comments (e.g. PAGE_SIZE and uapi) (Yonghong) - Make sure further read() returns OK after previous read() finished properly (Yonghong) - Release cgroup_mutex before the last call of show() (Kumar) v4 -> v5: - Rebased on top of new kfunc flags infrastructure, updated patch 1 and patch 6 accordingly. - Added docs for sleepable kfuncs. v3 -> v4: - cgroup_iter: * reorder fields in bpf_link_info to avoid break uapi (Yonghong) * comment the behavior when cgroup_fd=0 (Yonghong) * comment on the limit of number of cgroups supported by cgroup_iter. (Yonghong) - cgroup_hierarchical_stats selftest: * Do not return -1 if stats are not found (causes overflow in userspace). * Check if child process failed to join cgroup. * Make buf and path arrays in get_cgroup_vmscan_delay() static. * Increase the test map sizes to accomodate cgroups that are not created by the test. v2 -> v3: - cgroup_iter: * Added conditional compilation of cgroup_iter.c in kernel/bpf/Makefile (kernel test) and dropped the !CONFIG_CGROUP patch. * Added validation of traversal_order when attaching (Yonghong). * Fixed previous wording "two modes" to "three modes" (Yonghong). * Fixed the btf_dump selftest broken by this patch (Yonghong). * Fixed ctx_arg_info[0] to use "PTR_TO_BTF_ID_OR_NULL" instead of "PTR_TO_BTF_ID", because the "cgroup" pointer passed to iter prog can be null. - Use __diag_push to eliminate __weak noinline warning in bpf_rstat_flush(). - cgroup_hierarchical_stats selftest: * Added write_cgroup_file_parent() helper. * Added error handling for failed map updates. * Added null check for cgroup in vmscan_flush. * Fixed the signature of vmscan_[start/end]. * Correctly return error code when attaching trace programs fail. * Make sure all links are destroyed correctly and not leaking in cgroup_hierarchical_stats selftest. * Use memory.reclaim instead of memory.high as a more reliable way to invoke reclaim. * Eliminated sleeps, the test now runs faster. v1 -> v2: - Redesign of cgroup_iter from v1, based on Alexei's idea [1]: * supports walking cgroup subtree. * supports walking ancestors of a cgroup. (Andrii) * supports terminating the walk early. * uses fd instead of cgroup_id as parameter for iter_link. Using fd is a convention in bpf. * gets cgroup's ref at attach time and deref at detach. * brought back cgroup1 support for cgroup_iter. - Squashed the patches adding the rstat flush hook points and kfuncs (Tejun). - Added a comment explaining why bpf_rstat_flush() needs to be weak (Tejun). - Updated the final selftest with the new cgroup_iter design. - Changed CHECKs in the selftest with ASSERTs (Yonghong, Andrii). - Removed empty line at the end of the selftest (Yonghong). - Renamed test files to cgroup_hierarchical_stats.c. - Reordered CGROUP_PATH params order to match struct declaration in the selftest (Michal). - Removed memory_subsys_enabled() and made sure memcg controller enablement checks make sense and are documented (Michal). RFC v2 -> v1: - Instead of introducing a new program type for rstat flushing, add an empty hook point, bpf_rstat_flush(), and use fentry bpf programs to attach to it and flush bpf stats. - Instead of using helpers, use kfuncs for rstat functions. - These changes simplify the patchset greatly, with minimal changes to uapi. RFC v1 -> RFC v2: - Instead of rstat flush programs attach to subsystems, they now attach to rstat (global flushers, not per-subsystem), based on discussions with Tejun. The first patch is entirely rewritten. - Pass cgroup pointers to rstat flushers instead of cgroup ids. This is much more flexibility and less likely to need a uapi update later. - rstat helpers are now only defined if CGROUP_CONFIG. - Most of the code is now only defined if CGROUP_CONFIG and CONFIG_BPF_SYSCALL. - Move rstat helper protos from bpf_base_func_proto() to tracing_prog_func_proto(). - rstat helpers argument (cgroup pointer) is now ARG_PTR_TO_BTF_ID, not ARG_ANYTHING. - Rewrote the selftest to use the cgroup helpers. - Dropped bpf_map_lookup_percpu_elem (already added by Feng). - Dropped patch to support cgroup v1 for cgroup_iter. - Dropped patch to define some cgroup_put() when !CONFIG_CGROUP. The code that calls it is no longer compiled when !CONFIG_CGROUP. cgroup_iter was originally introduced in a different patch series[2]. Hao and I agreed that it fits better as part of this series. RFC v1 of this patch series had the following changes from [2]: - Getting the cgroup's reference at the time at attaching, instead of at the time when iterating. (Yonghong) - Remove .init_seq_private and .fini_seq_private callbacks for cgroup_iter. They are not needed now. (Yonghong) [1] https://lore.kernel.org/bpf/20220520221919.jnqgv52k4ajlgzcl@MBP-98dd607d3435.dhcp.thefacebook.com/ [2] https://lore.kernel.org/lkml/20220225234339.2386398-9-haoluo@google.com/ Hao Luo (2): bpf: Introduce cgroup iter selftests/bpf: Test cgroup_iter. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-25selftests/bpf: add a selftest for cgroup hierarchical stats collectionYosry Ahmed
Add a selftest that tests the whole workflow for collecting, aggregating (flushing), and displaying cgroup hierarchical stats. TL;DR: - Userspace program creates a cgroup hierarchy and induces memcg reclaim in parts of it. - Whenever reclaim happens, vmscan_start and vmscan_end update per-cgroup percpu readings, and tell rstat which (cgroup, cpu) pairs have updates. - When userspace tries to read the stats, vmscan_dump calls rstat to flush the stats, and outputs the stats in text format to userspace (similar to cgroupfs stats). - rstat calls vmscan_flush once for every (cgroup, cpu) pair that has updates, vmscan_flush aggregates cpu readings and propagates updates to parents. - Userspace program makes sure the stats are aggregated and read correctly. Detailed explanation: - The test loads tracing bpf programs, vmscan_start and vmscan_end, to measure the latency of cgroup reclaim. Per-cgroup readings are stored in percpu maps for efficiency. When a cgroup reading is updated on a cpu, cgroup_rstat_updated(cgroup, cpu) is called to add the cgroup to the rstat updated tree on that cpu. - A cgroup_iter program, vmscan_dump, is loaded and pinned to a file, for each cgroup. Reading this file invokes the program, which calls cgroup_rstat_flush(cgroup) to ask rstat to propagate the updates for all cpus and cgroups that have updates in this cgroup's subtree. Afterwards, the stats are exposed to the user. vmscan_dump returns 1 to terminate iteration early, so that we only expose stats for one cgroup per read. - An ftrace program, vmscan_flush, is also loaded and attached to bpf_rstat_flush. When rstat flushing is ongoing, vmscan_flush is invoked once for each (cgroup, cpu) pair that has updates. cgroups are popped from the rstat tree in a bottom-up fashion, so calls will always be made for cgroups that have updates before their parents. The program aggregates percpu readings to a total per-cgroup reading, and also propagates them to the parent cgroup. After rstat flushing is over, all cgroups will have correct updated hierarchical readings (including all cpus and all their descendants). - Finally, the test creates a cgroup hierarchy and induces memcg reclaim in parts of it, and makes sure that the stats collection, aggregation, and reading workflow works as expected. Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/r/20220824233117.1312810-6-haoluo@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-25selftests/bpf: extend cgroup helpersYosry Ahmed
This patch extends bpf selft cgroup_helpers [ID] n various ways: - Add enable_controllers() that allows tests to enable all or a subset of controllers for a specific cgroup. - Add join_cgroup_parent(). The cgroup workdir is based on the pid, therefore a spawned child cannot join the same cgroup hierarchy of the test through join_cgroup(). join_cgroup_parent() is used in child processes to join a cgroup under the parent's workdir. - Add write_cgroup_file() and write_cgroup_file_parent() (similar to join_cgroup_parent() above). - Add get_root_cgroup() for tests that need to do checks on root cgroup. - Distinguish relative and absolute cgroup paths in function arguments. Now relative paths are called relative_path, and absolute paths are called cgroup_path. Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/r/20220824233117.1312810-5-haoluo@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-25cgroup: bpf: enable bpf programs to integrate with rstatYosry Ahmed
Enable bpf programs to make use of rstat to collect cgroup hierarchical stats efficiently: - Add cgroup_rstat_updated() kfunc, for bpf progs that collect stats. - Add cgroup_rstat_flush() sleepable kfunc, for bpf progs that read stats. - Add an empty bpf_rstat_flush() hook that is called during rstat flushing, for bpf progs that flush stats to attach to. Attaching a bpf prog to this hook effectively registers it as a flush callback. Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/r/20220824233117.1312810-4-haoluo@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-25selftests/bpf: Test cgroup_iter.Hao Luo
Add a selftest for cgroup_iter. The selftest creates a mini cgroup tree of the following structure: ROOT (working cgroup) | PARENT / \ CHILD1 CHILD2 and tests the following scenarios: - invalid cgroup fd. - pre-order walk over descendants from PARENT. - post-order walk over descendants from PARENT. - walk of ancestors from PARENT. - process only a single object (i.e. PARENT). - early termination. Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/r/20220824233117.1312810-3-haoluo@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-25bpf: Introduce cgroup iterHao Luo
Cgroup_iter is a type of bpf_iter. It walks over cgroups in four modes: - walking a cgroup's descendants in pre-order. - walking a cgroup's descendants in post-order. - walking a cgroup's ancestors. - process only the given cgroup. When attaching cgroup_iter, one can set a cgroup to the iter_link created from attaching. This cgroup is passed as a file descriptor or cgroup id and serves as the starting point of the walk. If no cgroup is specified, the starting point will be the root cgroup v2. For walking descendants, one can specify the order: either pre-order or post-order. For walking ancestors, the walk starts at the specified cgroup and ends at the root. One can also terminate the walk early by returning 1 from the iter program. Note that because walking cgroup hierarchy holds cgroup_mutex, the iter program is called with cgroup_mutex held. Currently only one session is supported, which means, depending on the volume of data bpf program intends to send to user space, the number of cgroups that can be walked is limited. For example, given the current buffer size is 8 * PAGE_SIZE, if the program sends 64B data for each cgroup, assuming PAGE_SIZE is 4kb, the total number of cgroups that can be walked is 512. This is a limitation of cgroup_iter. If the output data is larger than the kernel buffer size, after all data in the kernel buffer is consumed by user space, the subsequent read() syscall will signal EOPNOTSUPP. In order to work around, the user may have to update their program to reduce the volume of data sent to output. For example, skip some uninteresting cgroups. In future, we may extend bpf_iter flags to allow customizing buffer size. Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/r/20220824233117.1312810-2-haoluo@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-25drm/amdgpu: mmVM_L2_CNTL3 register not initialized correctlyQu Huang
The mmVM_L2_CNTL3 register is not assigned an initial value Signed-off-by: Qu Huang <jinsdb@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-25drm/amdgpu: add MGCG perfmon setting for gfx11Likun Gao
Enable GFX11 MGCG perfmon setting. V2: set rlc to saft mode before setting. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-25drm/amdkfd: Fix isa version for the GC 10.3.7Prike Liang
Correct the isa version for handling KFD test. Fixes: 7c4f4f197e0c ("drm/amdkfd: Add GC 10.3.6 and 10.3.7 KFD definitions") Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-25drm/amdgpu: Fix page table setup on ArcturusMukul Joshi
When translate_further is enabled, page table depth needs to be updated. This was missing on Arcturus MMHUB init. This was causing address translations to fail for SDMA user-mode queues. Fixes: 352e683b72e7 ("drm/amdgpu: Enable translate_further to extend UTCL2 reach") Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-25drm/amd/pm: update SMU 13.0.0 driver_if headerEvan Quan
To fit the latest 78.53 PMFW. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-25drm/amdgpu: add sdma instance check for gfx11 CGCGTim Huang
For some ASICs, like GFX IP v11.0.1, only have one SDMA instance, so not need to configure SDMA1_RLC_CGCG_CTRL for this case. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-25drm/amd/display: enable PCON support for dcn314Roman Li
[Why] DCN314 supports PCON. [How] Explicitly enable it in dcn314 resources. Signed-off-by: Roman Li <roman.li@amd.com> Reviewed-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-25drm/amdgpu: enable NBIO IP v7.7.0 Clock GatingTim Huang
Enable AMD_CG_SUPPORT_BIF_MGCG and AMD_CG_SUPPORT_BIF_LS support. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-25drm/amdgpu: add NBIO IP v7.7.0 Clock Gating supportTim Huang
Add BIF Clock Gating MGCG and LS support for NBIO IP v7.7.0. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-25drm/amdgpu: add TX_POWER_CTRL_1 macro definitions for NBIO IP v7.7.0Tim Huang
Add the BIF0_PCIE_TX_POWER_CTRL_1 register offset and mask macro definitions for AMD_CG_SUPPORT_BIF_LS. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-25Merge tag 'cgroup-for-6.0-rc2-fixes-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull another cgroup fix from Tejun Heo: "Commit 4f7e7236435c ("cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock") required the cgroup core to grab cpus_read_lock() before invoking ->attach(). Unfortunately, it missed adding cpus_read_lock() in cgroup_attach_task_all(). Fix it" * tag 'cgroup-for-6.0-rc2-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: Add missing cpus_read_lock() to cgroup_attach_task_all()
2022-08-25cgroup: Add missing cpus_read_lock() to cgroup_attach_task_all()Tetsuo Handa
syzbot is hitting percpu_rwsem_assert_held(&cpu_hotplug_lock) warning at cpuset_attach() [1], for commit 4f7e7236435ca0ab ("cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock") missed that cpuset_attach() is also called from cgroup_attach_task_all(). Add cpus_read_lock() like what cgroup_procs_write_start() does. Link: https://syzkaller.appspot.com/bug?extid=29d3a3b4d86c8136ad9e [1] Reported-by: syzbot <syzbot+29d3a3b4d86c8136ad9e@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: 4f7e7236435ca0ab ("cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock") Signed-off-by: Tejun Heo <tj@kernel.org>
2022-08-25Revert "xhci: turn off port power in shutdown"Mathias Nyman
This reverts commit 83810f84ecf11dfc5a9414a8b762c3501b328185. Turning off port power in shutdown did cause issues such as a laptop not proprly powering off, and some specific usb devies failing to enumerate the subsequent boot after a warm reset. So revert this. Fixes: 83810f84ecf1 ("xhci: turn off port power in shutdown") Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20220825150840.132216-4-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-25xhci: Add grace period after xHC start to prevent premature runtime suspend.Mathias Nyman
After xHC controller is started, either in probe or resume, it can take a while before any of the connected usb devices are visible to the roothub due to link training. It's possible xhci driver loads, sees no acivity and suspends the host before the USB device is visible. In one testcase with a hotplugged xHC controller the host finally detected the connected USB device and generated a wake 500ms after host initial start. If hosts didn't suspend the device duringe training it probablty wouldn't take up to 500ms to detect it, but looking at specs reveal USB3 link training has a couple long timeout values, such as 120ms RxDetectQuietTimeout, and 360ms PollingLFPSTimeout. So Add a 500ms grace period that keeps polling the roothub for 500ms after start, preventing runtime suspend until USB devices are detected. Cc: stable@vger.kernel.org Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20220825150840.132216-3-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-25xhci: Fix null pointer dereference in remove if xHC has only one roothubMathias Nyman
The remove path in xhci platform driver tries to remove and put both main and shared hcds even if only a main hcd exists (one roothub) This causes a null pointer dereference in reboot for those controllers. Check that the shared_hcd exists before trying to remove it. Fixes: e0fe986972f5 ("usb: host: xhci-plat: prepare operation w/o shared hcd") Reported-by: Alexey Sheplyakov <asheplyakov@basealt.ru> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20220825150840.132216-2-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-25xen/privcmd: fix error exit of privcmd_ioctl_dm_op()Juergen Gross
The error exit of privcmd_ioctl_dm_op() is calling unlock_pages() potentially with pages being NULL, leading to a NULL dereference. Additionally lock_pages() doesn't check for pin_user_pages_fast() having been completely successful, resulting in potentially not locking all pages into memory. This could result in sporadic failures when using the related memory in user mode. Fix all of that by calling unlock_pages() always with the real number of pinned pages, which will be zero in case pages being NULL, and by checking the number of pages pinned by pin_user_pages_fast() matching the expected number of pages. Cc: <stable@vger.kernel.org> Fixes: ab520be8cd5d ("xen/privcmd: Add IOCTL_PRIVCMD_DM_OP") Reported-by: Rustam Subkhankulov <subkhankulov@ispras.ru> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Link: https://lore.kernel.org/r/20220825141918.3581-1-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2022-08-25Documentation/ABI: Mention retbleed vulnerability info file for sysfsSalvatore Bonaccorso
While reporting for the AMD retbleed vulnerability was added in 6b80b59b3555 ("x86/bugs: Report AMD retbleed vulnerability") the new sysfs file was not mentioned so far in the ABI documentation for sysfs-devices-system-cpu. Fix that. Fixes: 6b80b59b3555 ("x86/bugs: Report AMD retbleed vulnerability") Signed-off-by: Salvatore Bonaccorso <carnil@debian.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220801091529.325327-1-carnil@debian.org
2022-08-25x86/sev: Mark snp_abort() noreturnBorislav Petkov
Mark both the function prototype and definition as noreturn in order to prevent the compiler from doing transformations which confuse objtool like so: vmlinux.o: warning: objtool: sme_enable+0x71: unreachable instruction This triggers with gcc-12. Add it and sev_es_terminate() to the objtool noreturn tracking array too. Sort it while at it. Suggested-by: Michael Matz <matz@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220824152420.20547-1-bp@alien8.de
2022-08-25io_uring/net: save address for sendzc async executionPavel Begunkov
We usually copy all bits that a request needs from the userspace for async execution, so the userspace can keep them on the stack. However, send zerocopy violates this pattern for addresses and may reloads it e.g. from io-wq. Save the address if any in ->async_data as usual. Reported-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d7512d7aa9abcd36e9afe1a4d292a24cb2d157e5.1661342812.git.asml.silence@gmail.com [axboe: fold in incremental fix] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-25HID: nintendo: fix rumble worker null pointer derefDaniel J. Ogorchock
We can dereference a null pointer trying to queue work to a destroyed workqueue. If the device is disconnected, nintendo_hid_remove is called, in which the rumble_queue is destroyed. Avoid using that queue to defer rumble work once the controller state is set to JOYCON_CTLR_STATE_REMOVED. This eliminates the null pointer dereference. Signed-off-by: Daniel J. Ogorchock <djogorchock@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2022-08-25s390/mm: do not trigger write fault when vma does not allow VM_WRITEGerald Schaefer
For non-protection pXd_none() page faults in do_dat_exception(), we call do_exception() with access == (VM_READ | VM_WRITE | VM_EXEC). In do_exception(), vma->vm_flags is checked against that before calling handle_mm_fault(). Since commit 92f842eac7ee3 ("[S390] store indication fault optimization"), we call handle_mm_fault() with FAULT_FLAG_WRITE, when recognizing that it was a write access. However, the vma flags check is still only checking against (VM_READ | VM_WRITE | VM_EXEC), and therefore also calling handle_mm_fault() with FAULT_FLAG_WRITE in cases where the vma does not allow VM_WRITE. Fix this by changing access check in do_exception() to VM_WRITE only, when recognizing write access. Link: https://lkml.kernel.org/r/20220811103435.188481-3-david@redhat.com Fixes: 92f842eac7ee3 ("[S390] store indication fault optimization") Cc: <stable@vger.kernel.org> Reported-by: David Hildenbrand <david@redhat.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-08-25s390: fix double free of GS and RI CBs on fork() failureBrian Foster
The pointers for guarded storage and runtime instrumentation control blocks are stored in the thread_struct of the associated task. These pointers are initially copied on fork() via arch_dup_task_struct() and then cleared via copy_thread() before fork() returns. If fork() happens to fail after the initial task dup and before copy_thread(), the newly allocated task and associated thread_struct memory are freed via free_task() -> arch_release_task_struct(). This results in a double free of the guarded storage and runtime info structs because the fields in the failed task still refer to memory associated with the source task. This problem can manifest as a BUG_ON() in set_freepointer() (with CONFIG_SLAB_FREELIST_HARDENED enabled) or KASAN splat (if enabled) when running trinity syscall fuzz tests on s390x. To avoid this problem, clear the associated pointer fields in arch_dup_task_struct() immediately after the new task is copied. Note that the RI flag is still cleared in copy_thread() because it resides in thread stack memory and that is where stack info is copied. Signed-off-by: Brian Foster <bfoster@redhat.com> Fixes: 8d9047f8b967c ("s390/runtime instrumentation: simplify task exit handling") Fixes: 7b83c6297d2fc ("s390/guarded storage: simplify task exit handling") Cc: <stable@vger.kernel.org> # 4.15 Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20220816155407.537372-1-bfoster@redhat.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-08-25net: sched: delete duplicate cleanup of backlog and qlenZhengchao Shao
qdisc_reset() is clearing qdisc->q.qlen and qdisc->qstats.backlog _after_ calling qdisc->ops->reset. There is no need to clear them again in the specific reset function. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20220824005231.345727-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-08-25HID: intel-ish-hid: ipc: Add Meteor Lake PCI device IDEven Xu
Add device ID of Meteor Lake P into ishtp support list. Signed-off-by: Even Xu <even.xu@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2022-08-25xen: move from strlcpy with unused retval to strscpyWolfram Sang
Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Link: https://lore.kernel.org/r/20220818210122.7613-1-wsa+renesas@sang-engineering.com Signed-off-by: Juergen Gross <jgross@suse.com>
2022-08-25xen: x86: remove setting the obsolete config XEN_MAX_DOMAIN_MEMORYLukas Bulwahn
Commit c70727a5bc18 ("xen: allow more than 512 GB of RAM for 64 bit pv-domains") from July 2015 replaces the config XEN_MAX_DOMAIN_MEMORY with a new config XEN_512GB, but misses to adjust arch/x86/configs/xen.config. As XEN_512GB defaults to yes, there is no need to explicitly set any config in xen.config. Just remove setting the obsolete config XEN_MAX_DOMAIN_MEMORY. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20220817044333.22310-1-lukas.bulwahn@gmail.com Signed-off-by: Juergen Gross <jgross@suse.com>
2022-08-25LoongArch: mm: Avoid unnecessary page fault retires on shared memory typesHuacai Chen
Commit d92725256b4f22d0 ("mm: avoid unnecessary page fault retires on shared memory types") modifies do_page_fault() to handle the VM_FAULT_ COMPLETED case, but forget to change for LoongArch, so fix it as other architectures does. Fixes: d92725256b4f22d0 ("mm: avoid unnecessary page fault retires on shared memory types") Reviewed-by: Guo Ren <guoren@kernel.org> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-08-25LoongArch: Add subword xchg/cmpxchg emulationHuacai Chen
LoongArch only support 32-bit/64-bit xchg/cmpxchg in native. But percpu operation, qspinlock and some drivers need 8-bit/16-bit xchg/cmpxchg. We add subword xchg/cmpxchg emulation in this patch because the emulation has better performance than the generic implementation (on NUMA system), and it can fix some build errors meanwhile [1]. LoongArch's guarantee for forward progress (avoid many ll/sc happening at the same time and no one succeeds): We have the "exclusive access (with timeout) of ll" feature to avoid simultaneous ll (which also blocks other memory load/store on the same address), and the "random delay of sc" feature to avoid simultaneous sc. It is a mandatory requirement for multi-core LoongArch processors to implement such features, only except those single-core and dual-core processors (they also don't support multi-chip interconnection). Feature bits are introduced in CPUCFG3, bit 3 and bit 4 [2]. [1] https://lore.kernel.org/loongarch/CAAhV-H6vvkuOzy8OemWdYK3taj5Jn3bFX0ZTwE=twM8ywpBUYA@mail.gmail.com/T/#t [2] https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#_cpucfg Reported-by: Sudip Mukherjee (Codethink) <sudipm.mukherjee@gmail.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Rui Wang <wangrui@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-08-25LoongArch: Cleanup headers to avoid circular dependencyHuacai Chen
When enable GENERIC_IOREMAP, there will be circular dependency to cause build errors. The root cause is that pgtable.h shouldn't include io.h but pgtable.h need some macros defined in io.h. So cleanup those macros and remove the unnecessary inclusions, as other architectures do. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-08-25LoongArch: Cleanup reset routines with new APIHuacai Chen
Cleanup reset routines by using new do_kernel_power_off() instead of old pm_power_off(), and then simplify the whole file (reset.c) organization by inlining some functions. This cleanup also fix a poweroff error if EFI runtime is disabled. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-08-25LoongArch: Fix build warnings in VDSOHuacai Chen
Fix build warnings in VDSO as below: arch/loongarch/vdso/vgettimeofday.c:9:5: warning: no previous prototype for '__vdso_clock_gettime' [-Wmissing-prototypes] 9 | int __vdso_clock_gettime(clockid_t clock, | ^~~~~~~~~~~~~~~~~~~~ arch/loongarch/vdso/vgettimeofday.c:15:5: warning: no previous prototype for '__vdso_gettimeofday' [-Wmissing-prototypes] 15 | int __vdso_gettimeofday(struct __kernel_old_timeval *tv, | ^~~~~~~~~~~~~~~~~~~ arch/loongarch/vdso/vgettimeofday.c:21:5: warning: no previous prototype for '__vdso_clock_getres' [-Wmissing-prototypes] 21 | int __vdso_clock_getres(clockid_t clock_id, | ^~~~~~~~~~~~~~~~~~~ arch/loongarch/vdso/vgetcpu.c:27:5: warning: no previous prototype for '__vdso_getcpu' [-Wmissing-prototypes] 27 | int __vdso_getcpu(unsigned int *cpu, unsigned int *node, struct getcpu_cache *unused) Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-08-25LoongArch: Select PCI_QUIRKS to avoid build errorHuacai Chen
PCI_LOONGSON is a mandatory for LoongArch and it is selected in Kconfig unconditionally, but its dependency PCI_QUIRKS is missing and may cause a build error when "make randconfig": arch/loongarch/pci/acpi.c: In function 'pci_acpi_setup_ecam_mapping': >> arch/loongarch/pci/acpi.c:103:29: error: 'loongson_pci_ecam_ops' undeclared (first use in this function) 103 | ecam_ops = &loongson_pci_ecam_ops; | ^~~~~~~~~~~~~~~~~~~~~ arch/loongarch/pci/acpi.c:103:29: note: each undeclared identifier is reported only once for each function it appears in Kconfig warnings: (for reference only) WARNING: unmet direct dependencies detected for PCI_LOONGSON Depends on [n]: PCI [=y] && (MACH_LOONGSON64 [=y] || COMPILE_TEST [=y]) && (OF [=y] || ACPI [=y]) && PCI_QUIRKS [=n] Selected by [y]: - LOONGARCH [=y] Fix it by selecting PCI_QUIRKS unconditionally, too. Reported-by: kernel test robot <lkp@intel.com> Tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2022-08-25ACPI: property: Remove default association from integer maximum valuesSakari Ailus
Remove the default association from integer maximum value checks. It is not necessary and has caused a bug in other associations being unnoticed. Fixes: 923044133367 ("ACPI: property: Unify integer value reading functions") Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-08-25ACPI: property: Ignore already existing data node tagsSakari Ailus
ACPI node pointers are attached to data node handles, in order to resolve string references to them. _DSD guide allows the same node to be reached from multiple parent nodes, leading the node enumeration algorithm to each such nodes more than once. As attached data already already exists, attaching data with the same tag will fail. Address this problem by ignoring nodes that have been already tagged. Fixes: 1d52f10917a7 ("ACPI: property: Tie data nodes to acpi handles") Reported-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Tested-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-08-25ACPI: property: Fix type detection of unified integer reading functionsStefan Binding
The current code expects the type of the value to be an integer type, instead the value passed to the macro is a pointer. Ensure the size comparison uses the correct pointer type to choose the max value, instead of using the integer type. Fixes: 923044133367 ("ACPI: property: Unify integer value reading functions") Signed-off-by: Stefan Binding <sbinding@opensource.cirrus.com> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Tested-by: Sakari Ailus <sakari.ailus@linux.intel.com> Tested-by: John Garry <john.garry@huawei.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-08-25net: ethernet: mtk_eth_soc: fix hw hash reporting for MTK_NETSYS_V2Lorenzo Bianconi
Properly report hw rx hash for mt7986 chipset accroding to the new dma descriptor layout. Fixes: 197c9e9b17b11 ("net: ethernet: mtk_eth_soc: introduce support for mt7986 chipset") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/091394ea4e705fbb35f828011d98d0ba33808f69.1661257293.git.lorenzo@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-08-25dma-buf/dma-resv: check if the new fence is really laterChristian König
Previously when we added a fence to a dma_resv object we always assumed the the newer than all the existing fences. With Jason's work to add an UAPI to explicit export/import that's not necessary the case any more. So without this check we would allow userspace to force the kernel into an use after free error. Since the change is very small and defensive it's probably a good idea to backport this to stable kernels as well just in case others are using the dma_resv object in the same way. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220810172617.140047-1-christian.koenig@amd.com Cc: stable@vger.kernel.org # v5.19+
2022-08-25HID: input: fix uclogic tabletsBenjamin Tissoires
commit 87562fcd1342 ("HID: input: remove the need for HID_QUIRK_INVERT") made the assumption that it was the only one handling tablets and thus kept an internal state regarding the tool. Turns out that the uclogic driver has a timer to release the in range bit, effectively making hid-input ignoring all in range information after the very first one. Fix that by having a more rationale approach which consists in forwarding every event and let the input stack filter out the duplicates. Reported-by: Stefan Hansson <newbie13xd@gmail.com> Fixes: 87562fcd1342 ("HID: input: remove the need for HID_QUIRK_INVERT") Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2022-08-25HID: Add Apple Touchbar on T2 Macs in hid_have_special_driver listAditya Garg
The touchbar on Apple T2 Macs has 2 modes, one that shows the function keys and other that shows the media controls. The user can use the fn key on his keyboard to switch between the 2 modes. On Linux, if people were using an external keyboard or mouse, the touchbar failed to change modes on pressing the fn key with the following in dmesg :- [ 10.661445] apple-ib-als 0003:05AC:8262.0001: : USB HID v1.01 Device [Apple Inc. Ambient Light Sensor] on usb-bce-vhci-3/input0 [ 11.830992] apple-ib-touchbar 0003:05AC:8302.0007: input: USB HID v1.01 Keyboard [Apple Inc. Touch Bar Display] on usb-bce-vhci-6/input0 [ 12.139407] apple-ib-touchbar 0003:05AC:8102.0008: : USB HID v1.01 Device [Apple Inc. Touch Bar Backlight] on usb-bce-vhci-7/input0 [ 12.211824] apple-ib-touchbar 0003:05AC:8102.0009: : USB HID v1.01 Device [Apple Inc. Touch Bar Backlight] on usb-bce-vhci-7/input1 [ 14.219759] apple-ib-touchbar 0003:05AC:8302.0007: tb: Failed to set touch bar mode to 2 (-110) [ 24.395670] apple-ib-touchbar 0003:05AC:8302.0007: tb: Failed to set touch bar mode to 2 (-110) [ 34.635791] apple-ib-touchbar 0003:05AC:8302.0007: tb: Failed to set touch bar mode to 2 (-110) [ 269.579233] apple-ib-touchbar 0003:05AC:8302.0007: tb: Failed to set touch bar mode to 1 (-110) Add the USB IDs of the touchbar found in T2 Macs to HID have special driver list to fix the issue. Signed-off-by: Aditya Garg <gargaditya08@live.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2022-08-25HID: add Lenovo Yoga C630 battery quirkSteev Klimaszewski
Similar to the Surface Go devices, the Elantech touchscreen/digitizer in the Lenovo Yoga C630 mistakenly reports the battery of the stylus, and always reports an empty battery. Apply the HID_BATTERY_QUIRK_IGNORE quirk to ignore this battery and prevent the erroneous low battery warnings. Signed-off-by: Steev Klimaszewski <steev@kali.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2022-08-25HID: AMD_SFH: Add a DMI quirk entry for ChromebooksAkihiko Odaki
Google Chromebooks use Chrome OS Embedded Controller Sensor Hub instead of Sensor Hub Fusion and leaves MP2 uninitialized, which disables all functionalities, even including the registers necessary for feature detections. The behavior was observed with Lenovo ThinkPad C13 Yoga. Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com> Suggested-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>