summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-08-25thermal: intel: Allow processing of HWP interruptSrinivas Pandruvada
Add a weak function to process HWP (Hardware P-states) notifications and move updating HWP_STATUS MSR to this function. This allows HWP interrupts to be processed by the intel_pstate driver in HWP mode by overriding the implementation. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-25ACPI: tables: FPDT: Do not print FW_BUG message if record types are reservedAdrian Huang
In ACPI 6.4 spec, record types "0x0002-0xffff" of FPDT Performance Record Types [1] and record types "0x0003-0xffff" of Runtime Performance Record Types [2] are reserved. Users might be confused with the FW_BUG message, and they think this is the FW issue. Here is the example in a Lenovo box: ACPI: FPDT 0x00000000A820A000 000044 (v01 LENOVO THINKSYS 00000100 01000013) ACPI: Reserving FPDT table memory at [mem 0xa820a000-0xa820a043] ACPI FPDT: [Firmware Bug]: Invalid record 4113 found So, remove the FW_BUG message to avoid confusion since those types are reserved in ACPI 6.4 spec. [1] https://uefi.org/specs/ACPI/6.4/05_ACPI_Software_Programming_Model/ACPI_Software_Programming_Model.html#fpdt-performance-record-types-table [2] https://uefi.org/specs/ACPI/6.4/05_ACPI_Software_Programming_Model/ACPI_Software_Programming_Model.html#runtime-performance-record-types-table Signed-off-by: Adrian Huang <ahuang12@lenovo.com> Acked-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-25ACPI: button: Add DMI quirk for Lenovo Yoga 9 (14INTL5)Ulrich Huber
The Lenovo Yoga 9 (14INTL5)'s ACPI _LID is bugged: After hibernation the lid is initially reported as closed. Once closing and then reopening the lid reports the lid as open again. This leads to the conclusion that the initial notification of the lid is missing but subsequent notifications are correct. In order fo the Linux LID code to handle this device properly the lid_init_state must be set to ACPI_BUTTON_LID_INIT_OPEN. Signed-off-by: Ulrich Huber <ulrich@huberulrich.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-25RDMA/hfi1: Convert to SPDX identifierCai Huoqing
use SPDX-License-Identifier instead of a verbose license text Link: https://lore.kernel.org/r/20210823042622.109-1-caihuoqing@baidu.com Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-25IB/rdmavt: Convert to SPDX identifierCai Huoqing
use SPDX-License-Identifier instead of a verbose license text Link: https://lore.kernel.org/r/20210823023530.48-1-caihuoqing@baidu.com Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-25ACPI: Add memory semantics to acpi_os_map_memory()Lorenzo Pieralisi
The memory attributes attached to memory regions depend on architecture specific mappings. For some memory regions, the attributes specified by firmware (eg uncached) are not sufficient to determine how a memory region should be mapped by an OS (for instance a region that is define as uncached in firmware can be mapped as Normal or Device memory on arm64) and therefore the OS must be given control on how to map the region to match the expected mapping behaviour (eg if a mapping is requested with memory semantics, it must allow unaligned accesses). Rework acpi_os_map_memory() and acpi_os_ioremap() back-end to split them into two separate code paths: acpi_os_memmap() -> memory semantics acpi_os_ioremap() -> MMIO semantics The split allows the architectural implementation back-ends to detect the default memory attributes required by the mapping in question (ie the mapping API defines the semantics memory vs MMIO) and map the memory accordingly. Link: https://lore.kernel.org/linux-arm-kernel/31ffe8fc-f5ee-2858-26c5-0fd8bdd68702@arm.com Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-25Merge branch 'bpf: Add bpf_task_pt_regs() helper'Alexei Starovoitov
Daniel Xu says: ==================== The motivation behind this helper is to access userspace pt_regs in a kprobe handler. uprobe's ctx is the userspace pt_regs. kprobe's ctx is the kernelspace pt_regs. bpf_task_pt_regs() allows accessing userspace pt_regs in a kprobe handler. The final case (kernelspace pt_regs in uprobe) is pretty rare (usermode helper) so I think that can be solved later if necessary. More concretely, this helper is useful in doing BPF-based DWARF stack unwinding. Currently the kernel can only do framepointer based stack unwinds for userspace code. This is because the DWARF state machines are too fragile to be computed in kernelspace [0]. The idea behind DWARF-based stack unwinds w/ BPF is to copy a chunk of the userspace stack (while in prog context) and send it up to userspace for unwinding (probably with libunwind) [1]. This would effectively enable profiling applications with -fomit-frame-pointer using kprobes and uprobes. [0]: https://lkml.org/lkml/2012/2/10/356 [1]: https://github.com/danobi/bpf-dwarf-walk Changes from v1: - Conwolidate BTF_ID decls for task_struct - Enable bpf_get_current_task_btf() for all prog types - Enable bpf_task_pt_regs() for all prog types - Use ASSERT_* macros instead of CHECK ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-08-25bpf: selftests: Add bpf_task_pt_regs() selftestDaniel Xu
This test retrieves the uprobe's pt_regs in two different ways and compares the contents in an arch-agnostic way. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/5581eb8800f6625ec8813fe21e9dce1fbdef4937.1629772842.git.dxu@dxuuu.xyz
2021-08-25bpf: Add bpf_task_pt_regs() helperDaniel Xu
The motivation behind this helper is to access userspace pt_regs in a kprobe handler. uprobe's ctx is the userspace pt_regs. kprobe's ctx is the kernelspace pt_regs. bpf_task_pt_regs() allows accessing userspace pt_regs in a kprobe handler. The final case (kernelspace pt_regs in uprobe) is pretty rare (usermode helper) so I think that can be solved later if necessary. More concretely, this helper is useful in doing BPF-based DWARF stack unwinding. Currently the kernel can only do framepointer based stack unwinds for userspace code. This is because the DWARF state machines are too fragile to be computed in kernelspace [0]. The idea behind DWARF-based stack unwinds w/ BPF is to copy a chunk of the userspace stack (while in prog context) and send it up to userspace for unwinding (probably with libunwind) [1]. This would effectively enable profiling applications with -fomit-frame-pointer using kprobes and uprobes. [0]: https://lkml.org/lkml/2012/2/10/356 [1]: https://github.com/danobi/bpf-dwarf-walk Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/e2718ced2d51ef4268590ab8562962438ab82815.1629772842.git.dxu@dxuuu.xyz
2021-08-25bpf: Extend bpf_base_func_proto helpers with bpf_get_current_task_btf()Daniel Xu
bpf_get_current_task() is already supported so it's natural to also include the _btf() variant for btf-powered helpers. This is required for non-tracing progs to use bpf_task_pt_regs() in the next commit. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/f99870ed5f834c9803d73b3476f8272b1bb987c0.1629772842.git.dxu@dxuuu.xyz
2021-08-25bpf: Consolidate task_struct BTF_ID declarationsDaniel Xu
No need to have it defined 5 times. Once is enough. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/6dcefa5bed26fe1226f26683f36819bb53ec19a2.1629772842.git.dxu@dxuuu.xyz
2021-08-25bpf: Add BTF_ID_LIST_GLOBAL_SINGLE macroDaniel Xu
Same as BTF_ID_LIST_SINGLE macro except defines a global ID. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/a867a97517df42fd3953eeb5454402b57e74538f.1629772842.git.dxu@dxuuu.xyz
2021-08-25pipe: do FASYNC notifications for every pipe IO, not just state changesLinus Torvalds
It turns out that the SIGIO/FASYNC situation is almost exactly the same as the EPOLLET case was: user space really wants to be notified after every operation. Now, in a perfect world it should be sufficient to only notify user space on "state transitions" when the IO state changes (ie when a pipe goes from unreadable to readable, or from unwritable to writable). User space should then do as much as possible - fully emptying the buffer or what not - and we'll notify it again the next time the state changes. But as with EPOLLET, we have at least one case (stress-ng) where the kernel sent SIGIO due to the pipe being marked for asynchronous notification, but the user space signal handler then didn't actually necessarily read it all before returning (it read more than what was written, but since there could be multiple writes, it could leave data pending). The user space code then expected to get another SIGIO for subsequent writes - even though the pipe had been readable the whole time - and would only then read more. This is arguably a user space bug - and Colin King already fixed the stress-ng code in question - but the kernel regression rules are clear: it doesn't matter if kernel people think that user space did something silly and wrong. What matters is that it used to work. So if user space depends on specific historical kernel behavior, it's a regression when that behavior changes. It's on us: we were silly to have that non-optimal historical behavior, and our old kernel behavior was what user space was tested against. Because of how the FASYNC notification was tied to wakeup behavior, this was first broken by commits f467a6a66419 and 1b6b26ae7053 ("pipe: fix and clarify pipe read/write wakeup logic"), but at the time it seems nobody noticed. Probably because the stress-ng problem case ends up being timing-dependent too. It was then unwittingly fixed by commit 3a34b13a88ca ("pipe: make pipe writes always wake up readers") only to be broken again when by commit 3b844826b6c6 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads"). And at that point the kernel test robot noticed the performance refression in the stress-ng.sigio.ops_per_sec case. So the "Fixes" tag below is somewhat ad hoc, but it matches when the issue was noticed. Fix it for good (knock wood) by simply making the kill_fasync() case separate from the wakeup case. FASYNC is quite rare, and we clearly shouldn't even try to use the "avoid unnecessary wakeups" logic for it. Link: https://lore.kernel.org/lkml/20210824151337.GC27667@xsang-OptiPlex-9020/ Fixes: 3b844826b6c6 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads") Reported-by: kernel test robot <oliver.sang@intel.com> Tested-by: Oliver Sang <oliver.sang@intel.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Colin Ian King <colin.king@canonical.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-25arm64: mm: fix comment typo of pud_offset_phys()Xujun Leng
Fix a typo in the comment of macro pud_offset_phys(). Signed-off-by: Xujun Leng <lengxujun2007@126.com> Link: https://lore.kernel.org/r/20210825150526.12582-1-lengxujun2007@126.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-08-25Merge branch 'for-v5.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull ucount fixes from Eric Biederman: "This branch fixes a regression that made it impossible to increase rlimits that had been converted to the ucount infrastructure, and also fixes a reference counting bug where the reference was not incremented soon enough. The fixes are trivial and the bugs have been encountered in the wild, and the fixes have been tested" * 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: ucounts: Increase ucounts reference counter before the security hook ucounts: Fix regression preventing increasing of rlimits in init_user_ns
2021-08-25RDMA/hns: Bugfix for incorrect association between dip_idx and dgidJunxian Huang
dip_idx and dgid should be a one-to-one mapping relationship, but when qp_num loops back to the start number, it may happen that two different dgid are assiociated to the same dip_idx incorrectly. One solution is to store the qp_num that is not assigned to dip_idx in an array. When a dip_idx needs to be allocated to a new dgid, an spare qp_num is extracted and assigned to dip_idx. Fixes: f91696f2f053 ("RDMA/hns: Support congestion control type selection according to the FW") Link: https://lore.kernel.org/r/1629884592-23424-4-git-send-email-liangwenpeng@huawei.com Signed-off-by: Junxian Huang <huangjunxian4@hisilicon.com> Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-25RDMA/hns: Bugfix for the missing assignment for dip_idxJunxian Huang
When the dgid-dip_idx mapping relationship exists, dip should be assigned. Fixes: f91696f2f053 ("RDMA/hns: Support congestion control type selection according to the FW") Link: https://lore.kernel.org/r/1629884592-23424-3-git-send-email-liangwenpeng@huawei.com Signed-off-by: Junxian Huang <huangjunxian4@hisilicon.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-25RDMA/hns: Bugfix for data type of dip_idxJunxian Huang
dip_idx is associated with qp_num whose data type is u32. However, dip_idx is incorrectly defined as u8 data in the hns_roce_dip struct, which leads to data truncation during value assignment. Fixes: f91696f2f053 ("RDMA/hns: Support congestion control type selection according to the FW") Link: https://lore.kernel.org/r/1629884592-23424-2-git-send-email-liangwenpeng@huawei.com Signed-off-by: Junxian Huang <huangjunxian4@hisilicon.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-25RDMA/hns: Fix incorrect lsn fieldYixing Liu
In RNR NAK screnario, according to the specification, when no credit is available, only the first fragment of the send request can be sent. The LSN(Limit Sequence Number) field should be 0 or the entire packet will be resent. Fixes: 926a01dc000d ("RDMA/hns: Add QP operations support for hip08 SoC") Link: https://lore.kernel.org/r/1629883169-2306-1-git-send-email-liangwenpeng@huawei.com Signed-off-by: Yixing Liu <liuyixing1@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-25cgroup/cpuset: Avoid memory migration when nodemasks matchNicolas Saenz Julienne
With the introduction of ee9707e8593d ("cgroup/cpuset: Enable memory migration for cpuset v2") attaching a process to a different cgroup will trigger a memory migration regardless of whether it's really needed. Memory migration is an expensive operation, so bypass it if the nodemasks passed to cpuset_migrate_mm() are equal. Note that we're not only avoiding the migration work itself, but also a call to lru_cache_disable(), which triggers and flushes an LRU drain work on every online CPU. Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2021-08-25arm64: signal32: Drop pointless call to sigdelsetmask()Will Deacon
Commit 77097ae503b1 ("most of set_current_blocked() callers want SIGKILL/SIGSTOP removed from set") extended set_current_blocked() to remove SIGKILL and SIGSTOP from the new signal set and updated all callers accordingly. Unfortunately, this collided with the merge of the arm64 architecture, which duly removes these signals when restoring the compat sigframe, as this was what was previously done by arch/arm/. Remove the redundant call to sigdelsetmask() from compat_restore_sigframe(). Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210825093911.24493-1-will@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-08-25RDMA/irdma: Remove the repeated declarationShaokun Zhang
Functions 'irdma_alloc_ws_node_id' and 'irdma_free_ws_node_id' are declared twice, so remove the repeated declaration. Link: https://lore.kernel.org/r/1629861674-53343-1-git-send-email-zhangshaokun@hisilicon.com Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-25RDMA/core/sa_query: Retry SA queriesHåkon Bugge
A MAD packet is sent as an unreliable datagram (UD). SA requests are sent as MAD packets. As such, SA requests or responses may be silently dropped. IB Core's MAD layer has a timeout and retry mechanism, which amongst other, is used by RDMA CM. But it is not used by SA queries. The lack of retries of SA queries leads to long specified timeout, and error being returned in case of packet loss. The ULP or user-land process has to perform the retry. Fix this by taking advantage of the MAD layer's retry mechanism. First, a check against a zero timeout is added in rdma_resolve_route(). In send_mad(), we set the MAD layer timeout to one tenth of the specified timeout and the number of retries to 10. The special case when timeout is less than 10 is handled. With this fix: # ucmatose -c 1000 -S 1024 -C 1 runs stable on an Infiniband fabric. Without this fix, we see an intermittent behavior and it errors out with: cmatose: event: RDMA_CM_EVENT_ROUTE_ERROR, error: -110 (110 is ETIMEDOUT) Link: https://lore.kernel.org/r/1628784755-28316-1-git-send-email-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-25Merge remote-tracking branch 'regulator/for-5.15' into regulator-nextMark Brown
2021-08-25Merge remote-tracking branch 'regulator/for-5.14' into regulator-linusMark Brown
2021-08-25ceph: fix possible null-pointer dereference in ceph_mdsmap_decode()Tuo Li
kcalloc() is called to allocate memory for m->m_info, and if it fails, ceph_mdsmap_destroy() behind the label out_err will be called: ceph_mdsmap_destroy(m); In ceph_mdsmap_destroy(), m->m_info is dereferenced through: kfree(m->m_info[i].export_targets); To fix this possible null-pointer dereference, check m->m_info before the for loop to free m->m_info[i].export_targets. [ jlayton: fix up whitespace damage only kfree(m->m_info) if it's non-NULL ] Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Signed-off-by: Tuo Li <islituo@gmail.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-08-25ceph: correctly handle releasing an embedded cap flushXiubo Li
The ceph_cap_flush structures are usually dynamically allocated, but the ceph_cap_snap has an embedded one. When force umounting, the client will try to remove all the session caps. During this, it will free them, but that should not be done with the ones embedded in a capsnap. Fix this by adding a new boolean that indicates that the cap flush is embedded in a capsnap, and skip freeing it if that's set. At the same time, switch to using list_del_init() when detaching the i_list and g_list heads. It's possible for a forced umount to remove these objects but then handle_cap_flushsnap_ack() races in and does the list_del_init() again, corrupting memory. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/52283 Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-08-25cachefiles: Use file_inode() rather than accessing ->f_inodeDavid Howells
Use the file_inode() helper rather than accessing ->f_inode directly. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/162431192403.2908479.4590814090994846904.stgit@warthog.procyon.org.uk/
2021-08-25netfs: Move cookie debug ID to struct netfs_cache_resourcesDavid Howells
Move the cookie debug ID from struct netfs_read_request to struct netfs_cache_resources and drop the 'cookie_' prefix. This makes it available for things that want to use netfs_cache_resources without having a netfs_read_request. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/162431190784.2908479.13386972676539789127.stgit@warthog.procyon.org.uk/
2021-08-25fscache: Select netfs stats if fscache stats are enabledDavid Howells
Unconditionally select the stats produced by the netfs lib if fscache stats are enabled as the former are displayed in the latter's procfile. Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-cachefs@redhat.com Reviewed-by: Jeff Layton <jlayton@redhat.com> Link: https://lore.kernel.org/r/162280352566.3319242.10615341893991206961.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/162431189627.2908479.9165349423842063755.stgit@warthog.procyon.org.uk/
2021-08-25erofs: fix double free of 'copied'Gao Xiang
Dan reported a new smatch warning [1] "fs/erofs/inode.c:210 erofs_read_inode() error: double free of 'copied'" Due to new chunk-based format handling logic, the error path can be called after kfree(copied). Set "copied = NULL" after kfree(copied) to fix this. [1] https://lore.kernel.org/r/202108251030.bELQozR7-lkp@intel.com Link: https://lore.kernel.org/r/20210825120757.11034-1-hsiangkao@linux.alibaba.com Fixes: c5aa903a59db ("erofs: support reading chunk-based uncompressed files") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-08-25locking/rtmutex: Dequeue waiter on ww_mutex deadlockThomas Gleixner
The rt_mutex based ww_mutex variant queues the new waiter first in the lock's rbtree before evaluating the ww_mutex specific conditions which might decide that the waiter should back out. This check and conditional exit happens before the waiter is enqueued into the PI chain. The failure handling at the call site assumes that the waiter, if it is the top most waiter on the lock, is queued in the PI chain and then proceeds to adjust the unmodified PI chain, which results in RB tree corruption. Dequeue the waiter from the lock waiter list in the ww_mutex error exit path to prevent this. Fixes: add461325ec5 ("locking/rtmutex: Extend the rtmutex core to support ww_mutex") Reported-by: Sebastian Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210825102454.042280541@linutronix.de
2021-08-25locking/rtmutex: Dont dereference waiter locklessThomas Gleixner
The new rt_mutex_spin_on_onwer() loop checks whether the spinning waiter is still the top waiter on the lock by utilizing rt_mutex_top_waiter(), which is broken because that function contains a sanity check which dereferences the top waiter pointer to check whether the waiter belongs to the lock. That's wrong in the lockless spinwait case: CPU 0 CPU 1 rt_mutex_lock(lock) rt_mutex_lock(lock); queue(waiter0) waiter0 == rt_mutex_top_waiter(lock) rt_mutex_spin_on_onwer(lock, waiter0) { queue(waiter1) waiter1 == rt_mutex_top_waiter(lock) ... top_waiter = rt_mutex_top_waiter(lock) leftmost = rb_first_cached(&lock->waiters); -> signal dequeue(waiter1) destroy(waiter1) w = rb_entry(leftmost, ....) BUG_ON(w->lock != lock) <- UAF The BUG_ON() is correct for the case where the caller holds lock->wait_lock which guarantees that the leftmost waiter entry cannot vanish. For the lockless spinwait case it's broken. Create a new helper function which avoids the pointer dereference and just compares the leftmost entry pointer with current's waiter pointer to validate that currrent is still elegible for spinning. Fixes: 992caf7f1724 ("locking/rtmutex: Add adaptive spinwait mechanism") Reported-by: Sebastian Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210825102453.981720644@linutronix.de
2021-08-25perf/x86/intel/pt: Fix mask of num_address_rangesXiaoyao Li
Per SDM, bit 2:0 of CPUID(0x14,1).EAX[2:0] reports the number of configurable address ranges for filtering, not bit 1:0. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Link: https://lkml.kernel.org/r/20210824040622.4081502-1-xiaoyao.li@intel.com
2021-08-25regulator: vctrl: Avoid lockdep warning in enable/disable opsChen-Yu Tsai
vctrl_enable() and vctrl_disable() call regulator_enable() and regulator_disable(), respectively. However, vctrl_* are regulator ops and should not be calling the locked regulator APIs. Doing so results in a lockdep warning. Instead of exporting more internal regulator ops, model the ctrl supply as an actual supply to vctrl-regulator. At probe time this driver still needs to use the consumer API to fetch its constraints, but otherwise lets the regulator core handle the upstream supply for it. The enable/disable/is_enabled ops are not removed, but now only track state internally. This preserves the original behavior with the ops being available, but one could argue that the original behavior was already incorrect: the internal state would not match the upstream supply if that supply had another consumer that enabled the supply, while vctrl-regulator was not enabled. The lockdep warning is as follows: WARNING: possible circular locking dependency detected 5.14.0-rc6 #2 Not tainted ------------------------------------------------------ swapper/0/1 is trying to acquire lock: ffffffc011306d00 (regulator_list_mutex){+.+.}-{3:3}, at: regulator_lock_dependent (arch/arm64/include/asm/current.h:19 include/linux/ww_mutex.h:111 drivers/regulator/core.c:329) but task is already holding lock: ffffff8004a77160 (regulator_ww_class_mutex){+.+.}-{3:3}, at: regulator_lock_recursive (drivers/regulator/core.c:156 drivers/regulator/core.c:263) which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (regulator_ww_class_mutex){+.+.}-{3:3}: __mutex_lock_common (include/asm-generic/atomic-instrumented.h:606 include/asm-generic/atomic-long.h:29 kernel/locking/mutex.c:103 kernel/locking/mutex.c:144 kernel/locking/mutex.c:963) ww_mutex_lock (kernel/locking/mutex.c:1199) regulator_lock_recursive (drivers/regulator/core.c:156 drivers/regulator/core.c:263) regulator_lock_dependent (drivers/regulator/core.c:343) regulator_enable (drivers/regulator/core.c:2808) set_machine_constraints (drivers/regulator/core.c:1536) regulator_register (drivers/regulator/core.c:5486) devm_regulator_register (drivers/regulator/devres.c:196) reg_fixed_voltage_probe (drivers/regulator/fixed.c:289) platform_probe (drivers/base/platform.c:1427) [...] -> #1 (regulator_ww_class_acquire){+.+.}-{0:0}: regulator_lock_dependent (include/linux/ww_mutex.h:129 drivers/regulator/core.c:329) regulator_enable (drivers/regulator/core.c:2808) set_machine_constraints (drivers/regulator/core.c:1536) regulator_register (drivers/regulator/core.c:5486) devm_regulator_register (drivers/regulator/devres.c:196) reg_fixed_voltage_probe (drivers/regulator/fixed.c:289) [...] -> #0 (regulator_list_mutex){+.+.}-{3:3}: __lock_acquire (kernel/locking/lockdep.c:3052 (discriminator 4) kernel/locking/lockdep.c:3174 (discriminator 4) kernel/locking/lockdep.c:3789 (discriminator 4) kernel/locking/lockdep.c:5015 (discriminator 4)) lock_acquire (arch/arm64/include/asm/percpu.h:39 kernel/locking/lockdep.c:438 kernel/locking/lockdep.c:5627) __mutex_lock_common (include/asm-generic/atomic-instrumented.h:606 include/asm-generic/atomic-long.h:29 kernel/locking/mutex.c:103 kernel/locking/mutex.c:144 kernel/locking/mutex.c:963) mutex_lock_nested (kernel/locking/mutex.c:1125) regulator_lock_dependent (arch/arm64/include/asm/current.h:19 include/linux/ww_mutex.h:111 drivers/regulator/core.c:329) regulator_enable (drivers/regulator/core.c:2808) vctrl_enable (drivers/regulator/vctrl-regulator.c:400) _regulator_do_enable (drivers/regulator/core.c:2617) _regulator_enable (drivers/regulator/core.c:2764) regulator_enable (drivers/regulator/core.c:308 drivers/regulator/core.c:2809) _set_opp (drivers/opp/core.c:819 drivers/opp/core.c:1072) dev_pm_opp_set_rate (drivers/opp/core.c:1164) set_target (drivers/cpufreq/cpufreq-dt.c:62) __cpufreq_driver_target (drivers/cpufreq/cpufreq.c:2216 drivers/cpufreq/cpufreq.c:2271) cpufreq_online (drivers/cpufreq/cpufreq.c:1488 (discriminator 2)) cpufreq_add_dev (drivers/cpufreq/cpufreq.c:1563) subsys_interface_register (drivers/base/bus.c:?) cpufreq_register_driver (drivers/cpufreq/cpufreq.c:2819) dt_cpufreq_probe (drivers/cpufreq/cpufreq-dt.c:344) [...] other info that might help us debug this: Chain exists of: regulator_list_mutex --> regulator_ww_class_acquire --> regulator_ww_class_mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(regulator_ww_class_mutex); lock(regulator_ww_class_acquire); lock(regulator_ww_class_mutex); lock(regulator_list_mutex); *** DEADLOCK *** 6 locks held by swapper/0/1: #0: ffffff8002d32188 (&dev->mutex){....}-{3:3}, at: __device_driver_lock (drivers/base/dd.c:1030) #1: ffffffc0111a0520 (cpu_hotplug_lock){++++}-{0:0}, at: cpufreq_register_driver (drivers/cpufreq/cpufreq.c:2792 (discriminator 2)) #2: ffffff8002a8d918 (subsys mutex#9){+.+.}-{3:3}, at: subsys_interface_register (drivers/base/bus.c:1033) #3: ffffff800341bb90 (&policy->rwsem){+.+.}-{3:3}, at: cpufreq_online (include/linux/bitmap.h:285 include/linux/cpumask.h:405 drivers/cpufreq/cpufreq.c:1399) #4: ffffffc011f0b7b8 (regulator_ww_class_acquire){+.+.}-{0:0}, at: regulator_enable (drivers/regulator/core.c:2808) #5: ffffff8004a77160 (regulator_ww_class_mutex){+.+.}-{3:3}, at: regulator_lock_recursive (drivers/regulator/core.c:156 drivers/regulator/core.c:263) stack backtrace: CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.14.0-rc6 #2 7c8f8996d021ed0f65271e6aeebf7999de74a9fa Hardware name: Google Scarlet (DT) Call trace: dump_backtrace (arch/arm64/kernel/stacktrace.c:161) show_stack (arch/arm64/kernel/stacktrace.c:218) dump_stack_lvl (lib/dump_stack.c:106 (discriminator 2)) dump_stack (lib/dump_stack.c:113) print_circular_bug (kernel/locking/lockdep.c:?) check_noncircular (kernel/locking/lockdep.c:?) __lock_acquire (kernel/locking/lockdep.c:3052 (discriminator 4) kernel/locking/lockdep.c:3174 (discriminator 4) kernel/locking/lockdep.c:3789 (discriminator 4) kernel/locking/lockdep.c:5015 (discriminator 4)) lock_acquire (arch/arm64/include/asm/percpu.h:39 kernel/locking/lockdep.c:438 kernel/locking/lockdep.c:5627) __mutex_lock_common (include/asm-generic/atomic-instrumented.h:606 include/asm-generic/atomic-long.h:29 kernel/locking/mutex.c:103 kernel/locking/mutex.c:144 kernel/locking/mutex.c:963) mutex_lock_nested (kernel/locking/mutex.c:1125) regulator_lock_dependent (arch/arm64/include/asm/current.h:19 include/linux/ww_mutex.h:111 drivers/regulator/core.c:329) regulator_enable (drivers/regulator/core.c:2808) vctrl_enable (drivers/regulator/vctrl-regulator.c:400) _regulator_do_enable (drivers/regulator/core.c:2617) _regulator_enable (drivers/regulator/core.c:2764) regulator_enable (drivers/regulator/core.c:308 drivers/regulator/core.c:2809) _set_opp (drivers/opp/core.c:819 drivers/opp/core.c:1072) dev_pm_opp_set_rate (drivers/opp/core.c:1164) set_target (drivers/cpufreq/cpufreq-dt.c:62) __cpufreq_driver_target (drivers/cpufreq/cpufreq.c:2216 drivers/cpufreq/cpufreq.c:2271) cpufreq_online (drivers/cpufreq/cpufreq.c:1488 (discriminator 2)) cpufreq_add_dev (drivers/cpufreq/cpufreq.c:1563) subsys_interface_register (drivers/base/bus.c:?) cpufreq_register_driver (drivers/cpufreq/cpufreq.c:2819) dt_cpufreq_probe (drivers/cpufreq/cpufreq-dt.c:344) [...] Reported-by: Brian Norris <briannorris@chromium.org> Fixes: f8702f9e4aa7 ("regulator: core: Use ww_mutex for regulators locking") Fixes: e9153311491d ("regulator: vctrl-regulator: Avoid deadlock getting and setting the voltage") Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Link: https://lore.kernel.org/r/20210825033704.3307263-3-wenst@chromium.org Signed-off-by: Mark Brown <broonie@kernel.org>
2021-08-25regulator: vctrl: Use locked regulator_get_voltage in probe pathChen-Yu Tsai
In commit e9153311491d ("regulator: vctrl-regulator: Avoid deadlock getting and setting the voltage"), all calls to get/set the voltage of the control regulator were switched to unlocked versions to avoid deadlocks. However, the call in the probe path is done without regulator locks held. In this case the locked version should be used. Switch back to the locked regulator_get_voltage() in the probe path to avoid any mishaps. Fixes: e9153311491d ("regulator: vctrl-regulator: Avoid deadlock getting and setting the voltage") Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Link: https://lore.kernel.org/r/20210825033704.3307263-2-wenst@chromium.org Signed-off-by: Mark Brown <broonie@kernel.org>
2021-08-25ASoC: imx-rpmsg: change dev_err to dev_err_probe for -EPROBE_DEFERShengjiu Wang
Change dev_err to dev_err_probe for no need print error message when defer probe happens. Fixes: 39f8405c3e50 ("ASoC: imx-rpmsg: Add machine driver for audio base on rpmsg") Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com> Link: https://lore.kernel.org/r/1629875681-16373-1-git-send-email-shengjiu.wang@nxp.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-08-25ASoC: rt5682: Fix the vol+ button detection issueDerek Fang
Fix the wrong button vol+ detection issue with some brand headsets by fine tuning the threshold of button vol+ and SAR ADC button accuracy. Signed-off-by: Derek Fang <derek.fang@realtek.com> Link: https://lore.kernel.org/r/20210825040346.28346-1-derek.fang@realtek.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-08-25ASoC: Intel: bytcr_rt5640: Make rt5640_jack_gpio/rt5640_jack2_gpio staticPeter Ujfalusi
Marking the two jack gpio as static fixes the following Sparse errors: sound/soc/intel/boards/bytcr_rt5640.c:468:26: error: symbol 'rt5640_jack_gpio' was not declared. Should it be static? sound/soc/intel/boards/bytcr_rt5640.c:475:26: error: symbol 'rt5640_jack2_gpio' was not declared. Should it be static? Fixes: 9ba00856686ad ("ASoC: Intel: bytcr_rt5640: Add support for HP Elite Pad 1000G2 jack-detect") Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Link: https://lore.kernel.org/r/20210825122519.3364-1-peter.ujfalusi@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2021-08-25Merge branch 'for-5.14' of ↵Mark Brown
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into asoc-5.15
2021-08-25Revert "btrfs: compression: don't try to compress if we don't have enough pages"Qu Wenruo
This reverts commit f2165627319ffd33a6217275e5690b1ab5c45763. [BUG] It's no longer possible to create compressed inline extent after commit f2165627319f ("btrfs: compression: don't try to compress if we don't have enough pages"). [CAUSE] For compression code, there are several possible reasons we have a range that needs to be compressed while it's no more than one page. - Compressed inline write The data is always smaller than one sector and the test lacks the condition to properly recognize a non-inline extent. - Compressed subpage write For the incoming subpage compressed write support, we require page alignment of the delalloc range. And for 64K page size, we can compress just one page into smaller sectors. For those reasons, the requirement for the data to be more than one page is not correct, and is already causing regression for compressed inline data writeback. The idea of skipping one page to avoid wasting CPU time could be revisited in the future. [FIX] Fix it by reverting the offending commit. Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> Link: https://lore.kernel.org/linux-btrfs/afa2742.c084f5d6.17b6b08dffc@tnonline.net Fixes: f2165627319f ("btrfs: compression: don't try to compress if we don't have enough pages") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-25sg: pass the device name to blk_trace_setupChristoph Hellwig
Fix a regression that passed a NULL device name to blk_trace_setup accidentally. Fixes: aebbb5831fbd ("sg: do not allocate a gendisk") Reported-by: syzbot+f74aa89114a236643919@syzkaller.appspotmail.com Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Link: https://lore.kernel.org/r/20210825075438.1883687-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-25block, bfq: cleanup the repeated declarationShaokun Zhang
Function 'bfq_entity_to_bfqq' is declared twice, so remove the repeated declaration and blank line. Cc: Paolo Valente <paolo.valente@linaro.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Link: https://lore.kernel.org/r/1629872391-46399-1-git-send-email-zhangshaokun@hisilicon.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-25blk-crypto: fix check for too-large dun_bytesEric Biggers
dun_bytes needs to be less than or equal to the IV size of the encryption mode, not just less than or equal to BLK_CRYPTO_MAX_IV_SIZE. Currently this doesn't matter since blk_crypto_init_key() is never actually passed invalid values, but we might as well fix this. Fixes: a892c8d52c02 ("block: Inline encryption support for blk-mq") Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20210825055918.51975-1-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-25Merge branch 'pktgen-samples-next'David S. Miller
Juhee Kang says: ==================== samples: pktgen: enhance the ability to print the execution results of samples This patch series improves the ability to print the execution result of pktgen samples by adding a line which calls the function before termination and adding trap SIGINT. Also, this series documents the latest pktgen usage options. Currently, pktgen samples print the execution result when terminated usually. However, sample03 is not working properly. This is results of sample04 and sample03: # DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample04_many_flows.sh -n 1 Running... ctrl^C to stop Device: eth0@0 Result: OK: 19(c5+d13) usec, 1 (60byte,0frags) 51762pps 24Mb/sec (24845760bps) errors: 0 # DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample03_burst_single_flow.sh -n 1 Running... ctrl^C to stop Because sample03 doesn't call the function which prints the execution result when terminated normally, unlike other samples. So the first commit solves this issue by adding a line which calls the function before termination. Also, all pktgen samples are able to send infinite messages per thread by setting the count option to 0, and pktgen is stopped by Ctrl-C. However, the sample besides sample{3...5} don't work appropriately because Ctrl-C stops the script, not just pktgen. This is results of samples: # DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample04_many_flows.sh -n 0 Running... ctrl^C to stop ^CDevice: eth0@0 Result: OK: 569657(c569538+d118) usec, 84650 (60byte,0frags) 148597pps 71Mb/sec (71326560bps) errors: 0 # DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample01_simple.sh -n 0 Running... ctrl^C to stop ^C # DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample02_multiqueue.sh -n 0 Running... ctrl^C to stop ^C # DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample06_numa_awared_queue_irq_affinity.sh -n 0 Running... ctrl^C to stop ^C # DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_bench_xmit_mode_netif_receive.sh -n 0 Running... ctrl^C to stop ^C # DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_bench_xmit_mode_queue_xmit.sh -n 0 Running... ctrl^C to stop ^C So the second commit solves this issue by adding trap SIGINT. Also, changes control_c function to print_results to maintain consistency with other samples on the first commit and second commit. And current pktgen.rst documentation doesn't add the latest pktgen sample usage options such as count and IPv6, and so on. Also, the old pktgen sample scripts are still included in the document. The old scripts were removed by the commit a4b6ade8359f ("samples/pktgen: remove remaining old pktgen sample scripts"). Thus, the last commit documents the latest pktgen sample usage and removes old sample scripts. And fixes a minor typo. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25pktgen: document the latest pktgen usage optionsJuhee Kang
Currently, the pktgen.rst documentation doesn't cover the latest pktgen sample usage options such as count and IPv6, and so on. Also, this documentation includes the old sample scripts which are no longer use because it was removed by the commit a4b6ade8359f ("samples/pktgen : remove remaining old pktgen sample scripts") Thus, this commit documents pktgen sample usage using the latest options and removes old sample scripts, and fixes a minor typo. Signed-off-by: Juhee Kang <claudiajkang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25samples: pktgen: add trap SIGINT for printing execution resultJuhee Kang
All pktgen samples can send indefinitely num messages per thread by setting the count option to 0(-n 0). If running sample with setting count 0 and press Ctrl-C to stop this program, the program prints the result of the execution so far. Currently, the samples besides sample{3...5} don't work properly. Because Ctrl-C stops the script, not just pktgen. This is results of samples: # DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample04_many_flows.sh -n 0 Running... ctrl^C to stop ^CDevice: eth0@0 Result: OK: 569657(c569538+d118) usec, 84650 (60byte,0frags) 148597pps 71Mb/sec (71326560bps) errors: 0 # DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample01_simple.sh -n 0 Running... ctrl^C to stop ^C In order to solve this, this commit adds trap SIGINT. Also, this commit changes control_c function to print_result to maintain consistency with other samples. Signed-off-by: Juhee Kang <claudiajkang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25samples: pktgen: fix to print when terminated normallyJuhee Kang
Currently, most pktgen samples print the execution result when the program is terminated normally. However, sample03 doesn't work appropriately. This is results of samples: # DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample04_many_flows.sh -n 1 Running... ctrl^C to stop Device: eth0@0 Result: OK: 19(c5+d13) usec, 1 (60byte,0frags) 51762pps 24Mb/sec (24845760bps) errors: 0 # DEV=eth0 DEST_IP=10.1.0.1 DST_MAC=00:11:22:33:44:55 ./pktgen_sample03_burst_single_flow.sh -n 1 Running... ctrl^C to stop The reason why it doesn't print the execution result when the program is terminated usually is that sample03 doesn't call the function which prints the result, unlike other samples. So, this commit solves this issue by calling the function before termination. Also, this commit changes control_c function to print_result to maintain consistency with other samples. Signed-off-by: Juhee Kang <claudiajkang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25Merge branch 'octeontx2-traffic-shaping'David S. Miller
Sunil Goutham says: ==================== Octeontx2: Traffic shaping and SDP link config support This patch series adds support for traffic shaping configuration on all silicons available after 96xx C0. And also adds SDP link related configuration needed when Octeon is connected as an end-point and traffic needs to flow from end-point to host and vice versa. Series also has other changes like - New mbox messages in admin function driver for PF/VF drivers to retrieve available HW resource count. HW resources like block LFs, bandwidth profiles etc are covered. - Added PTP device ID for new CN10K and 95O silicons. - etc ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-25octeontx2-af: Add mbox to retrieve bandwidth profile free countSunil Goutham
Added mbox for PF/VF drivers to retrieve current ingress bandwidth profile free count. Also added current policer timeunit configuration info based on which ratelimiting decisions can be taken by PF/VF drivers. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>