Age  Commit message  (Author)
2023-06-16 arm64/fpsimd: Exit streaming mode when flushing tasks (Mark Brown)
Ensure there is no path where we might attempt to save SME state after we flush a task by updating the SVCR register state as well as updating our in memory state. I haven't seen a specific case where this is happening or seen a path where it might happen but for the cost of a single low overhead instruction it seems sensible to close the potential gap. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20230607-arm64-flush-svcr-v2-1-827306001841@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-06-16 PM: domains: Move the verification of in-params from genpd_add_device() (Ulf Hansson)
Commit f38d1a6d0025 ("PM: domains: Allocate governor data dynamically based on a genpd governor") started to use the in-parameters in genpd_add_device(), without first doing a verification of them. This isn't really a big problem, as most callers do a verification already. Therefore, let's drop the verification from genpd_add_device() and make sure all the callers take care of it instead. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Fixes: f38d1a6d0025 ("PM: domains: Allocate governor data dynamically based on a genpd governor") Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-16 cpufreq: amd-pstate: Make amd-pstate EPP driver name hyphenated (Wyes Karny)
The amd-pstate passive mode driver name is hyphenated. To make the amd-pstate active mode driver consistent with that, rename "amd_pstate_epp" to "amd-pstate-epp". Fixes: ffa5096a7c33 ("cpufreq: amd-pstate: implement Pstate EPP support for the AMD processors") Cc: All applicable <stable@vger.kernel.org> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Wyes Karny <wyes.karny@amd.com> Acked-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Perry Yuan <Perry.Yuan@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-16 cpufreq: amd-pstate: Write CPPC enable bit per-socket (Wyes Karny)
Currently amd_pstate sets the CPPC enable bit in MSR_AMD_CPPC_ENABLE only for the CPU where module_init happened. But MSR_AMD_CPPC_ENABLE is per-socket. This causes the CPPC enable bit to be set for only one socket on servers with more than one physical package. To fix this, write MSR_AMD_CPPC_ENABLE per-socket. Also, handle duplicate calls to cppc_enable, because it is called from per-policy/per-core callbacks and can result in duplicate MSR writes.

Before the fix:
amd@amd:~$ sudo rdmsr -a 0xc00102b1 | uniq --count
    192 0
    192 1

After the fix:
amd@amd:~$ sudo rdmsr -a 0xc00102b1 | uniq --count
    384 1

Suggested-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Wyes Karny <wyes.karny@amd.com> Acked-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
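The per-socket write with duplicate suppression described in this commit can be sketched in userspace C. This is only an illustration under stated assumptions: `MAX_SOCKETS`, `socket_enabled[]`, `fake_msr_writes` and `cppc_enable_socket()` are hypothetical names, and the MSR write is simulated by a counter. The driver's actual code is not shown in this log.

```c
#include <stdbool.h>

#define MAX_SOCKETS 8                     /* hypothetical limit for this sketch */

static bool socket_enabled[MAX_SOCKETS];  /* cached enable state, one per socket */
static int fake_msr_writes;               /* counts simulated MSR writes */

/*
 * Enable CPPC once per socket. Repeated calls for a socket that was
 * already enabled return early, so no duplicate write is issued.
 * Returns 0 on success, -1 for an out-of-range socket number.
 */
int cppc_enable_socket(int socket)
{
    if (socket < 0 || socket >= MAX_SOCKETS)
        return -1;
    if (socket_enabled[socket])
        return 0;                         /* already written for this socket */

    fake_msr_writes++;                    /* stand-in for the real per-socket MSR write */
    socket_enabled[socket] = true;
    return 0;
}
```

Calling `cppc_enable_socket()` from every per-core callback then costs one simulated write per socket, however many cores each socket has.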
2023-06-16 intel_idle: Add support for using intel_idle in a VM guest using just hlt (Arjan van de Ven)
In a typical VM guest, the mwait instruction is not available, leaving only the 'hlt' instruction (which causes a VMEXIT to the host). So for this common case, intel_idle will detect the lack of mwait, and fail to initialize (after which another idle method would step in which will just use hlt always). Other (non-common) cases exist; the table below shows the before/after for these:

+------------+--------------------------+-------------------------+
| Hypervisor | Idle method before patch | Idle method after patch |
| exposes    |                          |                         |
+============+==========================+=========================+
| nothing    | default_idle fallback    | intel_idle VM table     |
| (common)   | (straight "hlt")         |                         |
+------------+--------------------------+-------------------------+
| mwait      | intel_idle mwait table   | intel_idle mwait table  |
+------------+--------------------------+-------------------------+
| ACPI       | ACPI C1 state ("hlt")    | intel_idle VM table     |
+------------+--------------------------+-------------------------+

This is only applicable to CPUs known by intel_idle. For the bare metal case, unknown CPU models will use the ACPI tables (when available) to get estimates for latency and break even point for longer idle states. In guests, the common case is that ACPI tables are not available, but even when they are available, they can't and don't provide the latency information for the longer (mwait based) states. For this scenario (unknown CPU model), the default_idle mode (no ACPI) or ACPI C1 (ACPI available) will be used. By providing capability to do this with the intel_idle driver, we can do better than the fallback or ACPI table methods. While this current change only gets us to the existing behavior, later patches in this series will add new capabilities such as optimized TLB flushing.

In order to do this, a simplified version of the initialization function for VM guests is created, and this will be called if the CPU is recognized, but mwait is not supported, and we're in a VM guest. One thing to note is that the max latency (and break even) of this C1 state is higher than the typical bare metal C1 state, because hlt causes a vmexit, and the cost of vmexit + hypervisor overhead + vmenter is typically in the order of up to 5 microseconds, even if the hypervisor does not actually go into a hardware power saving state. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> [ rjw: Dropped redundant checks from should_verify_mwait() ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
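The "after patch" column of the table, together with the unknown-CPU fallback described in the text, can be written as a small decision function. This is a sketch only: the enum, the function name and the boolean parameters are hypothetical and do not come from the intel_idle source. It models a VM guest and assumes that mwait takes precedence when the hypervisor exposes both mwait and ACPI, a combination the table does not list.

```c
#include <stdbool.h>

enum idle_method {
    DEFAULT_IDLE,        /* straight "hlt" fallback */
    ACPI_C1,             /* ACPI C1 state ("hlt") */
    INTEL_IDLE_MWAIT,    /* intel_idle mwait table */
    INTEL_IDLE_VM        /* intel_idle VM table */
};

/* Idle method used in a VM guest after the patch, per the commit message. */
enum idle_method guest_idle_method(bool cpu_known, bool has_mwait, bool has_acpi)
{
    if (!cpu_known)                       /* CPU model unknown to intel_idle */
        return has_acpi ? ACPI_C1 : DEFAULT_IDLE;
    if (has_mwait)                        /* assumption: mwait wins over ACPI */
        return INTEL_IDLE_MWAIT;
    return INTEL_IDLE_VM;                 /* "nothing" and "ACPI" rows */
}
```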
2023-06-16 blk-mq: fix NULL dereference on q->elevator in blk_mq_elv_switch_none (Ming Lei)
After grabbing q->sysfs_lock, q->elevator may become NULL because of elevator switch. Fix the NULL dereference on q->elevator by checking it with lock. Reported-by: Guangwu Zhang <guazhang@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230616132354.415109-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-16 iov_iter: remove iov_iter_get_pages and iov_iter_get_pages_alloc (Christoph Hellwig)
Now that the direct I/O helpers have switched to use iov_iter_extract_pages, these helpers are unused. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20230614140341.521331-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-16 block: remove BIO_PAGE_REFFED (Christoph Hellwig)
Now that all block direct I/O helpers use page pinning, this flag is unused. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20230614140341.521331-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-16 splice: simplify a conditional in copy_splice_read (Christoph Hellwig)
Check for -EFAULT instead of wrapping the check in a ret < 0 block. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20230614140341.521331-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-16 splice: don't call file_accessed in copy_splice_read (Christoph Hellwig)
copy_splice_read calls into ->read_iter to read the data, which already calls file_accessed. Fixes: 33b3b041543e ("splice: Add a func to do a splice from an O_DIRECT file without ITER_PIPE") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20230614140341.521331-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-16 Merge tag 'nvme-6.5-2023-06-16' of git://git.infradead.org/nvme into for-6.5/block (Jens Axboe)
Pull NVMe updates from Keith:

"nvme updates for Linux 6.5

 - Various cleanups all around (Irvin, Chaitanya, Christophe)
 - Better struct packing (Christophe JAILLET)
 - Reduce controller error logs for optional commands (Keith)
 - Support for >=64KiB block sizes (Daniel Gomez)
 - Fabrics fixes and code organization (Max, Chaitanya, Daniel Wagner)"

* tag 'nvme-6.5-2023-06-16' of git://git.infradead.org/nvme: (27 commits)
  nvme: forward port sysfs delete fix
  nvme: skip optional id ctrl csi if it failed
  nvme-core: use nvme_ns_head_multipath instead of ns->head->disk
  nvmet-fcloop: Do not wait on completion when unregister fails
  nvme-fabrics: open code __nvmf_host_find()
  nvme-fabrics: error out to unlock the mutex
  nvme: Increase block size variable size to 32-bit
  nvme-fcloop: no need to return from void function
  nvmet-auth: remove unnecessary break after goto
  nvmet-auth: remove some dead code
  nvme-core: remove redundant check from nvme_init_ns_head
  nvme: move sysfs code to a dedicated sysfs.c file
  nvme-fabrics: prevent overriding of existing host
  nvme-fabrics: check hostid using uuid_equal
  nvme-fabrics: unify common code in admin and io queue connect
  nvmet: reorder fields in 'struct nvmefc_fcp_req'
  nvmet: reorder fields in 'struct nvme_dhchap_queue_context'
  nvmet: reorder fields in 'struct nvmf_ctrl_options'
  nvme: reorder fields in 'struct nvme_ctrl'
  nvmet: reorder fields in 'struct nvmet_sq'
  ...
2023-06-16 x86/unwind/orc: Add ELF section with ORC version identifier (Omar Sandoval)
Commits ffb1b4a41016 ("x86/unwind/orc: Add 'signal' field to ORC metadata") and fb799447ae29 ("x86,objtool: Split UNWIND_HINT_EMPTY in two") changed the ORC format. Although ORC is internal to the kernel, it's the only way for external tools to get reliable kernel stack traces on x86-64. In particular, the drgn debugger [1] uses ORC for stack unwinding, and these format changes broke it [2]. As the drgn maintainer, I don't care how often or how much the kernel changes the ORC format as long as I have a way to detect the change. It suffices to store a version identifier in the vmlinux and kernel module ELF files (to use when parsing ORC sections from ELF), and in kernel memory (to use when parsing ORC from a core dump+symbol table). Rather than hard-coding a version number that needs to be manually bumped, Peterz suggested hashing the definitions from orc_types.h. If there is a format change that isn't caught by this, the hashing script can be updated. This patch adds an .orc_header allocated ELF section containing the 20-byte hash to vmlinux and kernel modules, along with the corresponding __start_orc_header and __stop_orc_header symbols in vmlinux. 1: https://github.com/osandov/drgn 2: https://github.com/osandov/drgn/issues/303 Fixes: ffb1b4a41016 ("x86/unwind/orc: Add 'signal' field to ORC metadata") Fixes: fb799447ae29 ("x86,objtool: Split UNWIND_HINT_EMPTY in two") Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lkml.kernel.org/r/aef9c8dc43915b886a8c48509a12ec1b006ca1ca.1686690801.git.osandov@osandov.com
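From the consumer side, detecting a format change comes down to comparing the 20-byte hash found in .orc_header with a hash the tool already understands. The following is a minimal sketch of that comparison only. `ORC_HASH_LEN` and `orc_hash_matches()` are hypothetical names, and reading the section out of the ELF file or a core dump is not shown. It is not drgn's implementation.

```c
#include <string.h>

#define ORC_HASH_LEN 20   /* the commit message states a 20-byte hash */

/*
 * Returns 1 if the hash read from .orc_header equals a hash the tool
 * knows how to parse, 0 otherwise.
 */
int orc_hash_matches(const unsigned char found[ORC_HASH_LEN],
                     const unsigned char known[ORC_HASH_LEN])
{
    return memcmp(found, known, ORC_HASH_LEN) == 0;
}
```

A tool would typically keep one known hash per ORC layout it supports and refuse to unwind, or fall back to another method, when none of them matches.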
2023-06-16 nvme: forward port sysfs delete fix (Keith Busch)
We had a late fix that modified nvme_sysfs_delete() after the staging branch for the next merge window relocated the function to a new file. Port commit 2eb94dd56a4a4 ("nvme: do not let the user delete a ctrl before a complete") to the latest to avoid a potentially confusing merge conflict. Cc: Maurizio Lombardi <mlombard@redhat.com> Cc: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-06-16 sched/wait: Fix a kthread_park race with wait_woken() (Arve Hjønnevåg)
kthread_park and wait_woken have a similar race that kthread_stop and wait_woken used to have before it was fixed in commit cb6538e740d7 ("sched/wait: Fix a kthread race with wait_woken()"). Extend that fix to also cover kthread_park. [jstultz: Made changes suggested by Peter to optimize memory loads] Signed-off-by: Arve Hjønnevåg <arve@android.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link: https://lore.kernel.org/r/20230602212350.535358-1-jstultz@google.com
2023-06-16 sched/topology: Mark set_sched_topology() __init (Miaohe Lin)
All callers of set_sched_topology() are within __init section. Mark it __init too. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Link: https://lore.kernel.org/r/20230603073645.1173332-1-linmiaohe@huawei.com
2023-06-16 sched/fair: Rename variable cpu_util eff_util (Tom Rix)
cppcheck reports kernel/sched/fair.c:7436:17: style: Local variable 'cpu_util' shadows outer function [shadowFunction] unsigned long cpu_util; ^ Clean this up by renaming the variable to eff_util Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lore.kernel.org/r/20230611122535.183654-1-trix@redhat.com
2023-06-16 perf/x86/intel: Fix the FRONTEND encoding on GNR and MTL (Kan Liang)
When counting a FRONTEND event, the MSR_PEBS_FRONTEND is not correctly set on GNR and MTL p-core. The umask value for the FRONTEND events is changed on GNR and MTL. The new umask is missing in the extra_regs[] table. Add a dedicated intel_gnr_extra_regs[] for GNR and MTL p-core. Fixes: bc4000fdb009 ("perf/x86/intel: Add Granite Rapids") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20230615173242.3726364-1-kan.liang@linux.intel.com
2023-06-16 perf/core: Drop __weak attribute from arch_perf_update_userpage() prototype (Marc Zyngier)
Reiji reports that the arm64 implementation of arch_perf_update_userpage() is now ignored and replaced by the dummy stub in core code. This seems to happen since the PMUv3 driver was moved to driver/perf. As it turns out, dropping the __weak attribute from the *prototype* of the function solves the problem. You're right, this doesn't seem to make much sense. And yet... It appears that both symbols get flagged as weak, and that the first one to appear in the link order wins: $ nm drivers/perf/arm_pmuv3.o|grep arch_perf_update_userpage 0000000000001db0 W arch_perf_update_userpage Dropping the attribute from the prototype restores the expected behaviour, and arm64 is able to enjoy arch_perf_update_userpage() again. Fixes: 7755cec63ade ("arm64: perf: Move PMUv3 driver to drivers/perf") Fixes: f1ec3a517b43 ("kernel/events: Add a missing prototype for arch_perf_update_userpage()") Reported-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Reiji Watanabe <reijiw@google.com> Link: https://lkml.kernel.org/r/20230616114831.3186980-1-maz@kernel.org
2023-06-16 locking/atomic: scripts: fix ${atomic}_dec_if_positive() kerneldoc (Mark Rutland)
The ${atomic}_dec_if_positive() ops are unlike all the other conditional atomic ops. Rather than returning a boolean success value, these return the value that the atomic variable would be updated to, even when no update is performed. We missed this when adding kerneldoc comments, and the documentation for ${atomic}_dec_if_positive() erroneously states: | Return: @true if @v was updated, @false otherwise. Ideally we'd clean this up by aligning ${atomic}_dec_if_positive() with the usual atomic op conventions: with ${atomic}_fetch_dec_if_positive() for those who care about the value of the variable, and ${atomic}_dec_if_positive() returning a boolean success value. In the meantime, align the documentation with the current reality. Fixes: ad8110706f381170 ("locking/atomic: scripts: generate kerneldoc comments") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20230615132734.1119765-1-mark.rutland@arm.com
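The return convention described here can be illustrated with C11 atomics in userspace. This is a sketch of the semantics stated in the commit message (return the value the variable would be updated to, even when no update happens), not the kernel's generated code. `dec_if_positive()` is a hypothetical name.

```c
#include <stdatomic.h>

/*
 * Decrement *v unless the result would be negative.
 * Returns old - 1 in both cases, so a negative return value means
 * that no update was performed.
 */
int dec_if_positive(atomic_int *v)
{
    int old = atomic_load(v);
    int dec;

    do {
        dec = old - 1;
        if (dec < 0)
            break;                /* would go negative: leave *v untouched */
        /* on failure, 'old' is reloaded with the current value and we retry */
    } while (!atomic_compare_exchange_weak(v, &old, dec));

    return dec;
}
```

A caller that only wants a success flag has to test `dec_if_positive(&v) >= 0`, which is the point the corrected kerneldoc makes.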
2023-06-16 iommu/amd: Fix possible memory leak of 'domain' (Su Hui)
Move allocation code down to avoid memory leak. Fixes: 29f54745f245 ("iommu/amd: Add missing domain type checks") Signed-off-by: Su Hui <suhui@nfschina.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20230608021933.856045-1-suhui@nfschina.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-06-16 dt-bindings: Update Documentation/arm references (Jonathan Corbet)
The Arm documentation has moved to Documentation/arch/arm; update one devicetree reference to match. Cc: Krzysztof Kozlowski <krzysztof.kozlowski+dt@linaro.org> Cc: devicetree@vger.kernel.org Acked-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-06-16 docs: update some straggling Documentation/arm references (Jonathan Corbet)
The Arm documentation has moved to Documentation/arch/arm; update the last remaining references to match. Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Chen-Yu Tsai <wens@csie.org> Cc: Samuel Holland <samuel@sholland.org> Cc: Thierry Reding <thierry.reding@gmail.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> # for pwm Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-06-16 Documentation: KVM: make corrections to vcpu-requests.rst (Randy Dunlap)
Make corrections to punctuation and grammar. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Andrew Jones <drjones@redhat.com> Cc: Christoffer Dall <cdall@linaro.org> Cc: kvm@vger.kernel.org Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20230612030810.23376-5-rdunlap@infradead.org
2023-06-16 Documentation: KVM: make corrections to ppc-pv.rst (Randy Dunlap)
Correct the path of a header file. Change "guest to ... guest" to "guest to ... host" in one place. Hyphenate "32-bit" systems. Add a comma at one parenthetical phrase. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: kvm@vger.kernel.org Cc: Alexander Graf <agraf@suse.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20230612030810.23376-4-rdunlap@infradead.org
2023-06-16 Documentation: KVM: make corrections to locking.rst (Randy Dunlap)
Correct grammar and punctuation. Use "read-only" for consistency. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: kvm@vger.kernel.org Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20230612030810.23376-3-rdunlap@infradead.org
2023-06-16 Documentation: KVM: make corrections to halt-polling.rst (Randy Dunlap)
Module parameters are in sysfs, not debugfs, so change that. Remove superfluous "that" following "Note:". Hyphenate "system-wide" values. Hyphenate "trade-off". Don't treat "denial of service" as a verb. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: kvm@vger.kernel.org Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20230612030810.23376-2-rdunlap@infradead.org
2023-06-16 Documentation: virt: correct location of haltpoll module params (Randy Dunlap)
Module parameters are located in sysfs, not debugfs, so correct the statement. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20230610054302.6223-1-rdunlap@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2023-06-16 Documentation/mm: Initial page table documentation (Linus Walleij)
This is based on an earlier blog post at people.kernel.org, it describes the concepts about page tables that were hardest for me to grasp when dealing with them for the first time, such as the prevalent three-letter acronyms pfn, pgd, p4d, pud, pmd and pte. I don't know if this is what people want, but it's what I would have wanted. The wording, introduction, choice of initial subjects and choice of style is mine. I discussed at one point with Mike Rapoport to bring this into the kernel documentation, so here is a small proposal. The current form is augmented in response to feedback from Mike Rapoport, Matthew Wilcox, Jonathan Cameron, Kuan-Ying Lee, Randy Dunlap and Bagas Sanjaya. Cc: Matthew Wilcox <willy@infradead.org> Reviewed-by: Mike Rapoport <rppt@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://people.kernel.org/linusw/arm32-page-tables Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20230614072548.996940-1-linus.walleij@linaro.org
2023-06-16 irqchip/loongson-eiointc: Fix irq affinity setting during resume (Jianmin Lv)
The hierarchy of PCH PIC, PCH PCI MSI and EIOINTC is as follows:

PCH PIC ------->|
                |---->EIOINTC
PCH PCI MSI --->|

so the irq_data list of irq_desc for IRQs on PCH PIC and PCH PCI MSI is like this:

irq_desc->irq_data(domain: PCH PIC)->parent_data(domain: EIOINTC)
irq_desc->irq_data(domain: PCH PCI MSI)->parent_data(domain: EIOINTC)

In eiointc_resume(), the irq_data passed into eiointc_set_irq_affinity() should be matched to EIOINTC domain instead of PCH PIC or PCH PCI MSI domain, so fix it. Fixes: a90335c2dfb4 ("irqchip/loongson-eiointc: Add suspend/resume support") Reported-by: yangqiming <yangqiming@loongson.cn> Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230614115936.5950-6-lvjianmin@loongson.cn
2023-06-16 irqchip/loongson-liointc: Add IRQCHIP_SKIP_SET_WAKE flag (Yinbo Zhu)
LIOINTC doesn't require specific logic to work with wakeup IRQs, and no irq_set_wake callback is needed. To allow registered IRQs from LIOINTC to be used as a wakeup-source, and ensure irq_set_irq_wake() works well, the flag IRQCHIP_SKIP_SET_WAKE should be added. Reviewed-by: Huacai Chen <chenhuacai@kernel.org> Signed-off-by: Yinbo Zhu <zhuyinbo@loongson.cn> Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230614115936.5950-5-lvjianmin@loongson.cn
2023-06-16 irqchip/loongson-liointc: Fix IRQ trigger polarity (Jianmin Lv)
For the INT_POLARITY register of Loongson-2K series IRQ controller, '0' indicates high level or rising edge triggered, '1' indicates low level or falling edge triggered, and we can find out the information from the Loongson 2K1000LA User Manual v1.0, Table 9-2, Section 9.3 (中断寄存器描述 / Description of the Interrupt Registers). For Loongson-3 CPU series, setting INT_POLARITY register is not supported and writing it has no effect. So trigger polarity setting should be fixed for Loongson-2K CPU series. Fixes: 17343d0b4039 ("irqchip/loongson-liointc: Support to set IRQ type for ACPI path") Cc: stable@vger.kernel.org Reviewed-by: Huacai Chen <chenhuacai@kernel.org> Co-developed-by: Chong Qiao <qiaochong@loongson.cn> Signed-off-by: Chong Qiao <qiaochong@loongson.cn> Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230614115936.5950-4-lvjianmin@loongson.cn
2023-06-16 irqchip/loongson-pch-pic: Fix potential incorrect hwirq assignment (Liu Peibao)
In the DeviceTree path, when ht_vec_base is not zero, the hwirq of the PCH PIC will be assigned incorrectly, because pch_pic_domain_translate() adds ht_vec_base to hwirq and the hwirq does not have ht_vec_base subtracted when calling irq_domain_set_info(). The ht_vec_base is designed for the parent irq chip/domain of the PCH PIC. It does not seem proper to deal with this in callbacks of the PCH PIC domain, so let's put this back as it was in the initial commit ef8c01eb64ca ("irqchip: Add Loongson PCH PIC controller"). Fixes: bcdd75c596c8 ("irqchip/loongson-pch-pic: Add ACPI init support") Cc: stable@vger.kernel.org Reviewed-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Liu Peibao <liupeibao@loongson.cn> Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230614115936.5950-3-lvjianmin@loongson.cn
2023-06-16 irqchip/loongson-pch-pic: Fix initialization of HT vector register (Jianmin Lv)
In an ACPI-based dual-bridge system, IRQ of each bridge's PCH PIC sent to CPU is always a zero-based number, which means that the IRQ on PCH PIC of each bridge is mapped into vector range from 0 to 63 of upstream irqchip (e.g. EIOINTC).

EIOINTC N: [0 ... 63 | 64 ... 255]
            --------   ----------
               ^           ^
               |           |
           PCH PIC N       |
                       PCH MSI N

For example, the IRQ vector number of sata controller on PCH PIC of each bridge is 16, which is sent to upstream irqchip of EIOINTC when an interrupt occurs, which will set bit 16 of EIOINTC. Since hwirq of 16 on EIOINTC has been mapped to an irq_desc for sata controller during hierarchy irq allocation, the related mapped IRQ will be found through irq_resolve_mapping() in the IRQ domain of EIOINTC. So, the IRQ number set in HT vector register should be fixed to be a zero-based number. Cc: stable@vger.kernel.org Reviewed-by: Huacai Chen <chenhuacai@loongson.cn> Co-developed-by: liuyun <liuyun@loongson.cn> Signed-off-by: liuyun <liuyun@loongson.cn> Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230614115936.5950-2-lvjianmin@loongson.cn
2023-06-16 docs: perf: Add new description for HiSilicon UC PMU (Junhao He)
A new function is added on the HiSilicon uncore UC PMU. The UC PMU supports filtering statistical information based on the specified tx request uring channel. The user configures this through the "uring_channel" parameter. Document it to provide guidance on how to use it. Signed-off-by: Junhao He <hejunhao3@huawei.com> Reviewed-by: Jonathan Cameron <Jonthan.Cameron@huawei.com> Reviewed-by: Yicong Yang <yangyicong@hisilicon.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230615125926.29832-4-hejunhao3@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2023-06-16 drivers/perf: hisi: Add support for HiSilicon UC PMU driver (Junhao He)
On HiSilicon Hip09 platform, there are 4 UC (unified cache) modules on each chip CCL (CPU Cluster). UC is a cache that provides coherence between NUMA and UMA domains. It is located between L2 and Memory System. Many PMU events are supported. Let's support the UC PMU driver using the HiSilicon uncore PMU framework.

* rd_req_en: rd_req_en is the abbreviation of read request tracetag enable and allows the user to count only read operations. Details are listed in the hisi-pmu document at Documentation/admin-guide/perf/hisi-pmu.rst
* srcid_en & srcid: Allows users to filter statistical information based on specific CPU/ICL by srcid. srcid_en depends on rd_req_en being enabled.
* uring_channel: Allows users to filter statistical information based on the specified tx request uring channel. uring_channel is only supported for events [0x47 ~ 0x59].

Signed-off-by: Junhao He <hejunhao3@huawei.com> Reviewed-by: Yicong Yang <yangyicong@hisilicon.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230615125926.29832-3-hejunhao3@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2023-06-16 drivers/perf: hisi: Add support for HiSilicon H60PA and PAv3 PMU driver (Junhao He)
Compared to the original PA device, H60PA offers higher bandwidth. The H60PA is a new device and we use HID to differentiate them. The events supported by PAv3 and PAv2 are different. The PAv3 PMU removed some events which are supported by PAv2 PMU. The older PA PMU driver will probe v3 as v2. Therefore PA events displayed by "perf list" cannot work properly. We add the HISI0275 HID for PAv3 PMU to distinguish them. For each H60PA PMU, except for the overflow interrupt register, other functions of the H60PA PMU are the same as the original PA PMU module. It has 8 programmable counters and each counter is free-running. Interrupt is supported to handle counter (64-bit) overflow. Signed-off-by: Junhao He <hejunhao3@huawei.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Yicong Yang <yangyicong@hisilicon.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230615125926.29832-2-hejunhao3@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2023-06-16 Merge branch irq/lpi-resend into irq/irqchip-next (Marc Zyngier)
* irq/lpi-resend:
  : .
  : Patch series from James Gowans, working around an issue with
  : GICv3 LPIs that can fire concurrently on multiple CPUs.
  : .
  irqchip/gic-v3-its: Enable RESEND_WHEN_IN_PROGRESS for LPIs
  genirq: Allow fasteoi handler to resend interrupts on concurrent handling
  genirq: Expand doc for PENDING and REPLAY flags
  genirq: Use BIT() for the IRQD_* state flags

Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-06-16 irqchip/gic-v3-its: Enable RESEND_WHEN_IN_PROGRESS for LPIs (James Gowans)
GICv3 LPIs are impacted by an architectural design issue: they do not have a global active state and as such a given LPI can be delivered to a new CPU after an affinity change while the previous instance of the same LPI handler has not yet completed on the original CPU. If LPIs had an active state, this second LPI would not be delivered until the first CPU deactivated the initial LPI, just like SPIs. To solve this issue, use the newly introduced IRQD_RESEND_WHEN_IN_PROGRESS flag, ensuring that we do not lose an LPI being delivered during that window by getting the GIC to resend it. This workaround gets enabled for all LPIs, including the VPE doorbells. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: James Gowans <jgowans@amazon.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Marc Zyngier <maz@kernel.org> Cc: KarimAllah Raslan <karahmed@amazon.com> Cc: Yipeng Zou <zouyipeng@huawei.com> Cc: Zhang Jianhua <chris.zjh@huawei.com> [maz: massaged commit message] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230608120021.3273400-4-jgowans@amazon.com
2023-06-16genirq: Allow fasteoi handler to resend interrupts on concurrent handlingJames Gowans
There is a class of interrupt controllers out there that, once they have signalled a given interrupt number, will still signal incoming instances of the *same* interrupt despite the original interrupt not having been EOIed yet. As long as the new interrupt reaches the *same* CPU, nothing bad happens, as that CPU still has its interrupts globally disabled, and we will only take the new interrupt once the interrupt has been EOIed. However, things become more "interesting" if an affinity change comes in while the interrupt is being handled. More specifically, while the per-irq lock is being dropped. This results in the affinity change taking place immediately. At this point, there is nothing that prevents the interrupt from firing on the new target CPU. We end-up with the interrupt running concurrently on two CPUs, which isn't a good thing. And that's where things become worse: the new CPU notices that the interrupt handling is in progress (irq_may_run() return false), and *drops the interrupt on the floor*. The whole race looks like this: CPU 0 | CPU 1 -----------------------------|----------------------------- interrupt start | handle_fasteoi_irq | set_affinity(CPU 1) handler | ... | interrupt start ... | handle_fasteoi_irq -> early out handle_fasteoi_irq return | interrupt end interrupt end | If the interrupt was an edge, too bad. The interrupt is lost, and the system will eventually die one way or another. Not great. A way to avoid this situation is to detect this problem at the point we handle the interrupt on the new target. Instead of dropping the interrupt, use the resend mechanism to force it to be replayed. Also, in order to limit the impact of this workaround to the pathetic architectures that require it, gate it behind a new irq flag aptly named IRQD_RESEND_WHEN_IN_PROGRESS. 
Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: James Gowans <jgowans@amazon.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Marc Zyngier <maz@kernel.org> Cc: KarimAllah Raslan <karahmed@amazon.com> Cc: Yipeng Zou <zouyipeng@huawei.com> Cc: Zhang Jianhua <chris.zjh@huawei.com> [maz: reworded commit message] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230608120021.3273400-3-jgowans@amazon.com
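The early-out decision described in the commit message above can be modelled in user space. The following is only a minimal sketch, not the kernel's handle_fasteoi_irq(): the struct, the enum and every toy_* name are hypothetical stand-ins, and the boolean resend_when_in_progress merely models the IRQD_RESEND_WHEN_IN_PROGRESS flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for per-interrupt state; not the kernel's irq_desc. */
struct toy_irq_desc {
	bool in_progress;             /* a handler is running on some CPU */
	bool resend_when_in_progress; /* models IRQD_RESEND_WHEN_IN_PROGRESS */
	bool pending;                 /* interrupt is marked for replay */
	unsigned int handled;         /* completed handler runs */
};

enum toy_result { TOY_HANDLED, TOY_DROPPED, TOY_MARKED_FOR_RESEND };

/* Interrupt entry on a CPU: either start handling or take the early out. */
static enum toy_result toy_irq_enter(struct toy_irq_desc *desc)
{
	assert(desc != NULL);

	if (desc->in_progress) {
		/* Another CPU is still in the handler. */
		if (desc->resend_when_in_progress) {
			desc->pending = true; /* remember it instead of losing it */
			return TOY_MARKED_FOR_RESEND;
		}
		return TOY_DROPPED; /* the "dropped on the floor" case */
	}

	desc->in_progress = true;
	return TOY_HANDLED;
}

/* Handler completion; returns true when a replay is required. */
static bool toy_irq_exit(struct toy_irq_desc *desc)
{
	assert(desc != NULL);

	desc->in_progress = false;
	desc->handled++;

	if (desc->pending) {
		desc->pending = false;
		return true;
	}
	return false;
}
```

In this model, a second toy_irq_enter() while the first is still in progress is lost unless the flag is set, in which case toy_irq_exit() reports that the interrupt has to be replayed.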
2023-06-16genirq: Expand doc for PENDING and REPLAY flagsJames Gowans
Adding a bit more info about what the flags are used for may help future code readers. Signed-off-by: James Gowans <jgowans@amazon.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Marc Zyngier <maz@kernel.org> Cc: Liao Chang <liaochang1@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230608120021.3273400-2-jgowans@amazon.com
2023-06-16genirq: Use BIT() for the IRQD_* state flagsMarc Zyngier
As we're about to use the last bit available in the IRQD_* state flags, rewrite these flags with BIT(), which ensures that these constants do not represent a signed value. Signed-off-by: Marc Zyngier <maz@kernel.org>
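Why signedness matters for the top bit can be shown with a small user-space sketch. TOY_BIT() below is a local stand-in with the same shape as the kernel's BIT() helper, and both function names are hypothetical. The signed case uses INT32_MIN, whose bit pattern has only bit 31 set, to avoid writing an overflowing shift.

```c
#include <stdint.h>

/* Local stand-in shaped like the kernel's BIT(): an unsigned long constant. */
#define TOY_BIT(nr) (1UL << (nr))

/* Bit 31 held in a signed 32-bit type sign-extends when widened. */
static uint64_t toy_widen_signed_top_bit(void)
{
	int32_t flag = INT32_MIN; /* bit pattern 0x80000000 */

	return (uint64_t)flag;
}

/* The same bit as an unsigned constant widens without extra bits. */
static uint64_t toy_widen_unsigned_top_bit(void)
{
	return (uint64_t)TOY_BIT(31);
}
```

The signed variant ends up with the upper 32 bits set after widening, which is the kind of surprise an unsigned flag definition avoids.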
2023-06-16perf: arm_cspmu: Add missing MODULE_DEVICE_TABLEIlkka Koskinen
Add missing MODULE_DEVICE_TABLE definition to generate modalias, which enables module autoloading. Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Link: https://lore.kernel.org/r/20230615232630.304870-1-ilkka@os.amperecomputing.com Signed-off-by: Will Deacon <will@kernel.org>
2023-06-16perf/arm-cmn: Add sysfs identifierRobin Murphy
Expose a sysfs identifier encapsulating the CMN part number and revision so that jevents can narrow down a fundamental set of possible events for calculating metrics. Configuration-dependent aspects - such as whether a given node type is present, and/or a given node ID is valid - are still not covered, and in general it's hard to see how userspace could handle them, so we won't be removing any data or validation logic from the driver any time soon, but at least it's a step in a useful direction. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-and-tested-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Tested-by: Jing Zhang <renyu.zj@linux.alibaba.com> Link: https://lore.kernel.org/r/b8a14c14fcdf028939ebf57849863e8ae01743de.1686588640.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2023-06-16perf/arm-cmn: Revamp model detectionRobin Murphy
CMN implements a set of CoreSight-format peripheral ID registers which in principle we should be able to use to identify the hardware. However so far we have avoided trying to use the part number field since the TRMs have all described it as "configuration dependent". It turns out, though, that this is a quirk of the documentation generation process, and in fact the part number should always be a stable well-defined field which we can trust. To that end, revamp our model detection to rely less on ACPI/DT, and pave the way towards further using the hardware information as an identifier for userspace jevent metrics. This includes renaming the revision constants to maximise readability. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-and-tested-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Link: https://lore.kernel.org/r/3c791eaae814b0126f9adbd5419bfb4a600dade7.1686588640.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
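As a rough illustration of reading a part number from CoreSight-format peripheral ID registers, the sketch below assumes the usual CoreSight layout: part number bits [7:0] in PIDR0, part number bits [11:8] in the low nibble of PIDR1, and the revision in PIDR2 bits [7:4]. The function names are hypothetical and this is not the driver's code.

```c
#include <stdint.h>

/* Combine the two part-number fields into one 12-bit value. */
static uint16_t toy_part_number(uint32_t pidr0, uint32_t pidr1)
{
	return (uint16_t)(((pidr1 & 0xFu) << 8) | (pidr0 & 0xFFu));
}

/* Extract the 4-bit revision field. */
static uint8_t toy_revision(uint32_t pidr2)
{
	return (uint8_t)((pidr2 >> 4) & 0xFu);
}
```

The register values would come from the hardware; any concrete values used to exercise these helpers are made-up inputs, not documented CMN identifiers.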
2023-06-16perf/arm_dmc620: Add cpumaskXin Yang
Add a cpumask for the DMC620 PMU. As it is an uncore PMU, the perf userspace tool only needs to open a single counter on the CPU specified by the CPU mask for each event on a given DMC620 device. Signed-off-by: Xin Yang <xin.yang@arm.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20230613013423.2078397-1-xin.yang@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2023-06-16x86/xen: Set default memory type for PV guests to WBJuergen Gross
When running as an unprivileged PV guest under Xen (not dom0), the default MTRR memory type should be write-back. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20230615123959.12298-1-jgross@suse.com
2023-06-16x86/mm: Remove unused current_untag_mask()Borislav Petkov (AMD)
e0bddc19ba95 ("x86/mm: Reduce untagged_addr() overhead for systems without LAM") removed its only usage site so drop it. Move the tlbstate_untag_mask up in the header and drop the ugly ifdeffery as the unused declaration should be properly discarded. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: https://lore.kernel.org/r/20230614174148.5439-1-bp@alien8.de
2023-06-16xfrm: Linearize the skb after offloading if needed.Sebastian Andrzej Siewior
With offloading enabled, esp_xmit() gets invoked very late, from within validate_xmit_xfrm(), which runs after validate_xmit_skb() has validated and linearized the skb if the underlying device does not support fragments. esp_output_tail() may add a fragment to the skb while adding the auth tag/IV. Devices without the proper support will then send whatever skb->data points to, with the correct length, so the packet will have garbage at the end. A pcap sniffer will claim that the proper data has been sent since it parses the skb properly. The problem does not occur with INET_ESP_OFFLOAD disabled. Linearize the skb after offloading if the sending hardware requires it. It was tested on v4; v6 has been adapted accordingly. Fixes: 7785bba299a8d ("esp: Add a software GRO codepath") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
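The failure mode described above can be reproduced with a toy user-space model. This is only a sketch under simplifying assumptions: struct toy_pkt is a hypothetical stand-in for an skb with a linear head area and at most one fragment, and toy_xmit_no_sg() models a device that reads the full packet length from the head pointer alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy packet: a linear head area plus at most one out-of-line fragment. */
struct toy_pkt {
	unsigned char head[64];
	size_t head_len;           /* valid bytes in head */
	const unsigned char *frag; /* NULL when the packet is linear */
	size_t frag_len;
};

static size_t toy_pkt_len(const struct toy_pkt *p)
{
	return p->head_len + p->frag_len;
}

/* Copy the fragment into the head so all bytes are contiguous.
 * Returns 0 on success, -1 if the head has no room. */
static int toy_linearize(struct toy_pkt *p)
{
	assert(p != NULL);

	if (p->frag == NULL)
		return 0;
	if (p->frag_len > sizeof(p->head) - p->head_len)
		return -1;

	memcpy(p->head + p->head_len, p->frag, p->frag_len);
	p->head_len += p->frag_len;
	p->frag = NULL;
	p->frag_len = 0;
	return 0;
}

/* Device without fragment support: sends the total length, but reads
 * only from the head. Returns the number of bytes put on the wire. */
static size_t toy_xmit_no_sg(const struct toy_pkt *p, unsigned char *wire,
			     size_t wire_cap)
{
	size_t len = toy_pkt_len(p);

	if (len > sizeof(p->head) || len > wire_cap)
		return 0;
	memcpy(wire, p->head, len);
	return len;
}
```

Sending a fragmented toy packet yields the right length but stale bytes where the fragment should be; calling toy_linearize() first puts the fragment's bytes on the wire.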
2023-06-16x86/fpu: Move FPU initialization into arch_cpu_finalize_init()Thomas Gleixner
Initializing the FPU during the early boot process is a pointless exercise. Early boot is convoluted and fragile enough. Nothing requires that the FPU is set up early. It has to be initialized before fork_init() because the task_struct size depends on the FPU register buffer size. Move the initialization to arch_cpu_finalize_init() which is the perfect place to do so. No functional change. This allows removing much of the custom early command line parsing, but that's subject to the next installment. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230613224545.902376621@linutronix.de
2023-06-16x86/fpu: Mark init functions __initThomas Gleixner
No point in keeping them around. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20230613224545.841685728@linutronix.de