summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-03-08powerpc/pseries/vas: Add 'update_total_credits' entry for QoS capabilitiesHaren Myneni
pseries supports two types of credits - Default (uses normal priority FIFO) and Qality of service (QoS uses high priority FIFO). The user decides the number of QoS credits and sets this value with HMC interface. The total credits for QoS capabilities can be changed dynamically with HMC interface which invokes drmgr to communicate to the kernel. This patch creats 'update_total_credits' entry for QoS capabilities so that drmgr command can write the new target QoS credits in sysfs. Instead of using this value, the kernel gets the new QoS capabilities from the hypervisor whenever update_total_credits is updated to make sure sync with the QoS target credits in the hypervisor. Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b01ef31a0f964686d00243e7de7f09c73c07e69e.camel@linux.ibm.com
2022-03-08powerpc/pseries/vas: sysfs interface to export capabilitiesHaren Myneni
The hypervisor provides the available VAS GZIP capabilities such as default or QoS window type and the target available credits in each type. This patch creates sysfs entries and exports the target, used and the available credits for each feature. This interface can be used by the user space to determine the credits usage or to set the target credits in the case of QoS type (for DLPAR). /sys/devices/vas/vas0/gzip/default_capabilities (default GZIP capabilities) nr_total_credits /* Total credits available. Can be /* changed with DLPAR operation */ nr_used_credits /* Used credits */ /sys/devices/vas/vas0/gzip/qos_capabilities (QoS GZIP capabilities) nr_total_credits nr_used_credits Signed-off-by: Haren Myneni <haren@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/702d8b626ebfac2b52f4995eebeafe1c9a6fcb75.camel@linux.ibm.com
2022-03-08powerpc/pseries/vas: Reopen windows with DLPAR core addHaren Myneni
VAS windows can be closed in the hypervisor due to lost credits when the core is removed and the kernel gets fault for NX requests on these inactive windows. If the NX requests are issued on these inactive windows, OS gets page faults and the paste failure will be returned to the user space. If the lost credits are available later with core add, reopen these windows and set them active. Later when the OS sees page faults on these active windows, it creates mapping on the new paste address. Then the user space can continue to use these windows and send HW compression requests to NX successfully. Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d9f360e21355e6826142c81146acfa9b60bc7ecc.camel@linux.ibm.com
2022-03-08powerpc/pseries/vas: Close windows with DLPAR core removalHaren Myneni
The hypervisor assigns vas credits (windows) for each LPAR based on the number of cores configured in that system. The OS is expected to release credits when cores are removed, and may allocate more when cores are added. So there is a possibility of using excessive credits (windows) in the LPAR and the hypervisor expects the system to close the excessive windows so that NX load can be equally distributed across all LPARs in the system. When the OS closes the excessive windows in the hypervisor, it sets the window status inactive and invalidates window virtual address mapping. The user space receives paste instruction failure if any NX requests are issued on the inactive window. Then the user space can use with the available open windows or retry NX requests until this window active again. This patch also adds the notifier for core removal/add to close windows in the hypervisor if the system lost credits (core removal) and reopen windows in the hypervisor when the previously lost credits are available. Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/108928f9c00a48cc6a722315d482d07cf66acf5a.camel@linux.ibm.com
2022-03-08powerpc/vas: Map paste address only if window is activeHaren Myneni
The paste address mapping is done with mmap() after the window is opened with ioctl. The partition has to close VAS windows in the hypervisor if it lost credits due to DLPAR core removal. But the kernel marks these windows inactive until the previously lost credits are available later. If the window is inactive due to DLPAR after this mmap(), the paste instruction returns failure until the the OS reopens this window again. Before the user space issuing mmap(), there is a possibility of happening DLPAR core removal event which causes the corresponding window inactive. So if the window is not active, return mmap() failure with -EACCES and expects the user space reissue mmap() when the window is active or open a new window when the credit is available. Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/bbb203c26b324534e25658cb1dbbcb5160a2f93a.camel@linux.ibm.com
2022-03-08powerpc/vas: Return paste instruction failure if no active windowHaren Myneni
The VAS window may not be active if the system looses credits and the NX generates page fault when it receives request on unmap paste address. The kernel handles the fault by remap new paste address if the window is active again, Otherwise return the paste instruction failure if the executed instruction that caused the fault was a paste. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/492b9aefd593061d51dda67ee4d2fc449c000dce.camel@linux.ibm.com
2022-03-08powerpc/vas: Add paste address mmap fault handlerHaren Myneni
The user space opens VAS windows and issues NX requests by pasting CRB on the corresponding paste address mmap. When the system lost credits due to core removal, the kernel has to close the window in the hypervisor and make the window inactive by unmapping this paste address. Also the OS has to handle NX request page faults if the user space issue NX requests. This handler maps the new paste address with the same VMA when the window is active again (due to core add with DLPAR). Otherwise returns paste failure. Signed-off-by: Haren Myneni <haren@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3956e1c1fdfde69127055ff1c0256c7d71104030.camel@linux.ibm.com
2022-03-08powerpc/pseries/vas: Save PID in pseries_vas_window structHaren Myneni
The kernel sets the VAS window with PID when it is opened in the hypervisor. During DLPAR operation, windows can be closed and reopened in the hypervisor when the credit is available. So saves this PID in pseries_vas_window struct when the window is opened initially and reuse it later during DLPAR operation. Signed-off-by: Haren Myneni <haren@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a57cbe6d292fe49ad55a0b49c5679d6a24d8fe73.camel@linux.ibm.com
2022-03-08powerpc/pseries/vas: Use common names in VAS capability structureHaren Myneni
nr_total/nr_used_credits provides credits usage to user space via sysfs and the same interface can be used on PowerNV in future. Changed with proper naming so that applicable on both pseries and PowerNV. Signed-off-by: Haren Myneni <haren@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/f4313e9f198ee4f8d4fa4d015d8d1873e17851e6.camel@linux.ibm.com
2022-03-08Merge branch 'topic/ppc-kvm' into nextMichael Ellerman
Merge our topic branch containing powerpc KVM related commits. Alexey Kardashevskiy (1): KVM: PPC: Merge powerpc's debugfs entry content into generic entry Fabiano Rosas (9): KVM: PPC: Book3S HV: Stop returning internal values to userspace KVM: PPC: Fix vmx/vsx mixup in mmio emulation KVM: PPC: mmio: Reject instructions that access more than mmio.data size KVM: PPC: mmio: Return to guest after emulation failure KVM: PPC: Book3s: mmio: Deliver DSI after emulation failure KVM: PPC: Book3S HV: Check return value of kvmppc_radix_init KVM: PPC: Book3S HV: Delay setting of kvm ops KVM: PPC: Book3S HV: Free allocated memory if module init fails KVM: PPC: Decrement module refcount if init_vm fails Jason Wang (1): powerpc/kvm: no need to initialise statics to 0 Nour-eddine Taleb (1): KVM: PPC: Book3S HV: remove unnecessary casts
2022-03-07Merge branch 'topic/func-desc-lkdtm' into nextMichael Ellerman
Merge a topic branch we are maintaining with some cross-architecture changes to function descriptor handling and their use in LKDTM. From Christophe's cover letter: Fix LKDTM for PPC64/IA64/PARISC PPC64/IA64/PARISC have function descriptors. LKDTM doesn't work on those three architectures because LKDTM messes up function descriptors with functions. This series does some cleanup in the three architectures and refactors function descriptors so that it can then easily use it in a generic way in LKDTM.
2022-03-04KVM: PPC: Book3S HV: remove unnecessary castsNour-eddine Taleb
Remove unnecessary casts, from "void *" to "struct kvmppc_xics *" Signed-off-by: Nour-eddine Taleb <kernel.noureddine@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220303143416.201851-1-kernel.noureddine@gmail.com
2022-03-01powerpc/lib/sstep: Fix build errors with newer binutilsAnders Roxell
Building tinyconfig with gcc (Debian 11.2.0-16) and assembler (Debian 2.37.90.20220207) the following build error shows up: {standard input}: Assembler messages: {standard input}:10576: Error: unrecognized opcode: `stbcx.' {standard input}:10680: Error: unrecognized opcode: `lharx' {standard input}:10694: Error: unrecognized opcode: `lbarx' Rework to add assembler directives [1] around the instruction. The problem with this might be that we can trick a power6 into single-stepping through an stbcx. for instance, and it will execute that in kernel mode. [1] https://sourceware.org/binutils/docs/as/PowerPC_002dPseudo.html#PowerPC_002dPseudo Fixes: 350779a29f11 ("powerpc: Handle most loads and stores in instruction emulation code") Cc: stable@vger.kernel.org # v4.14+ Co-developed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220224162215.3406642-3-anders.roxell@linaro.org
2022-03-01powerpc: Fix build errors with newer binutilsAnders Roxell
Building tinyconfig with gcc (Debian 11.2.0-16) and assembler (Debian 2.37.90.20220207) the following build error shows up: {standard input}: Assembler messages: {standard input}:1190: Error: unrecognized opcode: `stbcix' {standard input}:1433: Error: unrecognized opcode: `lwzcix' {standard input}:1453: Error: unrecognized opcode: `stbcix' {standard input}:1460: Error: unrecognized opcode: `stwcix' {standard input}:1596: Error: unrecognized opcode: `stbcix' ... Rework to add assembler directives [1] around the instruction. Going through them one by one shows that the changes should be safe. Like __get_user_atomic_128_aligned() is only called in p9_hmi_special_emu(), which according to the name is specific to power9. And __raw_rm_read*() are only called in things that are powernv or book3s_hv specific. [1] https://sourceware.org/binutils/docs/as/PowerPC_002dPseudo.html#PowerPC_002dPseudo Cc: stable@vger.kernel.org Co-developed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> [mpe: Make commit subject more descriptive] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220224162215.3406642-2-anders.roxell@linaro.org
2022-03-01powerpc/lib/sstep: Fix 'sthcx' instructionAnders Roxell
Looks like there been a copy paste mistake when added the instruction 'stbcx' twice and one was probably meant to be 'sthcx'. Changing to 'sthcx' from 'stbcx'. Fixes: 350779a29f11 ("powerpc: Handle most loads and stores in instruction emulation code") Cc: stable@vger.kernel.org # v4.14+ Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220224162215.3406642-1-anders.roxell@linaro.org
2022-03-01powerpc/Makefile: Don't pass -mcpu=powerpc64 when building 32-bitMichael Ellerman
When CONFIG_GENERIC_CPU=y (true for all our defconfigs) we pass -mcpu=powerpc64 to the compiler, even when we're building a 32-bit kernel. This happens because we have an ifdef CONFIG_PPC_BOOK3S_64/else block in the Makefile that was written before 32-bit supported GENERIC_CPU. Prior to that the else block only applied to 64-bit Book3E. The GCC man page says -mcpu=powerpc64 "[specifies] a pure ... 64-bit big endian PowerPC ... architecture machine [type], with an appropriate, generic processor model assumed for scheduling purposes." It's unclear how that interacts with -m32, which we are also passing, although obviously -m32 is taking precedence in some sense, as the 32-bit kernel only contains 32-bit instructions. This was noticed by inspection, not via any bug reports, but it does affect code generation. Comparing before/after code generation, there are some changes to instruction scheduling, and the after case (with -mcpu=powerpc64 removed) the compiler seems more keen to use r8. Fix it by making the else case only apply to Book3E 64, which excludes 32-bit. Fixes: 0e00a8c9fd92 ("powerpc: Allow CPU selection also on PPC32") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220215112858.304779-1-mpe@ellerman.id.au
2022-03-01powerpc/mm/numa: skip NUMA_NO_NODE onlining in parse_numa_properties()Daniel Henrique Barboza
Executing node_set_online() when nid = NUMA_NO_NODE results in an undefined behavior. node_set_online() will call node_set_state(), into __node_set(), into set_bit(), and since NUMA_NO_NODE is -1 we'll end up doing a negative shift operation inside arch/powerpc/include/asm/bitops.h. This potential UB was detected running a kernel with CONFIG_UBSAN. The behavior was introduced by commit 10f78fd0dabb ("powerpc/numa: Fix a regression on memoryless node 0"), where the check for nid > 0 was removed to fix a problem that was happening with nid = 0, but the result is that now we're trying to online NUMA_NO_NODE nids as well. Checking for nid >= 0 will allow node 0 to be onlined while avoiding this UB with NUMA_NO_NODE. Fixes: 10f78fd0dabb ("powerpc/numa: Fix a regression on memoryless node 0") Reported-by: Ping Fang <pifang@redhat.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220224182312.1012527-1-danielhb413@gmail.com
2022-03-01powerpc/interrupt: Remove struct interrupt_stateChristophe Leroy
Since commit ceff77efa4f8 ("powerpc/64e/interrupt: Use new interrupt context tracking scheme") struct interrupt_state has been empty and unused. Remove it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1d862ce3eab3da6ca7ac47d4a78a18f154462511.1645806970.git.christophe.leroy@csgroup.eu
2022-03-01powerpc/fadump: register for fadump as early as possibleHari Bathini
Crash recovery (fadump) is setup in the userspace by some service. This service rebuilds initrd with dump capture capability, if it is not already dump capture capable before proceeding to register for firmware assisted dump (echo 1 > /sys/kernel/fadump/registered). But arming the kernel with crash recovery support does not have to wait for userspace configuration. So, register for fadump while setting it up itself. This can at worst lead to a scenario, where /proc/vmcore is ready afer crash but the initrd does not know how/where to offload it, which is always better than not having a /proc/vmcore at all due to incomplete configuration in the userspace at the time of crash. Commit 0823c68b054b ("powerpc/fadump: re-register firmware-assisted dump if already registered") ensures this change does not break userspace. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> [mpe: Reword comment] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220201105305.155511-1-hbathini@linux.ibm.com
2022-03-01selftests/powerpc/pmu: Add interface test for mmcra register fieldsKajol Jain
The testcase uses event code 0x35340401e0 to verify the settings for different fields in Monitor Mode Control Register A (MMCRA). The fields include thresh_start, thresh_stop thresh_select, sdar mode, sample and marked bit. Checks if these fields are translated correctly via perf interface to MMCRA. Signed-off-by: Kajol Jain <kjain@linux.ibm.com> [mpe: Add error checking, drop GET_MMCR_FIELD, add to .gitignore] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220127072012.662451-21-kjain@linux.ibm.com
2022-03-01selftests/powerpc/pmu/: Add interface test for mmcr3_src fieldsKajol Jain
The testcase uses event code 0x1340000001c040 to verify the settings for different src fields in Monitor Mode Control Register 3 (MMCR3). Checks if these fields are translated correctly via perf interface to MMCR3 on ISA v3.1 platform. Signed-off-by: Kajol Jain <kjain@linux.ibm.com> [mpe: Add error checking, drop GET_MMCR_FIELD, add to .gitignore] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220127072012.662451-20-kjain@linux.ibm.com
2022-03-01selftests/powerpc/pmu/: Add interface test for mmcr2_fcs_fch fieldsMadhavan Srinivasan
The testcases uses cycles event to verify the freeze counter settings in Monitor Mode Control Register 2 (MMCR2). Event modifier (exclude_kernel) setting is used for the event attribute to check the FCxS and FCxH ( Freeze counter in privileged and hypervisor state ) settings via perf interface. Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> [mpe: Add error checking, check MSR for MSR_HV, drop GET_MMCR_FIELD, add to .gitignore] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220127072012.662451-19-kjain@linux.ibm.com
2022-03-01selftests/powerpc/pmu/: Add interface test for mmcr2_l2l3 fieldMadhavan Srinivasan
The testcases uses event code 0x010000046080 to verify the l2l3 bit setting for Monitor Mode Control Register 2 (MMCR2). check if this bit is set correctly via perf interface in ISA v3.1 platform. Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> [mpe: Add error checking, drop GET_MMCR_FIELD, add to .gitignore] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220127072012.662451-18-kjain@linux.ibm.com
2022-03-01selftests/powerpc/pmu/: Add interface test for mmcr1_comb fieldAthira Rajeev
The testcase uses event code "0x26880" to verify the settings for different fields in Monitor Mode Control Register 1 (MMCR1). The field include PMCxCOMB. Checks if this field are translated correctly via perf interface to MMCR1 Add selftest for mmcr1 comb field. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> [mpe: Add error checking, drop GET_MMCR_FIELD, add to .gitignore] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220127072012.662451-16-kjain@linux.ibm.com
2022-03-01selftests/powerpc/pmu/: Add interface test for mmcr0_pmc56 using pmc5Athira Rajeev
The testcase uses event code 0x500fa to verify the FC5-6 bit setting in Monitor Mode Control Register 0 (MMCR0). Check if FC5-6 bit is not set in MMCR0 when using Performance Monitor Counter 5 and 6 (PMC5 and PMC6). Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> [mpe: Add error checking, drop GET_MMCR_FIELD, add to .gitignore] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220127072012.662451-15-kjain@linux.ibm.com
2022-03-01selftests/powerpc/pmu/: Add interface test for mmcr0_fc56 field using pmc1Athira Rajeev
The testcase uses event code 0x1001e to verify two bit settings (FC5-6 and PMC1CE) in Monitor Mode Control Register 0 (MMCR0). Check if FC5-6 bit to be set in MMCR0 when not using Performance Monitor Counter 5 and 6 (PMC5 and PMC6). And also PMC1CE is expected to be set when using PMC1. Test if these fields are programmed correctly via perf interface. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> [mpe: Add error checking, drop GET_MMCR_FIELD, add to .gitignore] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220127072012.662451-14-kjain@linux.ibm.com
2022-03-01selftests/powerpc/pmu/: Add interface test for mmcr0_pmcjce fieldAthira Rajeev
The testcase uses event code 0x500fa ("instructions") to verify the PMCjCE bit setting in Monitor Mode Control Register 0 (MMCR0). This bit is expected to be set in MMCR0 when using Performance Monitor Counter 5 (PMC5). Checks if perf interface sets this bit correctly. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> [mpe: Add error checking, drop GET_MMCR_FIELD, add to .gitignore] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220127072012.662451-13-kjain@linux.ibm.com
2022-03-01selftests/powerpc/pmu/: Add interface test for mmcr0_pmccext bitAthira Rajeev
The testcase uses cycles event to check the PMCCEXT bit setting in Monitor Mode Control Register 0 (MMCR0). Check if perf interface sets this control bit in MMCR0 on ISA v3.1 platform. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> [mpe: Add error checking, drop GET_MMCR_FIELD, add to .gitignore] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220127072012.662451-12-kjain@linux.ibm.com
2022-03-01selftests/powerpc/pmu/: Add interface test for mmcr0_cc56run fieldAthira Rajeev
The testcase uses event code 0x500fa ("instructions") to check the CC56RUN bit setting in Monitor Mode Control Register 0(MMCR0). In ISA v3.1 platform, this bit is expected to be set in MMCR0 when using Performance Monitor Counter 5 and 6 (PMC5 and PMC6). Verify this is done correctly by perf interface. CC56RUN bit makes PMC5 and PMC6 count regardless of the run latch state. This bit is set in power10 since PMC5 and PMC6 is used in power10 for counting instructions and cycles. Hence added a check to skip this test in other platforms Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> [mpe: Add error checking, drop GET_MMCR_FIELD, add to .gitignore] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220127072012.662451-11-kjain@linux.ibm.com
2022-03-01selftests/powerpc/pmu/: Add interface test for mmcr0 exception bitsAthira Rajeev
The testcase uses "instructions" event to verify two bits(PMAE and PMAO) in Monitor Mode Control Register 0 (MMCR0). At the time of interrupt, pmae bit ( which enables performance monitor exception ) is expected to be cleared and pmao (which indicates performance monitor alert) bit is expected to be set in MMCR0. And testcases handles these checks. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> [mpe: Add error checking, drop GET_MMCR_FIELD, add to .gitignore] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220127072012.662451-10-kjain@linux.ibm.com
2022-03-01selftests/powerpc/pmu: Add macro to extract mmcr3 and mmcra fieldsKajol Jain
Add macro and utility functions to fetch individual fields from Monitor Mode Control Register 3(MMCR3)and Monitor Mode Control Register A(MMCRA) PMU registers Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220127072012.662451-9-kjain@linux.ibm.com
2022-03-01selftests/powerpc/pmu: Add macro to extract mmcr0/mmcr1 fieldsAthira Rajeev
Add macro and utility functions to fetch individual fields from Monitor Mode Control Register 0(MMCR0) and Monitor Mode Control Register 1(MMCR1) PMU register. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220127072012.662451-8-kjain@linux.ibm.com
2022-03-01selftests/powerpc/pmu: Add macros to extract mmcr fieldsMadhavan Srinivasan
Along with it, Add macros and utility functions to fetch individual fields from Monitor Mode Control Register 2(MMCR2) register. Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220127072012.662451-7-kjain@linux.ibm.com
2022-03-01selftests/powerpc/pmu: Add event_init_sampling functionMadhavan Srinivasan
Extended event_init_opts() to include initialization of sampling testcases. Patch adds an event_init_sampling() wrapper to initialize event attribute fields for sampling events. This includes initializing sample period, sample type and event type. Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220127072012.662451-6-kjain@linux.ibm.com
2022-03-01selftests/powerpc/pmu: Add utility functions to post process the mmap bufferKajol Jain
Add couple of basic utility functions to post process the mmap buffer. It includes function to read the total number of samples present in the mmap buffer and function to get the address of the first sample. Add function "get_intr_regs" which will return pointer to interrupt registers present in the sample, incase sample type PERF_SAMPLE_REGS_INTR is set. Add functions "get_reg_value" which can be used to read any interrupt register value from a given sample. Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220127072012.662451-5-kjain@linux.ibm.com
2022-03-01selftests/powerpc/pmu: Add macros to parse event codesMadhavan Srinivasan
Each platform has raw event encoding format which specifies the bit positions for different fields. The fields from event code gets translated into performance monitoring mode control register (MMCRx) settings. Patch add macros to extract individual fields from the event code. Add functions for sanity checks, since testcases currently are only supported in power9 and power10. Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> [mpe: Read PVR directly rather than using /proc/cpuinfo] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220127072012.662451-4-kjain@linux.ibm.com
2022-03-01selftests/powerpc/pmu: Add support for perf sampling testsAthira Rajeev
Add support functions for enabling perf sampling test in a new folder "sampling_tests" under "selftests/powerpc/pmu". This includes support functions for allocating and processing the mmap buffer. These functions are added/defined in "sampling_tests/misc.*" files. Also updates the corresponding Makefiles in "selftests/powerpc" and "sampling_tests" folder. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> [mpe: Drop unneeded bits from the Makefile] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220127072012.662451-3-kjain@linux.ibm.com
2022-02-28selftests/powerpc/pmu: Include mmap_buffer field as part of struct eventAthira Rajeev
To enable the capturing of samples as part of perf event, add a new field "mmap_buffer" to "struct event". This field is a place-holder for sample collection Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220127072012.662451-2-kjain@linux.ibm.com
2022-02-24powerpc/module_64: fix array_size.cocci warningGuo Zhengkui
Fix following coccicheck warning: ./arch/powerpc/kernel/module_64.c:432:40-41: WARNING: Use ARRAY_SIZE. ARRAY_SIZE(arr) is a macro provided by the kernel. It makes sure that arr is an array, so it's safer than sizeof(arr) / sizeof(arr[0]) and more standard. Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220223075426.20939-1-guozhengkui@vivo.com
2022-02-24powerpc/64s/hash: Make hash faults work in NMI contextNicholas Piggin
Hash faults are not resoved in NMI context, instead causing the access to fail. This is done because perf interrupts can get backtraces including walking the user stack, and taking a hash fault on those could deadlock on the HPTE lock if the perf interrupt hits while the same HPTE lock is being held by the hash fault code. The user-access for the stack walking will notice the access failed and deal with that in the perf code. The reason to allow perf interrupts in is to better profile hash faults. The problem with this is any hash fault on a kernel access that happens in NMI context will crash, because kernel accesses must not fail. Hard lockups, system reset, machine checks that access vmalloc space including modules and including stack backtracing and symbol lookup in modules, per-cpu data, etc could all run into this problem. Fix this by disallowing perf interrupts in the hash fault code (the direct hash fault is covered by MSR[EE]=0 so the PMI disable just needs to extend to the preload case). This simplifies the tricky logic in hash faults and perf, at the cost of reduced profiling of hash faults. perf can still latch addresses when interrupts are disabled, it just won't get the stack trace at that point, so it would still find hot spots, just sometimes with confusing stack chains. An alternative could be to allow perf interrupts here but always do the slowpath stack walk if we are in nmi context, but that slows down all perf interrupt stack walking on hash though and it does not remove as much tricky code. Reported-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220204035348.545435-1-npiggin@gmail.com
2022-02-23powerpc: Remove remaining stab codesChristophe Leroy
Following commit 12318163737c ("powerpc/32: Remove remaining .stabs annotations"), stabs code are not used anymore. Remove them. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d8b33342d7454f6ca4f368f5206896558dfa06f4.1645538722.git.christophe.leroy@csgroup.eu
2022-02-16lkdtm: Add a test for function descriptors protectionChristophe Leroy
Add WRITE_OPD to check that you can't modify function descriptors. Gives the following result when function descriptors are not protected: lkdtm: Performing direct entry WRITE_OPD lkdtm: attempting bad 16 bytes write at c00000000269b358 lkdtm: FAIL: survived bad write lkdtm: do_nothing was hijacked! Looks like a standard compiler barrier() is not enough to force GCC to use the modified function descriptor. Had to add a fake empty inline assembly to force GCC to reload the function descriptor. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7eeba50d16a35e9d799820e43304150225f20197.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16lkdtm: Fix execute_[user]_location()Christophe Leroy
execute_location() and execute_user_location() intent to copy do_nothing() text and execute it at a new location. However, at the time being it doesn't copy do_nothing() function but do_nothing() function descriptor which still points to the original text. So at the end it still executes do_nothing() at its original location allthough using a copied function descriptor. So, fix that by really copying do_nothing() text and build a new function descriptor by copying do_nothing() function descriptor and updating the target address with the new location. Also fix the displayed addresses by dereferencing do_nothing() function descriptor. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/4055839683d8d643cd99be121f4767c7c611b970.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16lkdtm: Really write into kernel text in WRITE_KERNChristophe Leroy
WRITE_KERN is supposed to overwrite some kernel text, namely do_overwritten() function. But at the time being it overwrites do_overwritten() function descriptor, not function text. Fix it by dereferencing the function descriptor to obtain function text pointer. Export dereference_function_descriptor() for when LKDTM is built as a module. And make do_overwritten() noinline so that it is really do_overwritten() which is called by lkdtm_WRITE_KERN(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/31e58eaffb5bc51c07d8d4891d1982100ade8cfc.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16lkdtm: Force do_nothing() out of lineChristophe Leroy
LKDTM tests display that the run do_nothing() at a given address, but in reality do_nothing() is inlined into the caller. Force it out of line so that it really runs text at the displayed address. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a5dcf4d2088e6aca47ab3b4c6d5c0f7fa064e25a.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16asm-generic: Refactor dereference_[kernel]_function_descriptor()Christophe Leroy
dereference_function_descriptor() and dereference_kernel_function_descriptor() are identical on the three architectures implementing them. Make them common and put them out-of-line in kernel/extable.c which is one of the users and has similar type of functions. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Helge Deller <deller@gmx.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/449db09b2eba57f4ab05f80102a67d8675bc8bcd.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16asm-generic: Define 'func_desc_t' to commonly describe function descriptorsChristophe Leroy
We have three architectures using function descriptors, each with its own type and name. Add a common typedef that can be used in generic code. Also add a stub typedef for architecture without function descriptors, to avoid a forest of #ifdefs. It replaces the similar 'func_desc_t' previously defined in arch/powerpc/kernel/module_64.c Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Helge Deller <deller@gmx.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/f1f91b142b3c1082bdc1586ce71c9bac1e75213c.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16asm-generic: Define CONFIG_HAVE_FUNCTION_DESCRIPTORSChristophe Leroy
Replace HAVE_DEREFERENCE_FUNCTION_DESCRIPTOR by a config option named CONFIG_HAVE_FUNCTION_DESCRIPTORS and use it instead of 'dereference_function_descriptor' macro to know whether an arch has function descriptors. To limit churn in one of the following patches, use an #ifdef/#else construct with empty first part instead of an #ifndef in asm-generic/sections.h On powerpc, make sure the config option matches the ABI used by the compiler with a BUILD_BUG_ON() and add missing _CALL_ELF=2 when calling 'sparse' so that sparse sees the same piece of code as GCC. And include a helper to check whether an arch has function descriptors or not : have_function_descriptors() Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Helge Deller <deller@gmx.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/4a0f11fb0ea74a3197bc44dd7ba25e53a24fd03d.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16ia64: Rename 'ip' to 'addr' in 'struct fdesc'Christophe Leroy
There are three architectures with function descriptors, try to have common names for the address they contain in order to refactor some functions into generic functions later. powerpc has 'entry' ia64 has 'ip' parisc has 'addr' Vote for 'addr' and update 'struct fdesc' accordingly. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/65b73ac614e4c002c5819d40b42f6f426d2ee52b.1644928018.git.christophe.leroy@csgroup.eu
2022-02-16powerpc: Prepare func_desc_t for refactorisationChristophe Leroy
In preparation of making func_desc_t generic, change the ELFv2 version to a struct containing 'addr' element. This allows using single helpers common to ELFv1 and ELFv2 and reduces the amount of #ifdef's Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5c36105e08b27b98450535bff48d71b690c19739.1644928018.git.christophe.leroy@csgroup.eu