Age | Commit message (Collapse) | Author |
|
Today, core ID is assumed to be unique within each package.
But an AlderLake-N platform adds a Module level between core and package,
Linux excludes the unknown modules bits from the core ID, resulting in
duplicate core ID's.
To keep core ID unique within a package, Linux must include all APIC-ID
bits for known or unknown levels above the core and below the package
in the core ID.
It is important to understand that core ID's have always come directly
from the APIC-ID encoding, which comes from the BIOS. Thus there is no
guarantee that they start at 0, or that they are contiguous.
As such, naively using them for array indexes can be problematic.
[ dhansen: un-known -> unknown ]
Fixes: 7745f03eb395 ("x86/topology: Add CPUID.1F multi-die/package support")
Suggested-by: Len Brown <len.brown@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20221014090147.1836-5-rui.zhang@intel.com
|
|
CPUID.1F/B does not enumerate Package level explicitly, instead, all the
APIC-ID bits above the enumerated levels are assumed to be package ID
bits.
Current code gets package ID by shifting out all the APIC-ID bits that
Linux supports, rather than shifting out all the APIC-ID bits that
CPUID.1F enumerates. This introduces problems when CPUID.1F enumerates a
level that Linux does not support.
For example, on a single package AlderLake-N, there are 2 Ecore Modules
with 4 atom cores in each module. Linux does not support the Module
level and interprets the Module ID bits as package ID and erroneously
reports a multi module system as a multi-package system.
Fix this by using APIC-ID bits above all the CPUID.1F enumerated levels
as package ID.
[ dhansen: spelling fix ]
Fixes: 7745f03eb395 ("x86/topology: Add CPUID.1F multi-die/package support")
Suggested-by: Len Brown <len.brown@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20221014090147.1836-4-rui.zhang@intel.com
|
|
The coretemp driver supports up to a hard-coded limit of 128 cores.
Today, the driver can not support a core with an ID above that limit.
Yet, the encoding of core ID's is arbitrary (BIOS APIC-ID) and so they
may be sparse and they may be large.
Update the driver to map arbitrary core ID numbers into appropriate
array indexes so that 128 cores can be supported, no matter the encoding
of core ID's.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Len Brown <len.brown@intel.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20221014090147.1836-3-rui.zhang@intel.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"This fixes an issue exposed by the recent change to feed untrusted
sources into /dev/random"
* tag 'v6.1-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
hwrng: bcm2835 - use hwrng_msleep() instead of cpu_relax()
|
|
MTL's media IP (Xe_LPM+) only has a single type of steering ("OAADDRM")
which selects between media slice 0 and media slice 1. We'll always
steer to media slice 0 unless it is fused off (which is the case when
VD0, VE0, and SFC0 are all reported as unavailable).
Bspec: 67789
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014230239.1023689-15-matthew.d.roper@intel.com
|
|
MTL's graphics IP (Xe_LPG) once again changes the multicast register
types and steering details. Key changes from past platforms:
* The number of instances of some MCR types (NODE, OAAL2, and GAM) vary
according to the MTL subplatform and cannot be read from fuse
registers. However steering to instance #0 will always provided a
non-terminated value, so we can lump these all into a single
"instance0" table.
* The MCR steering register (and its bitfields) has changed.
Unlike past platforms, we will be explicitly steering all types of MCR
accesses, including those for "SLICE" and "DSS" ranges; we no longer
rely on implicit steering. On previous platforms, various
hardware/firmware agents that needed to access registers typically had
their own steering control registers, allowing them to perform multicast
steering without clobbering the CPU/kernel steering. Starting with MTL,
more of these agents now share a single steering register (0xFD4) and it
is no longer safe for us to assume that the value will remain unchanged
from how we initialized it during startup. There is also a slight
chance of race conditions between the driver and a hardware/firmware
agent, so the hardware provides a semaphore register that can be used to
coordinate access to the steering register. Support for the semaphore
register will be introduced in a future patch.
v2:
- Use Xe_LPG terminology instead of "MTL 3D" since it's the IP version
we're matching on now rather than the platform.
- Don't combine l3bank and mslice masks into a union. It's not related
to the other changes here and we might still need both of them on
some future platform.
- Separate debug dumping of steering settings to a separate helper
function. (Tvrtko)
- Update debug dumping to include DSS ranges (and future-proof it so
that any new ranges added on future platforms will also be dumped).
- Restore MULTICAST bit at the end of rw_with_mcr_steering_fw() if we
cleared it. Also force the MULTICAST bit to true at the beginning of
multicast writes just to be safe. (Bala)
Bspec: 67788, 67112
Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014230239.1023689-14-matthew.d.roper@intel.com
|
|
Rather than treating multicast registers as 'i915_reg_t' let's define
them as a completely new type. This will allow the compiler to help us
make sure we're using multicast-aware functions to operate on multicast
registers.
This plan does break down a bit in places where we're just maintaining
heterogeneous lists of registers (e.g., various MMIO whitelists used by
perf, GVT, etc.) rather than performing reads/writes. We only really
care about the offset in those cases, so for now we can "cast" the
registers as non-MCR, leaving us with a list of i915_reg_t's, but we may
want to look for better ways to store mixed collections of i915_reg_t
and i915_mcr_reg_t in the future.
v2:
- Add TLB invalidation registers
v3:
- Make type checking of i915_mmio_reg_offset() stricter. It will
accept either i915_reg_t or i915_mcr_reg_t, but will now raise a
compile error if any other type is passed, even if that type contains
a 'reg' field. (Jani)
- Drop a ton of GVT changes; allowing i915_mmio_reg_offset() to take
either an i915_reg_t or an i915_mcr_reg_t means that the huge lists
of MMIO_D*() macros used in GVT will continue to work without
modification. We need only make changes to structures that have an
explicit i915_reg_t in them now.
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014230239.1023689-13-matthew.d.roper@intel.com
|
|
Let's be more explicit about which of our workarounds are updating MCR
registers.
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014230239.1023689-12-matthew.d.roper@intel.com
|
|
MCR registers can be placed on the GuC's save/restore list, but at the
moment they are always handled in a multicast manner (i.e., the GuC
reads one instance to save the value and then does a multicast write to
restore that single value to all instances). In the future the GuC will
probably give us an alternate interface to do unicast per-instance
save/restore operations, so we should be very clear about which
registers on the list are MCR registers (and in the future which
save/restore behavior we want for them).
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014230239.1023689-11-matthew.d.roper@intel.com
|
|
Rather than relying on the implicit behavior of intel_uncore_*()
functions, let's always use the intel_gt_mcr_*() functions to operate on
multicast/replicated registers.
v2:
- Add TLB invalidation registers
v3:
- Switch more uncore operations in mmio_invalidate_full() to MCR
operations for Xe_HP. (Bala)
Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014230239.1023689-10-matthew.d.roper@intel.com
|
|
Rather than using the same _MMIO() macro to define MCR registers as
singleton registers, let's use a new MCR_REG() macro to make it clear
that these registers are special and should be handled accordingly. For
now MCR_REG() will still generate an i915_reg_t with the given offset,
but we'll change that in future patches.
Bspec: 66673, 66696, 66534, 67609
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014230239.1023689-9-matthew.d.roper@intel.com
|
|
Xe_HP has some MCR registers that need to be polled for completion of
operations like TLB invalidation. Those registers are in the GAM range,
which rolls up the status from each unit into the 'primary' instance's
value. This makes it useful to have a dedicated 'wait for register'
function that handles this on MCR registers, similar to the
__intel_wait_for_register_fw() function we already have for regular
registers.
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014230239.1023689-8-matthew.d.roper@intel.com
|
|
On Xe_HP the fault registers are now in a multicast register range.
However as part of the GAM these registers follow special rules and we
need only read from the "primary" GAM's instance to get the information
we need. So a single intel_gt_mcr_read_any() (which will automatically
steer to the primary GAM) is sufficient; we don't need to loop over each
instance of the MCR register.
v2:
- Update more instances of fault registers. (Bala)
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014230239.1023689-7-matthew.d.roper@intel.com
|
|
There are cases where we wish to read from any non-terminated MCR
register instance (or the primary instance in the case of GAM ranges),
clear/set some bits, and then write the value back out to the register
in a multicast manner. Adding a "multicast RMW" will avoid the need to
open-code this.
v2:
- Return a u32 to align with the recent change to intel_uncore_rmw.
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014230239.1023689-6-matthew.d.roper@intel.com
|
|
We have a few registers that have existed for several hardware
generations, but are only used by the driver on Xe_HP and beyond. In
cases where the Xe_HP version of the register is now replicated and uses
multicast behavior, but earlier generations were singleton, let's change
the register prefix to "XEHP_" to help clarify that we're using the
newer multicast form of the register.
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014230239.1023689-5-matthew.d.roper@intel.com
|
|
Let's drop a few register definitions that are unused anywhere in the
driver today. Since the referenced offsets are part of what is now
considered a multicast register region, the current definitions would
not be correct for use on any future platform.
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014230239.1023689-4-matthew.d.roper@intel.com
|
|
Starting in Xe_HP, several registers our driver works with have been
converted from singleton registers into replicated registers with
multicast behavior. Although the registers are still located at the
same MMIO offsets as on previous platforms, let's duplicate the register
definitions in preparation for upcoming patches that will handle
multicast registers in a special manner.
The registers that are now replicated on Xe_HP are:
* PAT_INDEX (mslice replication)
* FF_MODE2 (gslice replication)
* COMMON_SLICE_CHICKEN3 (gslice replication)
* SLICE_COMMON_ECO_CHICKEN1 (gslice replication)
* SLICE_UNIT_LEVEL_CLKGATE (gslice replication)
* LNCFCMOCS (lncf replication)
Note that there are a couple places in selftest_mocs.c where the
gen9 version of LNCFCMOCS is still used without regards for which
platform we're on. Those cases are just doing an offset lookup and not
issuing any CPU reads/writes of the register, so the potentially
multicast nature of the register doesn't come into play.
v2:
- Add commit message note about the unconditional GEN9_LNCFCMOCS usage
in selftest_mocs. (Bala)
- Include some additional TLB registers.
Bspec: 66534
Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014230239.1023689-3-matthew.d.roper@intel.com
|
|
Gen8 was the first time our hardware had multicast registers (or at
least the first time the multicast nature was exposed and MMIO accesses
could be steered). There are some registers that transitioned from
singleton behavior to multicast during the gen7 -> gen8 transition;
let's duplicate the register definitions for those registers in
preparation for upcoming patches that will handle MCR registers in a
special manner.
The registers adjusted are:
* MISCCPCTL
* SAMPLER_INSTDONE
* ROW_INSTDONE
* ROW_CHICKEN2
* HALF_SLICE_CHICKEN1
* HALF_SLICE_CHICKEN3
v2:
- Use the gen8 version of HALF_SLICE_CHICKEN3 in GVT's gen9 engine MMIO
list. (Bala)
- Update to the gen8 version of MISCCPCTL in a couple new workarounds
that were recently added for DG2/PVC. (Bala)
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014230239.1023689-2-matthew.d.roper@intel.com
|
|
A recent change in LLVM made CONFIG_EFI_STUB unselectable because it no
longer pretends to support -mabi=ms, breaking the dependency in
Kconfig. Lack of CONFIG_EFI_STUB can prevent kernels from booting via
EFI in certain circumstances.
This check was added by
8f24f8c2fc82 ("efi/libstub: Annotate firmware routines as __efiapi")
to ensure that __attribute__((ms_abi)) was available, as -mabi=ms is
not actually used in any cflags.
According to the GCC documentation, this attribute has been supported
since GCC 4.4.7. The kernel currently requires GCC 5.1 so this check is
not necessary; even when that change landed in 5.6, the kernel required
GCC 4.9 so it was unnecessary then as well.
Clang supports __attribute__((ms_abi)) for all versions that are
supported for building the kernel so no additional check is needed.
Remove the 'depends on' line altogether to allow CONFIG_EFI_STUB to be
selected when CONFIG_EFI is enabled, regardless of compiler.
Fixes: 8f24f8c2fc82 ("efi/libstub: Annotate firmware routines as __efiapi")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Cc: stable@vger.kernel.org
Link: https://github.com/llvm/llvm-project/commit/d1ad006a8f64bdc17f618deffa9e7c91d82c444d
|
|
This reverts commit 8bb7ff12a91429eb76e093b517ae810b146448fe.
Commit 8bb7ff12a914 ("PCI: tegra: Use PCI_CONF1_EXT_ADDRESS() macro")
updated the Tegra PCI driver to use the macro PCI_CONF1_EXT_ADDRESS()
instead of a local function in the Tegra PCI driver. This broke PCI for
some Tegra platforms because, when calculating the offset value, the mask
applied to the lower 8-bits changed from 0xff to 0xfc.
For now, fix this by reverting this commit.
Fixes: 8bb7ff12a914 ("PCI: tegra: Use PCI_CONF1_EXT_ADDRESS() macro")
Link: https://lore.kernel.org/r/20221017084006.11770-1-jonathanh@nvidia.com
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Acked-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
|
|
Introduce distinct struct balance_callback instead of performing function
pointer casting which will trip CFI. Avoids warnings as found by Clang's
future -Wcast-function-type-strict option:
In file included from kernel/sched/core.c:84:
kernel/sched/sched.h:1755:15: warning: cast from 'void (*)(struct rq *)' to 'void (*)(struct callback_head *)' converts to incompatible function type [-Wcast-function-type-strict]
head->func = (void (*)(struct callback_head *))func;
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
No binary differences result from this change.
This patch is a cleanup based on Brad Spengler/PaX Team's modifications
to sched code in their last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code
are mine and don't reflect the original grsecurity/PaX code.
Reported-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/1724
Link: https://lkml.kernel.org/r/20221008000758.2957718-1-keescook@chromium.org
|
|
In commit 97886d9dcd86 ("sched: Migration changes for core scheduling"),
sched_group_cookie_match() was added to help determine if a cookie
matches the core state.
However, while it iterates the SMT group, it fails to actually use the
RQ for each of the CPUs iterated, use cpu_rq(cpu) instead of rq to fix
things.
Fixes: 97886d9dcd86 ("sched: Migration changes for core scheduling")
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221008022709.642-1-linshengwang1@huawei.com
|
|
* Raw data is also filled by bpf_perf_event_output.
* Add sample_flags to indicate raw data.
* This eliminates the segfaults as shown below:
Run ./samples/bpf/trace_output
BUG pid 9 cookie 1001000000004 sized 4
BUG pid 9 cookie 1001000000004 sized 4
BUG pid 9 cookie 1001000000004 sized 4
Segmentation fault (core dumped)
Fixes: 838d9bb62d13 ("perf: Use sample_flags for raw_data")
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/r/20221007081327.1047552-1-sumanthk@linux.ibm.com
|
|
Add a SIGTRAP stress test that exercises repeatedly enabling/disabling
an event while it concurrently keeps firing.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/Y0E3uG7jOywn7vy3@elver.google.com/
|
|
Marco reported:
Due to the implementation of how SIGTRAP are delivered if
perf_event_attr::sigtrap is set, we've noticed 3 issues:
1. Missing SIGTRAP due to a race with event_sched_out() (more
details below).
2. Hardware PMU events being disabled due to returning 1 from
perf_event_overflow(). The only way to re-enable the event is
for user space to first "properly" disable the event and then
re-enable it.
3. The inability to automatically disable an event after a
specified number of overflows via PERF_EVENT_IOC_REFRESH.
The worst of the 3 issues is problem (1), which occurs when a
pending_disable is "consumed" by a racing event_sched_out(), observed
as follows:
CPU0 | CPU1
--------------------------------+---------------------------
__perf_event_overflow() |
perf_event_disable_inatomic() |
pending_disable = CPU0 | ...
| _perf_event_enable()
| event_function_call()
| task_function_call()
| /* sends IPI to CPU0 */
<IPI> | ...
__perf_event_enable() +---------------------------
ctx_resched()
task_ctx_sched_out()
ctx_sched_out()
group_sched_out()
event_sched_out()
pending_disable = -1
</IPI>
<IRQ-work>
perf_pending_event()
perf_pending_event_disable()
/* Fails to send SIGTRAP because no pending_disable! */
</IRQ-work>
In the above case, not only is that particular SIGTRAP missed, but also
all future SIGTRAPs because 'event_limit' is not reset back to 1.
To fix, rework pending delivery of SIGTRAP via IRQ-work by introduction
of a separate 'pending_sigtrap', no longer using 'event_limit' and
'pending_disable' for its delivery.
Additionally; and different to Marco's proposed patch:
- recognise that pending_disable effectively duplicates oncpu for
the case where it is set. As such, change the irq_work handler to
use ->oncpu to target the event and use pending_* as boolean toggles.
- observe that SIGTRAP targets the ctx->task, so the context switch
optimization that carries contexts between tasks is invalid. If
the irq_work were delayed enough to hit after a context switch the
SIGTRAP would be delivered to the wrong task.
- observe that if the event gets scheduled out
(rotation/migration/context-switch/...) the irq-work would be
insufficient to deliver the SIGTRAP when the event gets scheduled
back in (the irq-work might still be pending on the old CPU).
Therefore have event_sched_out() convert the pending sigtrap into a
task_work which will deliver the signal at return_to_user.
Fixes: 97ba62b27867 ("perf: Add support for SIGTRAP on perf events")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Debugged-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Marco Elver <elver@google.com>
Debugged-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Marco Elver <elver@google.com>
|
|
== Background ==
The XSTATE init code initializes all enabled and supported components.
Then, the init states are saved in the init_fpstate buffer that is
statically allocated in about one page.
The AMX TILE_DATA state is large (8KB) but its init state is zero. And the
feature comes only with the compacted format with these established
dependencies: AMX->XFD->XSAVES. So this state is excludable from
init_fpstate.
== Problem ==
But the buffer is formatted to include that large state. Then, this can be
the cause of a noisy splat like the below.
This came from XRSTORS for the task with init_fpstate in its XSAVE buffer.
It is reproducible on AMX systems when the running kernel is built with
CONFIG_DEBUG_PAGEALLOC=y and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y:
Bad FPU state detected at restore_fpregs_from_fpstate+0x57/0xd0, reinitializing FPU registers.
...
RIP: 0010:restore_fpregs_from_fpstate+0x57/0xd0
? restore_fpregs_from_fpstate+0x45/0xd0
switch_fpu_return+0x4e/0xe0
exit_to_user_mode_prepare+0x17b/0x1b0
syscall_exit_to_user_mode+0x29/0x40
do_syscall_64+0x67/0x80
? do_syscall_64+0x67/0x80
? exc_page_fault+0x86/0x180
entry_SYSCALL_64_after_hwframe+0x63/0xcd
== Solution ==
Adjust init_fpstate to exclude dynamic states. XRSTORS from init_fpstate
still initializes those states when their bits are set in the
requested-feature bitmap.
Fixes: 2308ee57d93d ("x86/fpu/amx: Enable the AMX feature in 64-bit mode")
Reported-by: Lin X Wang <lin.x.wang@intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Lin X Wang <lin.x.wang@intel.com>
Link: https://lore.kernel.org/r/20220824191223.1248-4-chang.seok.bae@intel.com
|
|
The init_fpstate buffer is statically allocated. Thus, the sanity test was
established to check whether the pre-allocated buffer is enough for the
calculated size or not.
The currently measured size is not strictly relevant. Fix to validate the
calculated init_fpstate size with the pre-allocated area.
Also, replace the sanity check function with open code for clarity. The
abstraction itself and the function naming do not tend to represent simply
what it does.
Fixes: 2ae996e0c1a3 ("x86/fpu: Calculate the default sizes independently")
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220824191223.1248-3-chang.seok.bae@intel.com
|
|
The init_fpstate setup code is spread out and out of order. The init image
is recorded before its scoped features and the buffer size are determined.
Determine the scope of init_fpstate components and its size before
recording the init state. Also move the relevant code together.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: neelnatu@google.com
Link: https://lore.kernel.org/r/20220824191223.1248-2-chang.seok.bae@intel.com
|
|
If CONFIG_OF is disabled and the ahci_st driver is builtin (or
CONFIG_MODULES is disabled), then using the macro of_match_ptr()
results in the st_ahci_match variable being unused, which generates a
compilation warning and a compilation error if CONFIG_WERROR is enabled.
Fix this by directly assigning st_ahci_match to .of_match_table in the
st_ahci_driver platform driver definition.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
|
|
Extend hwmon power/energy for XEHPSDV especially per gt level energy
usage.
v2: Update to latest HWMON spec (Ashutosh)
v3: Fix review comments (Ashutosh)
v4: Fix review comments (Anshuman)
v5: s/hwmon_device_register_with_info/
devm_hwmon_device_register_with_info/ (Ashutosh)
v6: Change contact to intel-gfx (Rodrigo)
GEN12_RPSTAT1 is available for all Gen12+ (Andi)
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Dale B Stimson <dale.b.stimson@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221013154526.2105579-8-ashutosh.dixit@intel.com
|
|
Expose power1_max_interval, that is the tau corresponding to PL1, as a
custom hwmon attribute. Some bit manipulation is needed because of the
format of PKG_PWR_LIM_1_TIME in
GT0_PACKAGE_RAPL_LIMIT register (1.x * power(2,y)).
v2: Update date and kernel version in Documentation (Badal)
v3: Cleaned up hwm_power1_max_interval_store() (Badal)
v4:
- Fixed review comments (Anshuman)
- In hwm_power1_max_interval_store() get PKG_MAX_WIN from
pkg_power_sku when it is valid (Ashutosh)
- KernelVersion: 6.2, Date: February 2023 in doc (Tvrtko)
v5: On some of the DGFX setups it is seen that although pkg_power_sku
is valid the field PKG_WIN_MAX is not populated. So it is
decided to stick to default value of PKG_WIN_MAX (Ashutosh)
v6: Change contact to intel-gfx (Rodrigo)
Fixed variable types in hwm_power1_max_interval_store (Andi)
Documented PKG_MAX_WIN_DEFAULT (Andi)
Removed else in hwm_attributes_visible (Andi)
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221013154526.2105579-7-ashutosh.dixit@intel.com
|
|
Expose the card reactive critical (I1) power. I1 is exposed as
power1_crit in microwatts (typically for client products) or as
curr1_crit in milliamperes (typically for server).
v2: Add curr1_crit functionality (Ashutosh)
v3: Use HWMON_CHANNEL_INFO to define power1_crit, curr1_crit (Badal)
v4: Use hwm_ prefix for static functions (Ashutosh)
v5: KernelVersion: 6.2, Date: February 2023 in doc (Tvrtko)
v6: Change contact to intel-gfx (Rodrigo)
Cc: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221013154526.2105579-6-ashutosh.dixit@intel.com
|
|
Use i915 HWMON to display device level energy input.
v2: Updated the date and kernel version in feature description
v3:
- Cleaned up hwm_energy function and removed unused function
i915_hwmon_energy_status_get (Ashutosh)
v4: KernelVersion: 6.2, Date: February 2023 in doc (Tvrtko)
v5: Change contact to intel-gfx (Rodrigo)
Change return type of hwm_energy to void (Andi)
Signed-off-by: Dale B Stimson <dale.b.stimson@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221013154526.2105579-5-ashutosh.dixit@intel.com
|
|
Use i915 HWMON to display/modify dGfx power PL1 limit and TDP setting.
v2:
- Fix review comments (Ashutosh)
- Do not restore power1_max upon module unload/load sequence
because on production systems modules are always loaded
and not unloaded/reloaded (Ashutosh)
- Fix review comments (Jani)
- Remove endianness conversion (Ashutosh)
v3: Add power1_rated_max (Ashutosh)
v4:
- Use macro HWMON_CHANNEL_INFO to define power channel (Guenter)
- Update the date and kernel version in Documentation (Badal)
v5: Use hwm_ prefix for static functions (Ashutosh)
v6: Fix review comments (Ashutosh)
v7:
- Define PCU_PACKAGE_POWER_SKU for DG1,DG2 and move
PKG_PKG_TDP to intel_mchbar_regs.h (Anshuman)
- KernelVersion: 6.2, Date: February 2023 in doc (Tvrtko)
v8: Change contact to intel-gfx (Rodrigo)
Minor change to val_sku_unit init (Andi)
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Dale B Stimson <dale.b.stimson@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221013154526.2105579-4-ashutosh.dixit@intel.com
|
|
Use i915 HWMON subsystem to display current input voltage.
v2:
- Updated date and kernel version in feature description
- Fixed review comments (Ashutosh)
v3: Use macro HWMON_CHANNEL_INFO to define hwmon channel (Guenter)
v4:
- Fixed review comments (Ashutosh)
- Use hwm_ prefix for static functions (Ashutosh)
v5: Added unit of voltage as millivolts (Ashutosh)
v6: KernelVersion: 6.2, Date: February 2023 in doc (Tvrtko)
v7: Change contact to intel-gfx (Rodrigo)
GEN12_RPSTAT1 is available for all Gen12+ (Andi)
Added Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon
to MAINTAINERS
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221013154526.2105579-3-ashutosh.dixit@intel.com
|
|
The i915 HWMON module will be used to expose voltage, power and energy
values for dGfx. Here we set up i915 hwmon infrastructure including i915
hwmon registration, basic data structures and functions.
v2:
- Create HWMON infra patch (Ashutosh)
- Fixed review comments (Jani)
- Remove "select HWMON" from i915/Kconfig (Jani)
v3: Use hwm_ prefix for static functions (Ashutosh)
v4: s/#ifdef CONFIG_HWMON/#if IS_REACHABLE(CONFIG_HWMON)/ since the former
doesn't work if hwmon is compiled as a module (Guenter)
v5: Fixed review comments (Jani)
v6: s/kzalloc/devm_kzalloc/ (Andi)
v7: s/hwmon_device_register_with_info/
devm_hwmon_device_register_with_info/ (Ashutosh)
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Dale B Stimson <dale.b.stimson@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221013154526.2105579-2-ashutosh.dixit@intel.com
|
|
The function mtk_foe_entry_usable() is defined in the mtk_ppe.c file, but
not called elsewhere, so delete this unused function.
drivers/net/ethernet/mediatek/mtk_ppe.c:400:20: warning: unused function 'mtk_foe_entry_usable'.
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2409
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Yang Yingliang says:
====================
net: ethernet: mtk_eth_wed: fixe some leaks
I found some leaks in mtk_eth_soc.c/mtk_wed.c.
patch#1 - I found mtk_wed_exit() is never called, I think mtk_wed_exit() need
be called in error path or module remove function to free the memory
allocated in mtk_wed_add_hw().
patch#2 - The device is not put in error path in mtk_wed_add_hw().
patch#3 - The device_node pointer returned by of_parse_phandle() with refcount
incremented, it should be decreased when it done.
This patchset was just compiled tested because I don't have any HW on
which to do the actual tests.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The device_node pointer returned by of_parse_phandle() with refcount
incremented, when finish using it, the refcount need be decreased.
Fixes: 804775dfc288 ("net: ethernet: mtk_eth_soc: add support for Wireless Ethernet Dispatch (WED)")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After calling get_device() in mtk_wed_add_hw(), in error path, put_device()
needs be called.
Fixes: 804775dfc288 ("net: ethernet: mtk_eth_soc: add support for Wireless Ethernet Dispatch (WED)")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If mtk_wed_add_hw() has been called, mtk_wed_exit() needs be called
in error path or removing module to free the memory allocated in
mtk_wed_add_hw().
Fixes: 804775dfc288 ("net: ethernet: mtk_eth_soc: add support for Wireless Ethernet Dispatch (WED)")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
s_inodes is superblock-specific resource, which should be
protected by sb's specific lock s_inode_list_lock.
Link: https://lore.kernel.org/r/TYCP286MB23238380DE3B74874E8D78ABCA299@TYCP286MB2323.JPNP286.PROD.OUTLOOK.COM
Fixes: 7d41963759fe ("erofs: Support sharing cookies in the same domain")
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Dawei Li <set_pte_at@outlook.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
UBSAN complains about array-index-out-of-bounds:
[ 1.980703] kernel: UBSAN: array-index-out-of-bounds in /build/linux-9H675w/linux-5.15.0/drivers/ata/libahci.c:968:41
[ 1.980709] kernel: index 15 is out of range for type 'ahci_em_priv [8]'
[ 1.980713] kernel: CPU: 0 PID: 209 Comm: scsi_eh_8 Not tainted 5.15.0-25-generic #25-Ubuntu
[ 1.980716] kernel: Hardware name: System manufacturer System Product Name/P5Q3, BIOS 1102 06/11/2010
[ 1.980718] kernel: Call Trace:
[ 1.980721] kernel: <TASK>
[ 1.980723] kernel: show_stack+0x52/0x58
[ 1.980729] kernel: dump_stack_lvl+0x4a/0x5f
[ 1.980734] kernel: dump_stack+0x10/0x12
[ 1.980736] kernel: ubsan_epilogue+0x9/0x45
[ 1.980739] kernel: __ubsan_handle_out_of_bounds.cold+0x44/0x49
[ 1.980742] kernel: ahci_qc_issue+0x166/0x170 [libahci]
[ 1.980748] kernel: ata_qc_issue+0x135/0x240
[ 1.980752] kernel: ata_exec_internal_sg+0x2c4/0x580
[ 1.980754] kernel: ? vprintk_default+0x1d/0x20
[ 1.980759] kernel: ata_exec_internal+0x67/0xa0
[ 1.980762] kernel: sata_pmp_read+0x8d/0xc0
[ 1.980765] kernel: sata_pmp_read_gscr+0x3c/0x90
[ 1.980768] kernel: sata_pmp_attach+0x8b/0x310
[ 1.980771] kernel: ata_eh_revalidate_and_attach+0x28c/0x4b0
[ 1.980775] kernel: ata_eh_recover+0x6b6/0xb30
[ 1.980778] kernel: ? ahci_do_hardreset+0x180/0x180 [libahci]
[ 1.980783] kernel: ? ahci_stop_engine+0xb0/0xb0 [libahci]
[ 1.980787] kernel: ? ahci_do_softreset+0x290/0x290 [libahci]
[ 1.980792] kernel: ? trace_event_raw_event_ata_eh_link_autopsy_qc+0xe0/0xe0
[ 1.980795] kernel: sata_pmp_eh_recover.isra.0+0x214/0x560
[ 1.980799] kernel: sata_pmp_error_handler+0x23/0x40
[ 1.980802] kernel: ahci_error_handler+0x43/0x80 [libahci]
[ 1.980806] kernel: ata_scsi_port_error_handler+0x2b1/0x600
[ 1.980810] kernel: ata_scsi_error+0x9c/0xd0
[ 1.980813] kernel: scsi_error_handler+0xa1/0x180
[ 1.980817] kernel: ? scsi_unjam_host+0x1c0/0x1c0
[ 1.980820] kernel: kthread+0x12a/0x150
[ 1.980823] kernel: ? set_kthread_struct+0x50/0x50
[ 1.980826] kernel: ret_from_fork+0x22/0x30
[ 1.980831] kernel: </TASK>
This happens because sata_pmp_init_links() initialize link->pmp up to
SATA_PMP_MAX_PORTS while em_priv is declared as 8 elements array.
I can't find the maximum Enclosure Management ports specified in AHCI
spec v1.3.1, but "12.2.1 LED message type" states that "Port Multiplier
Information" can utilize 4 bits, which implies it can support up to 16
ports. Hence, use SATA_PMP_MAX_PORTS as EM_MAX_SLOTS to resolve the
issue.
BugLink: https://bugs.launchpad.net/bugs/1970074
Cc: stable@vger.kernel.org
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
|
|
'ahci:' is an invalid prefix, preventing the module from autoloading.
Fix this by using the 'platform:' prefix and DRV_NAME.
Fixes: 9e54eae23bc9 ("ahci_imx: add ahci sata support on imx platforms")
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
|
|
This debug statement was never meant to go into the upstream release,
kill it off before it ends up in a release. It was just part of the
testing for the initial version of the patch.
Fixes: 2ec33a6c3cca ("io_uring/rw: ensure kiocb_end_write() is always called")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Our syzkaller report a null pointer dereference, root cause is
following:
__blk_mq_alloc_map_and_rqs
set->tags[hctx_idx] = blk_mq_alloc_map_and_rqs
blk_mq_alloc_map_and_rqs
blk_mq_alloc_rqs
// failed due to oom
alloc_pages_node
// set->tags[hctx_idx] is still NULL
blk_mq_free_rqs
drv_tags = set->tags[hctx_idx];
// null pointer dereference is triggered
blk_mq_clear_rq_mapping(drv_tags, ...)
This is because commit 63064be150e4 ("blk-mq:
Add blk_mq_alloc_map_and_rqs()") merged the two steps:
1) set->tags[hctx_idx] = blk_mq_alloc_rq_map()
2) blk_mq_alloc_rqs(..., set->tags[hctx_idx])
into one step:
set->tags[hctx_idx] = blk_mq_alloc_map_and_rqs()
Since tags is not initialized yet in this case, fix the problem by
checking if tags is NULL pointer in blk_mq_clear_rq_mapping().
Fixes: 63064be150e4 ("blk-mq: Add blk_mq_alloc_map_and_rqs()")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/20221011142253.4015966-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
We should not be completing requests from a task context that has already
undergone io_uring cancellations, i.e. __io_uring_cancel(), as there are
some assumptions, e.g. around cached task refs draining. Remove
iopolling from io_ring_ctx_wait_and_kill() as it can be called later
after PF_EXITING is set with the last task_work run.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7c03cc91455c4a1af49c6b9cbda4e57ea467aa11.1665891182.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Don't duplicate io_alloc_req() in io_req_caches_free() but reuse the
helper.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6005fc88274864a49fc3096c22d8bdd605cf8576.1665891182.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We test file_table.bitmap in io_file_get_fixed() to check invariants,
don't do it, it's expensive and was showing up in profiles. No reports of
this triggering has come in. Move the check to the file clear instead,
which will still catch any wrong usage.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/cf77f2ded68d2e5b2bc7355784d969837d48e023.1665891182.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|