Age | Commit message (Collapse) | Author |
|
detail_location[] is used to collect two location strings so they can
be passed as one to trace_mc_event(). Instead of having an extra copy
step, assemble the location string in other_detail[] from the
beginning.
Using other_detail[] to call trace_mc_event() is now the same as in
edac_mc.c and code can be unified.
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191106093239.25517-12-rrichter@marvell.com
|
|
The current code to convert a physical address mask to a grain
(defined as granularity in bytes) is:
e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK);
This is broken in several ways:
1) It calculates to wrong grain values. E.g., a physical address mask
of ~0xfff should give a grain of 0x1000. Without considering
PAGE_MASK, there is an off-by-one. Things are worse when also
filtering it with ~PAGE_MASK. This will calculate to a grain with the
upper bits set. In the example it even calculates to ~0.
2) The grain does not depend on and is unrelated to the kernel's
page-size. The page-size only matters when unmapping memory in
memory_failure(). Smaller grains are wrongly rounded up to the
page-size, on architectures with a configurable page-size (e.g. arm64)
this could round up to the even bigger page-size of the hypervisor.
Fix this with:
e->grain = ~mem_err->physical_addr_mask + 1;
The grain_bits are defined as:
grain = 1 << grain_bits;
Change also the grain_bits calculation accordingly, it is the same
formula as in edac_mc.c now and the code can be unified.
The value in ->physical_addr_mask coming from firmware is assumed to
be contiguous, but this is not sanity-checked. However, in case the
mask is non-contiguous, a conversion to grain_bits effectively
converts the grain bit mask to a power of 2 by rounding it up.
Suggested-by: James Morse <james.morse@arm.com>
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191106093239.25517-11-rrichter@marvell.com
|
|
Use standard macros for page calculations.
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191106093239.25517-10-rrichter@marvell.com
|
|
Reduce the indentation level in edac_mc_handle_error() a bit.
No functional changes.
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191106093239.25517-7-rrichter@marvell.com
|
|
The e string to which this is pointing to has already been cleared
earlier in the function so remove the needless zero string termination.
[ bp: Correct the commit message. ]
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191106093239.25517-6-rrichter@marvell.com
|
|
No need to crash the system in case edac_mc_alloc() is called with
invalid arguments, just warn and return. This would cause a checkpatch
warning when touching the code later, so just fix it.
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191106093239.25517-5-rrichter@marvell.com
|
|
Introduce an mci_for_each_dimm() iterator. It returns a pointer to
a struct dimm_info. This makes the declaration and use of an index
obsolete and avoids access to internal data of struct mci (direct array
access etc).
[ bp: push the struct dimm_info *dimm; declaration into the
CONFIG_EDAC_DEBUG block. ]
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191106093239.25517-4-rrichter@marvell.com
|
|
The EDAC_DIMM_OFF() macro takes 5 arguments to get the DIMM's index.
Simplify this by storing the index in struct dimm_info to avoid its
calculation and remove the EDAC_DIMM_OFF() macro. The index can be
directly used then.
Another advantage is that edac_mc_alloc() could be used even if the
exact size of the layers is unknown. Only the number of DIMMs would be
needed.
Rename iterator variable to idx, while at it. The name is more handy,
esp. when searching for it in the code.
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191106093239.25517-3-rrichter@marvell.com
|
|
The EDAC_DIMM_PTR() macro takes 3 arguments from struct mem_ctl_info.
Clean up this interface to only pass the mci struct and replace this
macro with a new function edac_get_dimm().
Also introduce an edac_get_dimm_by_index() function for later use.
This allows it to get a DIMM pointer only by a given index. This can
be useful if the DIMM's position within the layers of the memory
controller or the exact size of the layers are unknown.
Small style changes made for some hunks after applying the semantic
patch.
Semantic patch used:
@@ expression mci, a, b,c; @@
-EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers, a, b, c)
+edac_get_dimm(mci, a, b, c)
[ bp: Touchups. ]
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tero Kristo <t-kristo@ti.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191106093239.25517-2-rrichter@marvell.com
|
|
This message keeps flooding dmesg on boxes where ECC is disabled or the
DIMMs do not support ECC but the module gets auto-probed. What's even
worse is that autoprobing happens on every CPU due to the CPU-family
matching the driver does and uevent being generated for each CPU device.
What is more, this message is becoming even more useless on newer
systems where forcing ECC is not recommended and it should be done in
the BIOS so the BIOS can do all the necessary work, i.e., just setting a
bit in an MSR is not enough anymore.
So get rid of it.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: linux-edac@vger.kernel.org
Link: https://lkml.kernel.org/r/20191106160607.GC28380@zn.tnic
|
|
The ghes registration and refcount is broken in several ways:
* ghes_edac_register() returns with success for a 2nd instance
even if a first instance's registration is still running. This is
not correct as the first instance may fail later. A subsequent
registration may not finish before the first. Parallel registrations
must be avoided.
* The refcount was increased even if a registration failed. This
leads to stale counters preventing the device from being released.
* The ghes refcount may not be decremented properly on unregistration.
Always decrement the refcount once ghes_edac_unregister() is called to
keep the refcount sane.
* The ghes_pvt pointer is handed to the irq handler before registration
finished.
* The mci structure could be freed while the irq handler is running.
Fix this by adding a mutex to ghes_edac_register(). This mutex
serializes instances to register and unregister. The refcount is only
increased if the registration succeeded. This makes sure the refcount is
in a consistent state after registering or unregistering a device.
Note: A spinlock cannot be used here as the code section may sleep.
The ghes_pvt is protected by ghes_lock now. This ensures the pointer is
not updated before registration was finished or while the irq handler is
running. It is unset before unregistering the device including necessary
(implicit) memory barriers making the changes visible to other CPUs.
Thus, the device can not be used anymore by an interrupt.
Also, rename ghes_init to ghes_refcount for better readability and
switch to refcount API.
A refcount is needed because there can be multiple GHES structures being
defined (see ACPI 6.3 specification, 18.3.2.7 Generic Hardware Error
Source, "Some platforms may describe multiple Generic Hardware Error
Source structures with different notification types, ...").
Another approach to use the mci's device refcount (get_device()) and
have a release function does not work here. A release function will be
called only for device_release() with the last put_device() call. The
device must be deleted *before* that with device_del(). This is only
possible by maintaining an own refcount.
[ bp: touchups. ]
Fixes: 0fe5f281f749 ("EDAC, ghes: Model a single, logical memory controller")
Fixes: 1e72e673b9d1 ("EDAC/ghes: Fix Use after free in ghes_edac remove path")
Co-developed-by: James Morse <james.morse@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Co-developed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191105200732.3053-1-rrichter@marvell.com
|
|
Return early before checking for ECC if the node does not have any
populated memory.
Free any cached hardware data before returning. Also, return 0 in this
case since this is not a failure. Other nodes may have memory and the
module should attempt to load an instance for them.
Move printing of hardware information to after the instance is
initialized, so that the information is only printed for nodes with
memory.
Return an error code when ECC is disabled. This check happens after
checking for memory. The module should explicitly fail to load if memory
is populated on a node and ECC is disabled.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rrichter@marvell.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191106012448.243970-6-Yazen.Ghannam@amd.com
|
|
...now that the data is available earlier.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rrichter@marvell.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191106012448.243970-5-Yazen.Ghannam@amd.com
|
|
The maximum number of memory controllers is fixed within a family/model
group. In most cases, this has been fixed at 2, but some systems may
have up to 8.
The struct amd64_family_type already contains family/model-specific
information, and this can be used rather than adding model checks to
various functions.
Create a new field in struct amd64_family_type for max_mcs.
Set this when setting other family type information, and use this when
needing the maximum number of memory controllers possible for a system.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rrichter@marvell.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191106012448.243970-4-Yazen.Ghannam@amd.com
|
|
Split out gathering hardware information from init_one_instance()
into a separate function hw_info_get(). This is necessary so that
the information can be cached earlier and used to check if memory is
populated and if ECC is enabled on a node.
Also, define a function hw_info_put() to back out changes made in
hw_info_get().
Check for an allocated PCI device (Function 0 for Family 17h or Function
1 for pre-Family 17h) before freeing, since hw_info_put() may be called
before PCI siblings are reserved.
Drop the family check when freeing pvt->umc. This will be NULL on
pre-Family 17h systems. However, kfree() is safe and will check for a
NULL pointer before freeing.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rrichter@marvell.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191106012448.243970-3-Yazen.Ghannam@amd.com
|
|
The struct amd64_family_type doesn't change between multiple nodes and
instances of the module, so make it global.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rrichter@marvell.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191106012448.243970-2-Yazen.Ghannam@amd.com
|
|
The following commit introduced a warning on error reports without a
non-zero grain value.
3724ace582d9 ("EDAC/mc: Fix grain_bits calculation")
The amd64_edac_mod module does not provide a value, so the warning will
be given on the first reported memory error.
Set the grain per DIMM to cacheline size (64 bytes). This is the current
recommendation.
Fixes: 3724ace582d9 ("EDAC/mc: Fix grain_bits calculation")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rrichter@marvell.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191022203448.13962-7-Yazen.Ghannam@amd.com
|
|
Simplify this function implementation by using a known wrapper function.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Joel Stanley <joel@jms.id.au>
Cc: Andrew Jeffery <andrew@aj.id.au>
Cc: James Morse <james.morse@arm.com>
Cc: kernel-janitors@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-aspeed@lists.ozlabs.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rrichter@marvell.com>
Cc: Stefan Schaeckeler <sschaeck@cisco.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/baabb9e9-a1b2-3a04-9fb6-aa632de5f722@web.de
|
|
Skylake logs some additional useful information in per-channel
registers in addition the the architectural status/addr/misc
logged in the machine check bank.
Pick up this information and add it to the EDAC log:
retry_rd_err_[five 32-bit register values]
Sorry, no definitions for these registers. OEMs and DIMM vendors
will be able to use them to isolate which cells in the DIMM are
causing problems.
correrrcnt[per rank corrected error counts]
Note that if additional errors are logged while these registers are
being read, you may see a jumble of values some from earlier errors,
others from later errors (since the registers report the most recent
logged error). The correrrcnt registers provide error counts per possible
rank. If these counts only change by one since the previous error logged
for this channel, then it is safe to assume that the registers logged
provide a coherent view of one error.
With this change EDAC logs look like this:
EDAC MC4: 1 CE memory read error on CPU_SrcID#2_MC#0_Chan#1_DIMM#0 (channel:1 slot:0 page:0x8f26018 offset:0x0 grain:32 syndrome:0x0 - err_code:0x0101:0x0091 socket:2 imc:0 rank:0 bg:0 ba:0 row:0x1f880 col:0x200 retry_rd_err_log[0001a209 00000000 00000001 04800001 0001f880] correrrcnt[0001 0000 0000 0000 0000 0000 0000 0000])
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
Simplifies the code a little.
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
Pick up urgent change into next queue.
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
ghes_edac models a single logical memory controller, and uses a global
ghes_init variable to ensure only the first ghes_edac_register() will
do anything.
ghes_edac is registered the first time a GHES entry in the HEST is
probed. There may be multiple entries, so subsequent attempts to
register ghes_edac are silently ignored as the work has already been
done.
When a GHES entry is unregistered, it calls ghes_edac_unregister(),
which free()s the memory behind the global variables in ghes_edac.
But there may be multiple GHES entries, the next call to
ghes_edac_unregister() will dereference the free()d memory, and attempt
to free it a second time.
This may also be triggered on a platform with one GHES entry, if the
driver is unbound/re-bound and unbound. The re-bind step will do
nothing because of ghes_init, the second unbind will then do the same
work as the first.
Doing the unregister work on the first call is unsafe, as another
CPU may be processing a notification in ghes_edac_report_mem_error(),
using the memory we are about to free.
ghes_init is already half of the reference counting. We only need
to do the register work for the first call, and the unregister work
for the last. Add the unregister check.
This means we no longer free ghes_edac's memory while there are
GHES entries that may receive a notification.
This was detected by KASAN and DEBUG_TEST_DRIVER_REMOVE.
[ bp: merge into a single patch. ]
Fixes: 0fe5f281f749 ("EDAC, ghes: Model a single, logical memory controller")
Reported-by: John Garry <john.garry@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rrichter@marvell.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20191014171919.85044-2-james.morse@arm.com
Link: https://lkml.kernel.org/r/304df85b-8b56-b77e-1a11-aa23769f2e7c@huawei.com
|
|
Make the main workhorse the "count" functions which can log a @count
of errors. Have the current APIs edac_device_handle_{ce,ue}() call
the _count() variants and this way keep the exported symbols number
unchanged.
[ bp: Rewrite. ]
Signed-off-by: Hanna Hawa <hhhawa@amazon.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: benh@amazon.com
Cc: dwmw@amazon.co.uk
Cc: hanochu@amazon.com
Cc: James Morse <james.morse@arm.com>
Cc: jonnyc@amazon.com
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: ronenk@amazon.com
Cc: talel@amazon.com
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190923191741.29322-2-hhhawa@amazon.com
|
|
drivers/edac/skx_common.c: In function ‘skx_mce_output_error’:
drivers/edac/skx_common.c:478:8: warning: variable ‘type’ set but not used [-Wunused-but-set-variable]
478 | char *type, *optype;
| ^~~~
Acked-by: Borislav Petkov <bp@alien8.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
|
|
There are several vars unused on this driver, probably because
it was a modified copy of another driver. Get rid of them.
drivers/edac/sb_edac.c: In function ‘knl_get_dimm_capacity’:
drivers/edac/sb_edac.c:1343:16: warning: variable ‘sad_size’ set but not used [-Wunused-but-set-variable]
1343 | u64 sad_base, sad_size, sad_limit = 0;
| ^~~~~~~~
drivers/edac/sb_edac.c: In function ‘sbridge_mce_output_error’:
drivers/edac/sb_edac.c:2955:8: warning: variable ‘type’ set but not used [-Wunused-but-set-variable]
2955 | char *type, *optype, msg[256];
| ^~~~
drivers/edac/sb_edac.c: In function ‘sbridge_unregister_mci’:
drivers/edac/sb_edac.c:3203:22: warning: variable ‘pvt’ set but not used [-Wunused-but-set-variable]
3203 | struct sbridge_pvt *pvt;
| ^~~
At top level:
drivers/edac/sb_edac.c:266:18: warning: ‘correrrthrsld’ defined but not used [-Wunused-const-variable=]
266 | static const u32 correrrthrsld[] = {
| ^~~~~~~~~~~~~
drivers/edac/sb_edac.c:257:18: warning: ‘correrrcnt’ defined but not used [-Wunused-const-variable=]
257 | static const u32 correrrcnt[] = {
| ^~~~~~~~~~
Acked-by: Borislav Petkov <bp@alien8.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
|
|
There are several temporary unused vars:
drivers/edac/i5400_edac.c: In function ‘i5400_get_mc_regs’:
drivers/edac/i5400_edac.c:1058:6: warning: variable ‘maxdimmperch’ set but not used [-Wunused-but-set-variable]
1058 | int maxdimmperch;
| ^~~~~~~~~~~~
drivers/edac/i5400_edac.c:1057:6: warning: variable ‘maxch’ set but not used [-Wunused-but-set-variable]
1057 | int maxch;
| ^~~~~
drivers/edac/i5400_edac.c: In function ‘i5400_init_dimms’:
drivers/edac/i5400_edac.c:1174:6: warning: variable ‘max_dimms’ set but not used [-Wunused-but-set-variable]
1174 | int max_dimms;
| ^~~~~~~~~
drivers/edac/i5400_edac.c:1173:14: warning: variable ‘channel_count’ set but not used [-Wunused-but-set-variable]
1173 | int ndimms, channel_count;
| ^~~~~~~~~~~~~
Get rid of them.
Acked-by: Borislav Petkov <bp@alien8.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
|
|
There are 3 types of non-recoverable errors that the MC reports:
- Fatal;
- Non-fatal uncorrected
- Non-fatal correctable
While we don't add it to the log itself, it could be useful to
have this at least for debug messages.
This shuts up this warning:
drivers/edac/i5400_edac.c: In function ‘i5400_proccess_non_recoverable_info’:
drivers/edac/i5400_edac.c:524:8: warning: variable ‘type’ set but not used [-Wunused-but-set-variable]
524 | char *type = NULL;
| ^~~~
Acked-by: Borislav Petkov <bp@alien8.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
|
|
The declaration of the kerneldoc entry is wrong, causing this
warning:
drivers/edac/i7300_edac.c:824: warning: Function parameter or member 'mir_no' not described in 'decode_mir'
Acked-by: Borislav Petkov <bp@alien8.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
|
|
One var was renamed, but the associated kernel-doc markup still
points to the old name.
Acked-by: Borislav Petkov <bp@alien8.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
|
|
As reported by GCC with W=1:
drivers/edac/i5100_edac.c:714:16: warning: variable ‘et’ set but not used [-Wunused-but-set-variable]
714 | unsigned long et;
| ^~
It sounds some left over from some code before the addition of
an udelay().
Acked-by: Borislav Petkov <bp@alien8.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Olof Johansson:
"A few fixes that have trickled in through the merge window:
- Video fixes for OMAP due to panel-dpi driver removal
- Clock fixes for OMAP that broke no-idle quirks + nfsroot on DRA7
- Fixing arch version on ASpeed ast2500
- Two fixes for reset handling on ARM SCMI"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
ARM: aspeed: ast2500 is ARMv6K
reset: reset-scmi: add missing handle initialisation
firmware: arm_scmi: reset: fix reset_state assignment in scmi_domain_reset
bus: ti-sysc: Remove unpaired sysc_clkdm_deny_idle()
ARM: dts: logicpd-som-lv: Fix i2c2 and i2c3 Pin mux
ARM: dts: am3517-evm: Fix missing video
ARM: dts: logicpd-torpedo-baseboard: Fix missing video
ARM: omap2plus_defconfig: Fix missing video
bus: ti-sysc: Fix handling of invalid clocks
bus: ti-sysc: Fix clock handling for no-idle quirks
|
|
Pull more MMC updates from Ulf Hansson:
"A couple more updates/fixes for MMC:
- sdhci-pci: Add Genesys Logic GL975x support
- sdhci-tegra: Recover loss in throughput for DMA
- sdhci-of-esdhc: Fix DMA bug"
* tag 'mmc-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: host: sdhci-pci: Add Genesys Logic GL975x support
mmc: tegra: Implement ->set_dma_mask()
mmc: sdhci: Let drivers define their DMA mask
mmc: sdhci-of-esdhc: set DMA snooping based on DMA coherence
mmc: sdhci: improve ADMA error reporting
|
|
Merge active entropy generation updates.
This is admittedly partly "for discussion". We need to have a way
forward for the boot time deadlocks where user space ends up waiting for
more entropy, but no entropy is forthcoming because the system is
entirely idle just waiting for something to happen.
While this was triggered by what is arguably a user space bug with
GDM/gnome-session asking for secure randomness during early boot, when
they didn't even need any such truly secure thing, the issue ends up
being that our "getrandom()" interface is prone to that kind of
confusion, because people don't think very hard about whether they want
to block for sufficient amounts of entropy.
The approach here-in is to decide to not just passively wait for entropy
to happen, but to start actively collecting it if it is missing. This
is not necessarily always possible, but if the architecture has a CPU
cycle counter, there is a fair amount of noise in the exact timings of
reasonably complex loads.
We may end up tweaking the load and the entropy estimates, but this
should be at least a reasonable starting point.
As part of this, we also revert the revert of the ext4 IO pattern
improvement that ended up triggering the reported lack of external
entropy.
* getrandom() active entropy waiting:
Revert "Revert "ext4: make __ext4_get_inode_loc plug""
random: try to actively add entropy rather than passively wait for it
|
|
For 5.3 we had to revert a nice ext4 IO pattern improvement, because it
caused a bootup regression due to lack of entropy at bootup together
with arguably broken user space that was asking for secure random
numbers when it really didn't need to.
See commit 72dbcf721566 (Revert "ext4: make __ext4_get_inode_loc plug").
This aims to solve the issue by actively generating entropy noise using
the CPU cycle counter when waiting for the random number generator to
initialize. This only works when you have a high-frequency time stamp
counter available, but that's the case on all modern x86 CPU's, and on
most other modern CPU's too.
What we do is to generate jitter entropy from the CPU cycle counter
under a somewhat complex load: calling the scheduler while also
guaranteeing a certain amount of timing noise by also triggering a
timer.
I'm sure we can tweak this, and that people will want to look at other
alternatives, but there's been a number of papers written on jitter
entropy, and this should really be fairly conservative by crediting one
bit of entropy for every timer-induced jump in the cycle counter. Not
because the timer itself would be all that unpredictable, but because
the interaction between the timer and the loop is going to be.
Even if (and perhaps particularly if) the timer actually happens on
another CPU, the cacheline interaction between the loop that reads the
cycle counter and the timer itself firing is going to add perturbations
to the cycle counter values that get mixed into the entropy pool.
As Thomas pointed out, with a modern out-of-order CPU, even quite simple
loops show a fair amount of hard-to-predict timing variability even in
the absense of external interrupts. But this tries to take that further
by actually having a fairly complex interaction.
This is not going to solve the entropy issue for architectures that have
no CPU cycle counter, but it's not clear how (and if) that is solvable,
and the hardware in question is largely starting to be irrelevant. And
by doing this we can at least avoid some of the even more contentious
approaches (like making the entropy waiting time out in order to avoid
the possibly unbounded waiting).
Cc: Ahmed Darwish <darwish.07@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Nicholas Mc Guire <hofrat@opentech.at>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Alexander E. Patrakov <patrakov@gmail.com>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes
Fixes for omap variants
Few fixes for ti-sysc interconnect target module driver for no-idle
quirks that caused nfsroot to fail on some dra7 boards.
And let's fixes to get LCD working again for logicpd board that got
broken a while back with removal of panel-dpi driver. We need to now
use generic CONFIG_DRM_PANEL_SIMPLE instead.
* tag 'fixes-5.4-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
bus: ti-sysc: Remove unpaired sysc_clkdm_deny_idle()
ARM: dts: logicpd-som-lv: Fix i2c2 and i2c3 Pin mux
ARM: dts: am3517-evm: Fix missing video
ARM: dts: logicpd-torpedo-baseboard: Fix missing video
ARM: omap2plus_defconfig: Fix missing video
bus: ti-sysc: Fix handling of invalid clocks
bus: ti-sysc: Fix clock handling for no-idle quirks
Link: https://lore.kernel.org/r/pull-1568819401-72461@atomide.com
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into arm/fixes
ARM SCMI fixes for v5.4
Couple of fixes: one in scmi reset driver initialising missed scmi handle
and an other in scmi reset API implementation fixing the assignment of
reset state
* tag 'scmi-fixes-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
reset: reset-scmi: add missing handle initialisation
firmware: arm_scmi: reset: fix reset_state assignment in scmi_domain_reset
Link: https://lore.kernel.org/r/20190918142139.GA4370@bogus
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
More libnvdimm updates from Dan Williams:
- Complete the reworks to interoperate with powerpc dynamic huge page
sizes
- Fix a crash due to missed accounting for the powerpc 'struct
page'-memmap mapping granularity
- Fix badblock initialization for volatile (DRAM emulated) pmem ranges
- Stop triggering request_key() notifications to userspace when
NVDIMM-security is disabled / not present
- Miscellaneous small fixups
* tag 'libnvdimm-fixes-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
libnvdimm/region: Enable MAP_SYNC for volatile regions
libnvdimm: prevent nvdimm from requesting key when security is disabled
libnvdimm/region: Initialize bad block for volatile namespaces
libnvdimm/nfit_test: Fix acpi_handle redefinition
libnvdimm/altmap: Track namespace boundaries in altmap
libnvdimm: Fix endian conversion issues
libnvdimm/dax: Pick the right alignment default when creating dax devices
powerpc/book3s64: Export has_transparent_hugepage() related functions.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal
Pull thermal SoC updates from Eduardo Valentin:
"This is a really small pull in the midst of a lot of pending patches.
We are in the middle of restructuring how we are maintaining the
thermal subsystem, as per discussion in our last LPC. For now, I am
sending just some changes that were pending in my tree. Looking
forward to get a more streamlined process in the next merge window"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
thermal: db8500: Rewrite to be a pure OF sensor
thermal: db8500: Use dev helper variable
thermal: db8500: Finalize device tree conversion
thermal: thermal_mmio: remove some dead code
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull more i2c updates from Wolfram Sang:
- make Lenovo Yoga C630 boot now that the dependencies are merged
- restore BlockProcessCall for i801, accidently removed in this merge
window
- a bugfix for the riic driver
- an improvement to the slave-eeprom driver which should have been in
the first pull request but sadly got lost in the process
* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: slave-eeprom: Add read only mode
i2c: i801: Bring back Block Process Call support for certain platforms
i2c: riic: Clear NACK in tend isr
i2c: qcom-geni: Disable DMA processing on the Lenovo Yoga C630
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
"A couple of fixes for the AMD IOMMU driver have piled up:
- Some fixes for the reworked IO page-table which caused memory leaks
or did not allow to downgrade mappings under some conditions.
- Locking fixes to fix a couple of possible races around accessing
'struct protection_domain'. The races got introduced when the
dma-ops path became lock-less in the fast-path"
* tag 'iommu-fixes-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Lock code paths traversing protection_domain->dev_list
iommu/amd: Lock dev_data in attach/detach code paths
iommu/amd: Check for busy devices earlier in attach_device()
iommu/amd: Take domain->lock for complete attach/detach path
iommu/amd: Remove amd_iommu_devtable_lock
iommu/amd: Remove domain->updated
iommu/amd: Wait for completion of IOTLB flush in attach_device
iommu/amd: Unmap all L7 PTEs when downgrading page-sizes
iommu/amd: Introduce first_pte_l7() helper
iommu/amd: Fix downgrading default page-sizes in alloc_pte()
iommu/amd: Fix pages leak in free_pagetable()
|
|
Pull networking fixes from David Miller:
1) Sanity check URB networking device parameters to avoid divide by
zero, from Oliver Neukum.
2) Disable global multicast filter in NCSI, otherwise LLDP and IPV6
don't work properly. Longer term this needs a better fix tho. From
Vijay Khemka.
3) Small fixes to selftests (use ping when ping6 is not present, etc.)
from David Ahern.
4) Bring back rt_uses_gateway member of struct rtable, it's semantics
were not well understood and trying to remove it broke things. From
David Ahern.
5) Move usbnet snaity checking, ignore endpoints with invalid
wMaxPacketSize. From Bjørn Mork.
6) Missing Kconfig deps for sja1105 driver, from Mao Wenan.
7) Various small fixes to the mlx5 DR steering code, from Alaa Hleihel,
Alex Vesker, and Yevgeny Kliteynik
8) Missing CAP_NET_RAW checks in various places, from Ori Nimron.
9) Fix crash when removing sch_cbs entry while offloading is enabled,
from Vinicius Costa Gomes.
10) Signedness bug fixes, generally in looking at the result given by
of_get_phy_mode() and friends. From Dan Crapenter.
11) Disable preemption around BPF_PROG_RUN() calls, from Eric Dumazet.
12) Don't create VRF ipv6 rules if ipv6 is disabled, from David Ahern.
13) Fix quantization code in tcp_bbr, from Kevin Yang.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (127 commits)
net: tap: clean up an indentation issue
nfp: abm: fix memory leak in nfp_abm_u32_knode_replace
tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state
sk_buff: drop all skb extensions on free and skb scrubbing
tcp_bbr: fix quantization code to not raise cwnd if not probing bandwidth
mlxsw: spectrum_flower: Fail in case user specifies multiple mirror actions
Documentation: Clarify trap's description
mlxsw: spectrum: Clear VLAN filters during port initialization
net: ena: clean up indentation issue
NFC: st95hf: clean up indentation issue
net: phy: micrel: add Asym Pause workaround for KSZ9021
net: socionext: ave: Avoid using netdev_err() before calling register_netdev()
ptp: correctly disable flags on old ioctls
lib: dimlib: fix help text typos
net: dsa: microchip: Always set regmap stride to 1
nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs
nfp: flower: prevent memory leak in nfp_flower_spawn_phy_reprs
net/sched: Set default of CONFIG_NET_TC_SKB_EXT to N
vrf: Do not attempt to create IPv6 mcast rule if IPv6 is disabled
net: sched: sch_sfb: don't call qdisc_put() while holding tree lock
...
|
|
Add read-only versions of all EEPROMs. These versions are read-only
on the i2c side, but can be written from the sysfs side.
Signed-off-by: Björn Ardö <bjorn.ardo@axis.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
|
Commit b84398d6d7f9 ("i2c: i801: Use iTCO version 6 in Cannon Lake PCH
and beyond") looks like to drop by accident Block Write-Block Read Process
Call support for Intel Sunrisepoint, Lewisburg, Denverton and Kaby Lake.
That support was added for above and newer platforms by the commit
315cd67c9453 ("i2c: i801: Add Block Write-Block Read Process Call
support") so bring it back for above platforms.
Fixes: b84398d6d7f9 ("i2c: i801: Use iTCO version 6 in Cannon Lake PCH and beyond")
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
|
The NACKF flag should be cleared in INTRIICNAKI interrupt processing as
description in HW manual.
This issue shows up quickly when PREEMPT_RT is applied and a device is
probed that is not plugged in (like a touchscreen controller). The result
is endless interrupts that halt system boot.
Fixes: 310c18a41450 ("i2c: riic: add driver")
Cc: stable@vger.kernel.org
Reported-by: Chien Nguyen <chien.nguyen.eb@rvc.renesas.com>
Signed-off-by: Chris Brandt <chris.brandt@renesas.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
|
We have a production-level laptop (Lenovo Yoga C630) which is exhibiting
a rather horrific bug. When I2C HID devices are being scanned for at
boot-time the QCom Geni based I2C (Serial Engine) attempts to use DMA.
When it does, the laptop reboots and the user never sees the OS.
Attempts are being made to debug the reason for the spontaneous reboot.
No luck so far, hence the requirement for this hot-fix. This workaround
will be removed once we have a viable fix.
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Tested-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull kernel lockdown mode from James Morris:
"This is the latest iteration of the kernel lockdown patchset, from
Matthew Garrett, David Howells and others.
From the original description:
This patchset introduces an optional kernel lockdown feature,
intended to strengthen the boundary between UID 0 and the kernel.
When enabled, various pieces of kernel functionality are restricted.
Applications that rely on low-level access to either hardware or the
kernel may cease working as a result - therefore this should not be
enabled without appropriate evaluation beforehand.
The majority of mainstream distributions have been carrying variants
of this patchset for many years now, so there's value in providing a
doesn't meet every distribution requirement, but gets us much closer
to not requiring external patches.
There are two major changes since this was last proposed for mainline:
- Separating lockdown from EFI secure boot. Background discussion is
covered here: https://lwn.net/Articles/751061/
- Implementation as an LSM, with a default stackable lockdown LSM
module. This allows the lockdown feature to be policy-driven,
rather than encoding an implicit policy within the mechanism.
The new locked_down LSM hook is provided to allow LSMs to make a
policy decision around whether kernel functionality that would allow
tampering with or examining the runtime state of the kernel should be
permitted.
The included lockdown LSM provides an implementation with a simple
policy intended for general purpose use. This policy provides a coarse
level of granularity, controllable via the kernel command line:
lockdown={integrity|confidentiality}
Enable the kernel lockdown feature. If set to integrity, kernel features
that allow userland to modify the running kernel are disabled. If set to
confidentiality, kernel features that allow userland to extract
confidential information from the kernel are also disabled.
This may also be controlled via /sys/kernel/security/lockdown and
overriden by kernel configuration.
New or existing LSMs may implement finer-grained controls of the
lockdown features. Refer to the lockdown_reason documentation in
include/linux/security.h for details.
The lockdown feature has had signficant design feedback and review
across many subsystems. This code has been in linux-next for some
weeks, with a few fixes applied along the way.
Stephen Rothwell noted that commit 9d1f8be5cf42 ("bpf: Restrict bpf
when kernel lockdown is in confidentiality mode") is missing a
Signed-off-by from its author. Matthew responded that he is providing
this under category (c) of the DCO"
* 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (31 commits)
kexec: Fix file verification on S390
security: constify some arrays in lockdown LSM
lockdown: Print current->comm in restriction messages
efi: Restrict efivar_ssdt_load when the kernel is locked down
tracefs: Restrict tracefs when the kernel is locked down
debugfs: Restrict debugfs when the kernel is locked down
kexec: Allow kexec_file() with appropriate IMA policy when locked down
lockdown: Lock down perf when in confidentiality mode
bpf: Restrict bpf when kernel lockdown is in confidentiality mode
lockdown: Lock down tracing and perf kprobes when in confidentiality mode
lockdown: Lock down /proc/kcore
x86/mmiotrace: Lock down the testmmiotrace module
lockdown: Lock down module params that specify hardware parameters (eg. ioport)
lockdown: Lock down TIOCSSERIAL
lockdown: Prohibit PCMCIA CIS storage when the kernel is locked down
acpi: Disable ACPI table override if the kernel is locked down
acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
ACPI: Limit access to custom_method when the kernel is locked down
x86/msr: Restrict MSR access when the kernel is locked down
x86: Lock down IO port access when the kernel is locked down
...
|
|
The traversing of this list requires protection_domain->lock to be taken
to avoid nasty races with attach/detach code. Make sure the lock is held
on all code-paths traversing this list.
Reported-by: Filippo Sironi <sironi@amazon.de>
Fixes: 92d420ec028d ("iommu/amd: Relax locking in dma_ops path")
Reviewed-by: Filippo Sironi <sironi@amazon.de>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Make sure that attaching a detaching a device can't race against each
other and protect the iommu_dev_data with a spin_lock in these code
paths.
Fixes: 92d420ec028d ("iommu/amd: Relax locking in dma_ops path")
Reviewed-by: Filippo Sironi <sironi@amazon.de>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Check early in attach_device whether the device is already attached to a
domain. This also simplifies the code path so that __attach_device() can
be removed.
Fixes: 92d420ec028d ("iommu/amd: Relax locking in dma_ops path")
Reviewed-by: Filippo Sironi <sironi@amazon.de>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The code-paths before __attach_device() and __detach_device() are called
also access and modify domain state, so take the domain lock there too.
This allows to get rid of the __detach_device() function.
Fixes: 92d420ec028d ("iommu/amd: Relax locking in dma_ops path")
Reviewed-by: Filippo Sironi <sironi@amazon.de>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|