Age | Commit message (Collapse) | Author |
|
TX ring QP wasn't released at mlx4_en_destroy_tx_ring. Instead, the code
used the deprecated base_tx_qpn field. Move TX QP release to
mlx4_en_destroy_tx_ring and remove the base_tx_qpn field.
Fixes: ddae0349fdb7 ('net/mlx4: Change QP allocation scheme')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As part of unmap sequence, kernel client has to talk to the OSDs to
teardown watch on the header object. If none of the OSDs are available
it would hang forever, until interrupted by a signal - when that
happens we follow through with the rest of unmap procedure (i.e.
unregister the device and put all the data structures) and the unmap is
still considired successful (rbd cli tool exits with 0). The watch on
the userspace side should eventually timeout so that's fine.
This isn't very nice, because various userspace tools (pacemaker rbd
resource agent, for example) then have to worry about setting up their
own timeouts. Timeout it with mount_timeout (60 seconds by default).
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Sage Weil <sage@redhat.com>
|
|
There are currently three libceph-level timeouts that the user can
specify on mount: mount_timeout, osd_idle_ttl and osdkeepalive. All of
these are in seconds and no checking is done on user input: negative
values are accepted, we multiply them all by HZ which may or may not
overflow, arbitrarily large jiffies then get added together, etc.
There is also a bug in the way mount_timeout=0 is handled. It's
supposed to mean "infinite timeout", but that's not how wait.h APIs
treat it and so __ceph_open_session() for example will busy loop
without much chance of being interrupted if none of ceph-mons are
there.
Fix all this by verifying user input, storing timeouts capped by
msecs_to_jiffies() in jiffies and using the new ceph_timeout_jiffies()
helper for all user-specified waits to handle infinite timeouts
correctly.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
|
|
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
|
|
NVDIMM namespaces, in addition to accepting "struct bio" based requests,
also have the capability to perform byte-aligned accesses. By default
only the bio/block interface is used. However, if another driver can
make effective use of the byte-aligned capability it can claim namespace
interface and use the byte-aligned ->rw_bytes() interface.
The BTT driver is the initial first consumer of this mechanism to allow
adding atomic sector update semantics to a pmem or blk namespace. This
patch is the sysfs infrastructure to allow configuring a BTT instance
for a namespace. Enabling that BTT and performing i/o is in a
subsequent patch.
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
It turned out that SIGP set-multi-threading can only be done once.
Therefore switching to a different MT level after switching to
sclp.mtid_prev in the dump case fails.
As a symptom specifying the "nosmt" parameter currently fails for
the kdump kernel and the kernel starts with multi-threading enabled.
So fix this and issue diag 308 subcode 1 call after collecting the
CPU states for the dump. Also enhance the diag308_reset() function to
be usable also with enabled lowcore protection and prefix register != 0.
After the reset it is possible to switch the MT level again. We have
to do the reset very early in order not to kill the already initialized
console. Therefore instead of kmalloc() the corresponding memblock
functions have to be used. To avoid copying the sclp cpu code into
sclp_early, we now use the simple sigp loop method for CPU detection.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
The SCLP interface to query, configure and deconfigure CPUs actually
operates on cores. For a machine without the multi-threading faciltiy
a CPU and a core are equivalent but starting with System z13 a core
can have multiple hardware threads, also referred to as logical CPUs.
To avoid confusion replace the word 'cpu' with 'core' in the SCLP
interface. Also replace MAX_CPU_ADDRESS with SCLP_MAX_CORES.
The core-id is an 8-bit field, the maximum thread id is in the range
0-31. The theoretical limit for the CPU address is therefore 8191.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
In case of request timeouts an AP queue reset will be triggered to
recover and reinitialize the AP queue. The previous behavior was an
immediate reset execution regardless of current/pending requests.
Due to newly changed firmware behavior the reset may be delayed, based
on the priority of pending request. The device driver's waiting time
frame was limited, hence it did not received the reset response. As a
consequence interrupts would not be enabled afterwards.
The RAPQ (queue reset) and AQIC (interrupt control) commands will be
treated fully asynchronous now. The device driver will check the reset and
interrupt states periodically, thus it can handle the reinitialization
properly.
Signed-off-by: Ingo Tuchscherer <ingo.tuchscherer@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
The Blocked flag indicates that a device has failed but that this
fact hasn't been recorded in the metadata yet. Writes to such
devices cannot be allowed until the metadata has been updated.
On a read-only array, the Blocked flag will never be cleared.
This prevents the device being removed from the array.
If the metadata is being handled by the kernel
(i.e. !mddev->external), then we can be sure that if the array is
switch to writable, then a metadata update will happen and will
record the failure. So we don't need the flag set.
If metadata is externally managed, it is upto the external manager
to clear the 'blocked' flag.
Reported-by: XiaoNi <xni@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
This error path retuns while still holding the lock - bad.
Fixes: 6791875e2e53 ("md: make reconfig_mutex optional for writes to md sysfs files.")
Cc: stable@vger.kernel.org (v4.0+)
Signed-off-by: NeilBrown <neilb@suse.com>
|
|
If ->private is set when ->run is called, it is assumed to be
a 'config' prepared as part of 'reshape'.
So it is important when we free that config, that we also clear ->private.
This is not often a problem as the mddev will normally be discarded
shortly after the config us freed.
However if an 'assemble' races with a final close, the assemble can use
the old mddev which has a stale ->private. This leads to any of
various sorts of crashes.
So clear ->private after calling ->free().
Reported-by: Nate Clark <nate@neworld.us>
Cc: stable@vger.kernel.org (v4.0+)
Fixes: afa0f557cb15 ("md: rename ->stop to ->free")
Signed-off-by: NeilBrown <neilb@suse.com>
|
|
Signed-off-by: Jean Delvare <jdelvare@suse.de>
|
|
Some utils, like dmidecode and smbios, need to access SMBIOS entry
table area in order to get information like SMBIOS version, size, etc.
Currently it's done via /dev/mem. But for situation when /dev/mem
usage is disabled, the utils have to use dmi sysfs instead, which
doesn't represent SMBIOS entry and adds code/delay redundancy when direct
access for table is needed.
So this patch creates dmi/tables and adds SMBIOS entry point to allow
utils in question to work correctly without /dev/mem. Also patch adds
raw dmi table to simplify dmi table processing in user space, as
proposed by Jean Delvare.
Tested-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@globallogic.com>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
|
|
The SMBIOS v3 entry points specify a maximum length for the DMI table,
not the exact length. Thus there may be garbage after the end-of-table
marker, which we don't want to export to user-space. Adjust dmi_len
when we find the end-of-table marker, so that only the actual table
payload is exported.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Ivan Khoronzhuk <ivan.khoronzhuk@globallogic.com>
|
|
The "dmi_table" function looks like data instance, but it does DMI
table decode. This patch renames it to "dmi_decode_table" name as
more appropriate. That allows us to use "dmi_table" name for correct
purposes.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@globallogic.com>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
|
|
A 32-bit entry point to a DMI table says how many structures the table
contains. The SMBIOS specification explicitly says that end-of-table
markers should be ignored if they are not actually at the end of the
DMI table. So only honor the end-of-table marker for tables accessed
through 64-bit entry points, as they do not specify a structure count.
Fixes: fc43026278 ("dmi: add support for SMBIOS 3.0 64-bit entry point")
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Matt Fleming <matt.fleming@intel.com>
|
|
ARM: SoC: driver updates for v4.2
Some of these are for drivers/soc, where we're now putting
SoC-specific drivers these days. Some are for other driver subsystems
where we have received acks from the appropriate maintainers.
Some highlights:
- simple-mfd: document DT bindings and misc updates
- migrate mach-berlin to simple-mfd for clock, pinctrl and reset
- memory: support for Tegra132 SoC
- memory: introduce tegra EMC driver for scaling memory frequency
- misc. updates for ARM CCI and CCN busses
Conflicts:
arch/arm64/boot/dts/arm/juno-motherboard.dtsi
Trivial add/add conflict with our dt branch.
Resolution: take both sides.
# gpg: Signature made Wed Jun 24 21:32:17 2015 PDT using RSA key ID D3FBC665
# gpg: Good signature from "Kevin Hilman <khilman@deeprootsystems.com>"
# gpg: aka "Kevin Hilman <khilman@linaro.org>"
# gpg: aka "Kevin Hilman <khilman@kernel.org>"
# Conflicts:
# arch/arm64/boot/dts/arm/juno-motherboard.dtsi
|
|
ARM: SoC: platform support for v4.2
Our SoC branch usually contains expanded support for new SoCs and
other core platform code. Some highlights from this round:
- sunxi: SMP support for A23 SoC
- socpga: big-endian support
- pxa: conversion to common clock framework
- bcm: SMP support for BCM63138
- imx: support new I.MX7D SoC
- zte: basic support for ZX296702 SoC
Conflicts:
arch/arm/mach-socfpga/core.h
Trivial remove/remove conflict with our cleanup branch.
Resolution: remove both sides
# gpg: Signature made Wed Jun 24 21:32:12 2015 PDT using RSA key ID D3FBC665
# gpg: Good signature from "Kevin Hilman <khilman@deeprootsystems.com>"
# gpg: aka "Kevin Hilman <khilman@linaro.org>"
# gpg: aka "Kevin Hilman <khilman@kernel.org>"
# Conflicts:
# arch/arm/mach-socfpga/core.h
|
|
ARM: SoC cleanups for v4.2
A relatively small setup of cleanups this time around, and similar to last time
the bulk of it is removal of legacy board support:
- OMAP: removal of legacy (non-DT) booting for several platforms
- i.MX: remove some legacy board files
Conflicts: None
# gpg: Signature made Wed Jun 24 21:32:09 2015 PDT using RSA key ID D3FBC665
# gpg: Good signature from "Kevin Hilman <khilman@deeprootsystems.com>"
# gpg: aka "Kevin Hilman <khilman@linaro.org>"
# gpg: aka "Kevin Hilman <khilman@kernel.org>"
|
|
drivers/dma/xgene-dma.c has file permissions 775, which is wrong, it should
be 664, so fix it
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
|
|
Clear pending interrupts before requesting interrupts and move
interrupt initialization after channels have been initialized.
This avoids a NULL pointer dereference panic when using kexec
while DMA requests were running.
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
|
|
The XDMAC supports memset transfers, both over contiguous areas, and over
discontiguous areas through a LLI.
The current memset operation only supports contiguous memset for now, add some
support for it. Scatter-gathered memset will come eventually.
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
|
|
|
|
|
|
|
|
|
|
Merge first patchbomb from Andrew Morton:
- a few misc things
- ocfs2 udpates
- kernel/watchdog.c feature work (took ages to get right)
- most of MM. A few tricky bits are held up and probably won't make 4.2.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (91 commits)
mm: kmemleak_alloc_percpu() should follow the gfp from per_alloc()
mm, thp: respect MPOL_PREFERRED policy with non-local node
tmpfs: truncate prealloc blocks past i_size
mm/memory hotplug: print the last vmemmap region at the end of hot add memory
mm/mmap.c: optimization of do_mmap_pgoff function
mm: kmemleak: optimise kmemleak_lock acquiring during kmemleak_scan
mm: kmemleak: avoid deadlock on the kmemleak object insertion error path
mm: kmemleak: do not acquire scan_mutex in kmemleak_do_cleanup()
mm: kmemleak: fix delete_object_*() race when called on the same memory block
mm: kmemleak: allow safe memory scanning during kmemleak disabling
memcg: convert mem_cgroup->under_oom from atomic_t to int
memcg: remove unused mem_cgroup->oom_wakeups
frontswap: allow multiple backends
x86, mirror: x86 enabling - find mirrored memory ranges
mm/memblock: allocate boot time data structures from mirrored memory
mm/memblock: add extra "flags" to memblock to allow selection of memory based on attribute
mm: do not ignore mapping_gfp_mask in page cache allocation paths
mm/cma.c: fix typos in comments
mm/oom_kill.c: print points as unsigned int
mm/hugetlb: handle races in alloc_huge_page and hugetlb_reserve_pages
...
|
|
This commit fix kernel crash when probing for rfkill devices in dell-laptop
driver failed. Function free_page() was incorrectly used on struct page *
instead of virtual address of SMI buffer.
This commit also simplify allocating page for SMI buffer by using
__get_free_page() function instead of sequential call of functions
alloc_page() and page_address().
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input subsystem updates from Dmitry Torokhov:
"Thanks to Samuel Thibault input device (keyboard) LEDs are no longer
hardwired within the input core but use LED subsystem and so allow use
of different triggers; Hans de Goede did a large update for the ALPS
touchpad driver; we have new TI drv2665 haptics driver and DA9063
OnKey driver, and host of other drivers got various fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (55 commits)
Input: pixcir_i2c_ts - fix receive error
MAINTAINERS: remove non existent input mt git tree
Input: improve usage of gpiod API
tty/vt/keyboard: define LED triggers for VT keyboard lock states
tty/vt/keyboard: define LED triggers for VT LED states
Input: export LEDs as class devices in sysfs
Input: cyttsp4 - use swap() in cyttsp4_get_touch()
Input: goodix - do not explicitly set evbits in input device
Input: goodix - export id and version read from device
Input: goodix - fix variable length array warning
Input: goodix - fix alignment issues
Input: add OnKey driver for DA9063 MFD part
Input: elan_i2c - add product IDs FW names
Input: elan_i2c - add support for multi IC type and iap format
Input: focaltech - report finger width to userspace
tty: remove platform_sysrq_reset_seq
Input: synaptics_i2c - use proper boolean values
Input: psmouse - use true instead of 1 for boolean values
Input: cyapa - fix a few typos in comments
Input: stmpe-ts - enforce device tree only mode
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC updates from Borislav Petkov:
- New APM X-Gene SoC EDAC driver (Loc Ho)
- AMD error injection module improvements (Aravind Gopalakrishnan)
- Altera Arria 10 support (Thor Thayer)
- misc fixes and cleanups all over the place
* tag 'edac_for_4.2_2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (28 commits)
EDAC: Update Documentation/edac.txt
EDAC: Fix typos in Documentation/edac.txt
EDAC, mce_amd_inj: Set MISCV on injection
EDAC, mce_amd_inj: Move bit preparations before the injection
EDAC, mce_amd_inj: Cleanup and simplify README
EDAC, altera: Do not allow suspend when EDAC is enabled
EDAC, mce_amd_inj: Make inj_type static
arm: socfpga: dts: Add Arria10 SDRAM EDAC DTS support
EDAC, altera: Add Arria10 EDAC support
EDAC, altera: Refactor for Altera CycloneV SoC
EDAC, altera: Generalize driver to use DT Memory size
EDAC, mce_amd_inj: Add README file
EDAC, mce_amd_inj: Add individual permissions field to dfs_node
EDAC, mce_amd_inj: Modify flags attribute to use string arguments
EDAC, mce_amd_inj: Read out number of MCE banks from the hardware
EDAC, mce_amd_inj: Use MCE_INJECT_GET macro for bank node too
EDAC, xgene: Fix cpuid abuse
EDAC, mpc85xx: Extend error address to 64 bit
EDAC, mpc8xxx: Adapt for FSL SoC
EDAC, edac_stub: Drop arch-specific include
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control updates from Linus Walleij:
"Here is the bulk of pin control changes for the v4.2 series: Quite a
lot of new SoC subdrivers and two new main drivers this time, apart
from that business as usual.
Details:
Core functionality:
- Enable exclusive pin ownership: it is possible to flag a pin
controller so that GPIO and other functions cannot use a single pin
simultaneously.
New drivers:
- NXP LPC18xx System Control Unit pin controller
- Imagination Pistachio SoC pin controller
New subdrivers:
- Freescale i.MX7d SoC
- Intel Sunrisepoint-H PCH
- Renesas PFC R8A7793
- Renesas PFC R8A7794
- Mediatek MT6397, MT8127
- SiRF Atlas 7
- Allwinner A33
- Qualcomm MSM8660
- Marvell Armada 395
- Rockchip RK3368
Cleanups:
- A big cleanup of the Marvell MVEBU driver rectifying it to
correspond to reality
- Drop platform device probing from the SH PFC driver, we are now a
DT only shop for SuperH
- Drop obsolte multi-platform check for SH PFC
- Various janitorial: constification, grammar etc
Improvements:
- The AT91 GPIO portions now supports the set_multiple() feature
- Split out SPI pins on the Xilinx Zynq
- Support DTs without specific function nodes in the i.MX driver"
* tag 'pinctrl-v4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (99 commits)
pinctrl: rockchip: add support for the rk3368
pinctrl: rockchip: generalize perpin driver-strength setting
pinctrl: sh-pfc: r8a7794: add SDHI pin groups
pinctrl: sh-pfc: r8a7794: add MMCIF pin groups
pinctrl: sh-pfc: add R8A7794 PFC support
pinctrl: make pinctrl_register() return proper error code
pinctrl: mvebu: armada-39x: add support for Armada 395 variant
pinctrl: mvebu: armada-39x: add missing SATA functions
pinctrl: mvebu: armada-39x: add missing PCIe functions
pinctrl: mvebu: armada-38x: add ptp functions
pinctrl: mvebu: armada-38x: add ua1 functions
pinctrl: mvebu: armada-38x: add nand functions
pinctrl: mvebu: armada-38x: add sata functions
pinctrl: mvebu: armada-xp: add dram functions
pinctrl: mvebu: armada-xp: add nand rb function
pinctrl: mvebu: armada-xp: add spi1 function
pinctrl: mvebu: armada-39x: normalize ref clock naming
pinctrl: mvebu: armada-xp: rename spi to spi0
pinctrl: mvebu: armada-370: align spi1 clock pin naming
pinctrl: mvebu: armada-370: align VDD cpu-pd pin naming with datasheet
...
|
|
I've only seen this once, and I failed to capture the
lockdep backtrace, but I did some investigations.
If we are calling into the MST layer from EDID probing,
we have the mode_config mutex held, if during that EDID
probing, the MST hub goes away, then we can get a deadlock
where the connector destruction function in the driver
tries to retake the mode config mutex.
This offloads connector destruction to a workqueue,
and avoid the subsequenct lock ordering issue.
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight
Pull backlight updates from Lee Jones:
"Changes to existing drivers:
- supply MODULE_DEVICE_TABLE() to ensure probing
- constify struct; da9052_bl
- enable compile test; lcd_l4f00242t03, lcd_lms283fg05, backlight_gpio
- suspend/resume bugfix; lp855x_bl
- devm_gpiod_get_optional() API fixup; pwm_bl
- error handling fixup; backlight"
* tag 'backlight-for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
backlight: Change the return type of backlight_update_status() to int
backlight: pwm_bl: Simplify usage of devm_gpiod_get_optional
backlight: lp855x: Don't clear level on suspend/blank
backlight: Allow compile test of GPIO consumers if !GPIOLIB
video: backlight: da9052: Constify platform_device_id
gpio-backlight: Discover driver during boot time
|
|
After 'uuid', 'size', 'sector_size', and optionally 'alt_name' have been
set to valid values the labels on the dimm can be updated. The
difference with the pmem case is that blk namespaces are limited to one
dimm and can cover discontiguous ranges in dpa space.
Also, after allocating label slots, it is useful for userspace to know
how many slots are left. Export this information in sysfs.
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Neil Brown <neilb@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
After 'uuid', 'size', and optionally 'alt_name' have been set to valid
values the labels on the dimms can be updated.
Write procedure is:
1/ Allocate and write new labels in the "next" index
2/ Free the old labels in the working copy
3/ Write the bitmap and the label space on the dimm
4/ Write the index to make the update valid
Label ranges directly mirror the dpa resource values for the given
label_id of the namespace.
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Neil Brown <neilb@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
A blk label set describes a namespace comprised of one or more
discontiguous dpa ranges on a single dimm. They may alias with one or
more pmem interleave sets that include the given dimm.
This is the runtime/volatile configuration infrastructure for sysfs
manipulation of 'alt_name', 'uuid', 'size', and 'sector_size'. A later
patch will make these settings persistent by writing back the label(s).
Unlike pmem namespaces, multiple blk namespaces can be created per
region. Once a blk namespace has been created a new seed device
(unconfigured child of a parent blk region) is instantiated. As long as
a region has 'available_size' != 0 new child namespaces may be created.
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Neil Brown <neilb@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
A complete label set is a PMEM-label per-dimm per-interleave-set where
all the UUIDs match and the interleave set cookie matches the hosting
interleave set.
Present sysfs attributes for manipulation of a PMEM-namespace's
'alt_name', 'uuid', and 'size' attributes. A later patch will make
these settings persistent by writing back the label.
Note that PMEM allocations grow forwards from the start of an interleave
set (lowest dimm-physical-address (DPA)). BLK-namespaces that alias
with a PMEM interleave set will grow allocations backward from the
highest DPA.
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Neil Brown <neilb@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
This on media label format [1] consists of two index blocks followed by
an array of labels. None of these structures are ever updated in place.
A sequence number tracks the current active index and the next one to
write, while labels are written to free slots.
+------------+
| |
| nsindex0 |
| |
+------------+
| |
| nsindex1 |
| |
+------------+
| label0 |
+------------+
| label1 |
+------------+
| |
....nslot...
| |
+------------+
| labelN |
+------------+
After reading valid labels, store the dpa ranges they claim into
per-dimm resource trees.
[1]: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
Cc: Neil Brown <neilb@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
On platforms that have firmware support for reading/writing per-dimm
label space, a portion of the dimm may be accessible via an interleave
set PMEM mapping in addition to the dimm's BLK (block-data-window
aperture(s)) interface. A label, stored in a "configuration data
region" on the dimm, disambiguates which dimm addresses are accessed
through which exclusive interface.
Add infrastructure that allows the kernel to block modifications to a
label in the set while any member dimm is active. Note that this is
meant only for enforcing "no modifications of active labels" via the
coarse ioctl command. Adding/deleting namespaces from an active
interleave set is always possible via sysfs.
Another aspect of tracking interleave sets is tracking their integrity
when DIMMs in a set are physically re-ordered. For this purpose we
generate an "interleave-set cookie" that can be recorded in a label and
validated against the current configuration. It is the bus provider
implementation's responsibility to calculate the interleave set cookie
and attach it to a given region.
Cc: Neil Brown <neilb@suse.de>
Cc: <linux-acpi@vger.kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
nd_pmem attaches to persistent memory regions and namespaces emitted by
the libnvdimm subsystem, and, same as the original pmem driver, presents
the system-physical-address range as a block device.
The existing e820-type-12 to pmem setup is converted to an nvdimm_bus
that emits an nd_namespace_io device.
Note that the X in 'pmemX' is now derived from the parent region. This
provides some stability to the pmem devices names from boot-to-boot.
The minor numbers are also more predictable by passing 0 to
alloc_disk().
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Prepare the pmem driver to consume PMEM namespaces emitted by regions of
an nvdimm_bus instance. No functional change.
Acked-by: Christoph Hellwig <hch@lst.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
The libnvdimm region driver is an intermediary driver that translates
non-volatile "region"s into "namespace" sub-devices that are surfaced by
persistent memory block-device drivers (PMEM and BLK).
ACPI 6 introduces the concept that a given nvdimm may simultaneously
offer multiple access modes to its media through direct PMEM load/store
access, or windowed BLK mode. Existing nvdimms mostly implement a PMEM
interface, some offer a BLK-like mode, but never both as ACPI 6 defines.
If an nvdimm is single interfaced, then there is no need for dimm
metadata labels. For these devices we can take the region boundaries
directly to create a child namespace device (nd_namespace_io).
Acked-by: Christoph Hellwig <hch@lst.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
A "region" device represents the maximum capacity of a BLK range (mmio
block-data-window(s)), or a PMEM range (DAX-capable persistent memory or
volatile memory), without regard for aliasing. Aliasing, in the
dimm-local address space (DPA), is resolved by metadata on a dimm to
designate which exclusive interface will access the aliased DPA ranges.
Support for the per-dimm metadata/label arrvies is in a subsequent
patch.
The name format of "region" devices is "regionN" where, like dimms, N is
a global ida index assigned at discovery time. This id is not reliable
across reboots nor in the presence of hotplug. Look to attributes of
the region or static id-data of the sub-namespace to generate a
persistent name. However, if the platform configuration does not change
it is reasonable to expect the same region id to be assigned at the next
boot.
"region"s have 2 generic attributes "size", and "mapping"s where:
- size: the BLK accessible capacity or the span of the
system physical address range in the case of PMEM.
- mappingN: a tuple describing a dimm's contribution to the region's
capacity in the format (<nmemX>,<dpa>,<size>). For a PMEM-region
there will be at least one mapping per dimm in the interleave set. For
a BLK-region there is only "mapping0" listing the starting DPA of the
BLK-region and the available DPA capacity of that space (matches "size"
above).
The max number of mappings per "region" is hard coded per the
constraints of sysfs attribute groups. That said the number of mappings
per region should never exceed the maximum number of possible dimms in
the system. If the current number turns out to not be enough then the
"mappings" attribute clarifies how many there are supposed to be. "32
should be enough for anybody...".
Cc: Neil Brown <neilb@suse.de>
Cc: <linux-acpi@vger.kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
* Implement the device-model infrastructure for loading modules and
attaching drivers to nvdimm devices. This is a simple association of a
nd-device-type number with a driver that has a bitmask of supported
device types. To facilitate userspace bind/unbind operations 'modalias'
and 'devtype', that also appear in the uevent, are added as generic
sysfs attributes for all nvdimm devices. The reason for the device-type
number is to support sub-types within a given parent devtype, be it a
vendor-specific sub-type or otherwise.
* The first consumer of this infrastructure is the driver
for dimm devices. It simply uses control messages to retrieve and
store the configuration-data image (label set) from each dimm.
Note: nd_device_register() arranges for asynchronous registration of
nvdimm bus devices by default.
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Neil Brown <neilb@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Most discovery/configuration of the nvdimm-subsystem is done via sysfs
attributes. However, some nvdimm_bus instances, particularly the
ACPI.NFIT bus, define a small set of messages that can be passed to the
platform. For convenience we derive the initial libnvdimm-ioctl command
formats directly from the NFIT DSM Interface Example formats.
ND_CMD_SMART: media health and diagnostics
ND_CMD_GET_CONFIG_SIZE: size of the label space
ND_CMD_GET_CONFIG_DATA: read label space
ND_CMD_SET_CONFIG_DATA: write label space
ND_CMD_VENDOR: vendor-specific command passthrough
ND_CMD_ARS_CAP: report address-range-scrubbing capabilities
ND_CMD_ARS_START: initiate scrubbing
ND_CMD_ARS_STATUS: report on scrubbing state
ND_CMD_SMART_THRESHOLD: configure alarm thresholds for smart events
If a platform later defines different commands than this set it is
straightforward to extend support to those formats.
Most of the commands target a specific dimm. However, the
address-range-scrubbing commands target the bus. The 'commands'
attribute in sysfs of an nvdimm_bus, or nvdimm, enumerate the supported
commands for that object.
Cc: <linux-acpi@vger.kernel.org>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reported-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Enable nvdimm devices to be registered on a nvdimm_bus. The kernel
assigned device id for nvdimm devicesis dynamic. If userspace needs a
more static identifier it should consult a provider-specific attribute.
In the case where NFIT is the provider, the 'nmemX/nfit/handle' or
'nmemX/nfit/serial' attributes may be used for this purpose.
Cc: Neil Brown <neilb@suse.de>
Cc: <linux-acpi@vger.kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
The control device for a nvdimm_bus is registered as an "nd" class
device. The expectation is that there will usually only be one "nd" bus
registered under /sys/class/nd. However, we allow for the possibility
of multiple buses and they will listed in discovery order as
ndctl0...ndctlN. This character device hosts the ioctl for passing
control messages. The initial command set has a 1:1 correlation with
the commands listed in the by the "NFIT DSM Example" document [1], but
this scheme is extensible to future command sets.
Note, nd_ioctl() and the backing ->ndctl() implementation are defined in
a subsequent patch. This is simply the initial registrations and sysfs
attributes.
[1]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
Cc: Neil Brown <neilb@suse.de>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: <linux-acpi@vger.kernel.org>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
A struct nvdimm_bus is the anchor device for registering nvdimm
resources and interfaces, for example, a character control device,
nvdimm devices, and I/O region devices. The ACPI NFIT (NVDIMM Firmware
Interface Table) is one possible platform description for such
non-volatile memory resources in a system. The nfit.ko driver attaches
to the "ACPI0012" device that indicates the presence of the NFIT and
parses the table to register a struct nvdimm_bus instance.
Cc: <linux-acpi@vger.kernel.org>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Change frontswap single pointer to a singly linked list of frontswap
implementations. Update Xen tmem implementation as register no longer
returns anything.
Frontswap only keeps track of a single implementation; any
implementation that registers second (or later) will replace the
previously registered implementation, and gets a pointer to the previous
implementation that the new implementation is expected to pass all
frontswap functions to if it can't handle the function itself. However
that method doesn't really make much sense, as passing that work on to
every implementation adds unnecessary work to implementations; instead,
frontswap should simply keep a list of all registered implementations
and try each implementation for any function. Most importantly, neither
of the two currently existing frontswap implementations in the kernel
actually do anything with any previous frontswap implementation that
they replace when registering.
This allows frontswap to successfully manage multiple implementations by
keeping a list of them all.
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The zonelist locking and the oom_sem are two overlapping locks that are
used to serialize global OOM killing against different things.
The historical zonelist locking serializes OOM kills from allocations with
overlapping zonelists against each other to prevent killing more tasks
than necessary in the same memory domain. Only when neither tasklists nor
zonelists from two concurrent OOM kills overlap (tasks in separate memcgs
bound to separate nodes) are OOM kills allowed to execute in parallel.
The younger oom_sem is a read-write lock to serialize OOM killing against
the PM code trying to disable the OOM killer altogether.
However, the OOM killer is a fairly cold error path, there is really no
reason to optimize for highly performant and concurrent OOM kills. And
the oom_sem is just flat-out redundant.
Replace both locking schemes with a single global mutex serializing OOM
kills regardless of context.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|