linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2015-06-25	net/mlx4_en: Release TX QP when destroying TX ring	Eran Ben Elisha
	TX ring QP wasn't released at mlx4_en_destroy_tx_ring. Instead, the code used the deprecated base_tx_qpn field. Move TX QP release to mlx4_en_destroy_tx_ring and remove the base_tx_qpn field. Fixes: ddae0349fdb7 ('net/mlx4: Change QP allocation scheme') Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-25	rbd: timeout watch teardown on unmap with mount_timeout	Ilya Dryomov
	As part of unmap sequence, kernel client has to talk to the OSDs to teardown watch on the header object. If none of the OSDs are available it would hang forever, until interrupted by a signal - when that happens we follow through with the rest of unmap procedure (i.e. unregister the device and put all the data structures) and the unmap is still considired successful (rbd cli tool exits with 0). The watch on the userspace side should eventually timeout so that's fine. This isn't very nice, because various userspace tools (pacemaker rbd resource agent, for example) then have to worry about setting up their own timeouts. Timeout it with mount_timeout (60 seconds by default). Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: Sage Weil <sage@redhat.com>
2015-06-25	libceph: store timeouts in jiffies, verify user input	Ilya Dryomov
	There are currently three libceph-level timeouts that the user can specify on mount: mount_timeout, osd_idle_ttl and osdkeepalive. All of these are in seconds and no checking is done on user input: negative values are accepted, we multiply them all by HZ which may or may not overflow, arbitrarily large jiffies then get added together, etc. There is also a bug in the way mount_timeout=0 is handled. It's supposed to mean "infinite timeout", but that's not how wait.h APIs treat it and so __ceph_open_session() for example will busy loop without much chance of being interrupted if none of ceph-mons are there. Fix all this by verifying user input, storing timeouts capped by msecs_to_jiffies() in jiffies and using the new ceph_timeout_jiffies() helper for all user-specified waits to handle infinite timeouts correctly. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2015-06-25	libceph: allow setting osd_req_op's flags	Yan, Zheng
	Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>
2015-06-25	libnvdimm: infrastructure for btt devices	Dan Williams
	NVDIMM namespaces, in addition to accepting "struct bio" based requests, also have the capability to perform byte-aligned accesses. By default only the bio/block interface is used. However, if another driver can make effective use of the byte-aligned capability it can claim namespace interface and use the byte-aligned ->rw_bytes() interface. The BTT driver is the initial first consumer of this mechanism to allow adding atomic sector update semantics to a pmem or blk namespace. This patch is the sysfs infrastructure to allow configuring a BTT instance for a namespace. Enabling that BTT and performing i/o is in a subsequent patch. Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-25	s390/kdump: fix nosmt kernel parameter	Michael Holzheu
	It turned out that SIGP set-multi-threading can only be done once. Therefore switching to a different MT level after switching to sclp.mtid_prev in the dump case fails. As a symptom specifying the "nosmt" parameter currently fails for the kdump kernel and the kernel starts with multi-threading enabled. So fix this and issue diag 308 subcode 1 call after collecting the CPU states for the dump. Also enhance the diag308_reset() function to be usable also with enabled lowcore protection and prefix register != 0. After the reset it is possible to switch the MT level again. We have to do the reset very early in order not to kill the already initialized console. Therefore instead of kmalloc() the corresponding memblock functions have to be used. To avoid copying the sclp cpu code into sclp_early, we now use the simple sigp loop method for CPU detection. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-06-25	s390/smp: cleanup core vs. cpu in the SCLP interface	Martin Schwidefsky
	The SCLP interface to query, configure and deconfigure CPUs actually operates on cores. For a machine without the multi-threading faciltiy a CPU and a core are equivalent but starting with System z13 a core can have multiple hardware threads, also referred to as logical CPUs. To avoid confusion replace the word 'cpu' with 'core' in the SCLP interface. Also replace MAX_CPU_ADDRESS with SCLP_MAX_CORES. The core-id is an 8-bit field, the maximum thread id is in the range 0-31. The theoretical limit for the CPU address is therefore 8191. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-06-25	s390/zcrypt: Fixed reset and interrupt handling of AP queues	Ingo Tuchscherer
	In case of request timeouts an AP queue reset will be triggered to recover and reinitialize the AP queue. The previous behavior was an immediate reset execution regardless of current/pending requests. Due to newly changed firmware behavior the reset may be delayed, based on the priority of pending request. The device driver's waiting time frame was limited, hence it did not received the reset response. As a consequence interrupts would not be enabled afterwards. The RAPQ (queue reset) and AQIC (interrupt control) commands will be treated fully asynchronous now. The device driver will check the reset and interrupt states periodically, thus it can handle the reinitialization properly. Signed-off-by: Ingo Tuchscherer <ingo.tuchscherer@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-06-25	md: clear Blocked flag on failed devices when array is read-only.	Neil Brown
	The Blocked flag indicates that a device has failed but that this fact hasn't been recorded in the metadata yet. Writes to such devices cannot be allowed until the metadata has been updated. On a read-only array, the Blocked flag will never be cleared. This prevents the device being removed from the array. If the metadata is being handled by the kernel (i.e. !mddev->external), then we can be sure that if the array is switch to writable, then a metadata update will happen and will record the failure. So we don't need the flag set. If metadata is externally managed, it is upto the external manager to clear the 'blocked' flag. Reported-by: XiaoNi <xni@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
2015-06-25	md: unlock mddev_lock on an error path.	NeilBrown
	This error path retuns while still holding the lock - bad. Fixes: 6791875e2e53 ("md: make reconfig_mutex optional for writes to md sysfs files.") Cc: stable@vger.kernel.org (v4.0+) Signed-off-by: NeilBrown <neilb@suse.com>
2015-06-25	md: clear mddev->private when it has been freed.	NeilBrown
	If ->private is set when ->run is called, it is assumed to be a 'config' prepared as part of 'reshape'. So it is important when we free that config, that we also clear ->private. This is not often a problem as the mddev will normally be discarded shortly after the config us freed. However if an 'assemble' races with a final close, the assemble can use the old mddev which has a stale ->private. This leads to any of various sorts of crashes. So clear ->private after calling ->free(). Reported-by: Nate Clark <nate@neworld.us> Cc: stable@vger.kernel.org (v4.0+) Fixes: afa0f557cb15 ("md: rename ->stop to ->free") Signed-off-by: NeilBrown <neilb@suse.com>
2015-06-25	firmware: dmi_scan: Coding style cleanups	Jean Delvare
	Signed-off-by: Jean Delvare <jdelvare@suse.de>
2015-06-25	firmware: dmi_scan: add SBMIOS entry and DMI tables	Ivan Khoronzhuk
	Some utils, like dmidecode and smbios, need to access SMBIOS entry table area in order to get information like SMBIOS version, size, etc. Currently it's done via /dev/mem. But for situation when /dev/mem usage is disabled, the utils have to use dmi sysfs instead, which doesn't represent SMBIOS entry and adds code/delay redundancy when direct access for table is needed. So this patch creates dmi/tables and adds SMBIOS entry point to allow utils in question to work correctly without /dev/mem. Also patch adds raw dmi table to simplify dmi table processing in user space, as proposed by Jean Delvare. Tested-by: Roy Franz <roy.franz@linaro.org> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@globallogic.com> Signed-off-by: Jean Delvare <jdelvare@suse.de>
2015-06-25	firmware: dmi_scan: Trim DMI table length before exporting it	Jean Delvare
	The SMBIOS v3 entry points specify a maximum length for the DMI table, not the exact length. Thus there may be garbage after the end-of-table marker, which we don't want to export to user-space. Adjust dmi_len when we find the end-of-table marker, so that only the actual table payload is exported. Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Ivan Khoronzhuk <ivan.khoronzhuk@globallogic.com>
2015-06-25	firmware: dmi_scan: Rename dmi_table to dmi_decode_table	Ivan Khoronzhuk
	The "dmi_table" function looks like data instance, but it does DMI table decode. This patch renames it to "dmi_decode_table" name as more appropriate. That allows us to use "dmi_table" name for correct purposes. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@globallogic.com> Signed-off-by: Jean Delvare <jdelvare@suse.de>
2015-06-25	firmware: dmi_scan: Only honor end-of-table for 64-bit tables	Jean Delvare
	A 32-bit entry point to a DMI table says how many structures the table contains. The SMBIOS specification explicitly says that end-of-table markers should be ignored if they are not actually at the end of the DMI table. So only honor the end-of-table marker for tables accessed through 64-bit entry points, as they do not specify a structure count. Fixes: fc43026278 ("dmi: add support for SMBIOS 3.0 64-bit entry point") Signed-off-by: Jean Delvare <jdelvare@suse.de> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Leif Lindholm <leif.lindholm@linaro.org> Cc: Matt Fleming <matt.fleming@intel.com>
2015-06-24	Merge tag 'armsoc-drivers' into test-merge	Kevin Hilman
	ARM: SoC: driver updates for v4.2 Some of these are for drivers/soc, where we're now putting SoC-specific drivers these days. Some are for other driver subsystems where we have received acks from the appropriate maintainers. Some highlights: - simple-mfd: document DT bindings and misc updates - migrate mach-berlin to simple-mfd for clock, pinctrl and reset - memory: support for Tegra132 SoC - memory: introduce tegra EMC driver for scaling memory frequency - misc. updates for ARM CCI and CCN busses Conflicts: arch/arm64/boot/dts/arm/juno-motherboard.dtsi Trivial add/add conflict with our dt branch. Resolution: take both sides. # gpg: Signature made Wed Jun 24 21:32:17 2015 PDT using RSA key ID D3FBC665 # gpg: Good signature from "Kevin Hilman <khilman@deeprootsystems.com>" # gpg: aka "Kevin Hilman <khilman@linaro.org>" # gpg: aka "Kevin Hilman <khilman@kernel.org>" # Conflicts: # arch/arm64/boot/dts/arm/juno-motherboard.dtsi
2015-06-24	Merge tag 'armsoc-soc' into test-merge	Kevin Hilman
	ARM: SoC: platform support for v4.2 Our SoC branch usually contains expanded support for new SoCs and other core platform code. Some highlights from this round: - sunxi: SMP support for A23 SoC - socpga: big-endian support - pxa: conversion to common clock framework - bcm: SMP support for BCM63138 - imx: support new I.MX7D SoC - zte: basic support for ZX296702 SoC Conflicts: arch/arm/mach-socfpga/core.h Trivial remove/remove conflict with our cleanup branch. Resolution: remove both sides # gpg: Signature made Wed Jun 24 21:32:12 2015 PDT using RSA key ID D3FBC665 # gpg: Good signature from "Kevin Hilman <khilman@deeprootsystems.com>" # gpg: aka "Kevin Hilman <khilman@linaro.org>" # gpg: aka "Kevin Hilman <khilman@kernel.org>" # Conflicts: # arch/arm/mach-socfpga/core.h
2015-06-24	Merge tag 'armsoc-cleanup' into test-merge	Kevin Hilman
	ARM: SoC cleanups for v4.2 A relatively small setup of cleanups this time around, and similar to last time the bulk of it is removal of legacy board support: - OMAP: removal of legacy (non-DT) booting for several platforms - i.MX: remove some legacy board files Conflicts: None # gpg: Signature made Wed Jun 24 21:32:09 2015 PDT using RSA key ID D3FBC665 # gpg: Good signature from "Kevin Hilman <khilman@deeprootsystems.com>" # gpg: aka "Kevin Hilman <khilman@linaro.org>" # gpg: aka "Kevin Hilman <khilman@kernel.org>"
2015-06-25	dmaengine: xgene: fix file permission	Vinod Koul
	drivers/dma/xgene-dma.c has file permissions 775, which is wrong, it should be 664, so fix it Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-06-25	dmaengine: fsl-edma: clear pending interrupts on initialization	Stefan Agner
	Clear pending interrupts before requesting interrupts and move interrupt initialization after channels have been initialized. This avoids a NULL pointer dereference panic when using kexec while DMA requests were running. Signed-off-by: Stefan Agner <stefan@agner.ch> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-06-25	dmaengine: xdmac: Add memset support	Maxime Ripard
	The XDMAC supports memset transfers, both over contiguous areas, and over discontiguous areas through a LLI. The current memset operation only supports contiguous memset for now, add some support for it. Scatter-gathered memset will come eventually. Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-06-25	Merge branch 'topic/pxa' into for-linus	Vinod Koul

2015-06-25	Merge branch 'topic/xdmac' into for-linus	Vinod Koul

2015-06-25	Merge branch 'topic/omap' into for-linus	Vinod Koul

2015-06-25	Merge branch 'topic/core' into for-linus	Vinod Koul

2015-06-24	Merge branch 'akpm' (patches from Andrew)	Linus Torvalds
	Merge first patchbomb from Andrew Morton: - a few misc things - ocfs2 udpates - kernel/watchdog.c feature work (took ages to get right) - most of MM. A few tricky bits are held up and probably won't make 4.2. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (91 commits) mm: kmemleak_alloc_percpu() should follow the gfp from per_alloc() mm, thp: respect MPOL_PREFERRED policy with non-local node tmpfs: truncate prealloc blocks past i_size mm/memory hotplug: print the last vmemmap region at the end of hot add memory mm/mmap.c: optimization of do_mmap_pgoff function mm: kmemleak: optimise kmemleak_lock acquiring during kmemleak_scan mm: kmemleak: avoid deadlock on the kmemleak object insertion error path mm: kmemleak: do not acquire scan_mutex in kmemleak_do_cleanup() mm: kmemleak: fix delete_object_*() race when called on the same memory block mm: kmemleak: allow safe memory scanning during kmemleak disabling memcg: convert mem_cgroup->under_oom from atomic_t to int memcg: remove unused mem_cgroup->oom_wakeups frontswap: allow multiple backends x86, mirror: x86 enabling - find mirrored memory ranges mm/memblock: allocate boot time data structures from mirrored memory mm/memblock: add extra "flags" to memblock to allow selection of memory based on attribute mm: do not ignore mapping_gfp_mask in page cache allocation paths mm/cma.c: fix typos in comments mm/oom_kill.c: print points as unsigned int mm/hugetlb: handle races in alloc_huge_page and hugetlb_reserve_pages ...
2015-06-24	dell-laptop: Fix allocating & freeing SMI buffer page	Pali Rohár
	This commit fix kernel crash when probing for rfkill devices in dell-laptop driver failed. Function free_page() was incorrectly used on struct page * instead of virtual address of SMI buffer. This commit also simplify allocating page for SMI buffer by using __get_free_page() function instead of sequential call of functions alloc_page() and page_address(). Signed-off-by: Pali Rohár <pali.rohar@gmail.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: stable@vger.kernel.org Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2015-06-24	Merge branch 'for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input subsystem updates from Dmitry Torokhov: "Thanks to Samuel Thibault input device (keyboard) LEDs are no longer hardwired within the input core but use LED subsystem and so allow use of different triggers; Hans de Goede did a large update for the ALPS touchpad driver; we have new TI drv2665 haptics driver and DA9063 OnKey driver, and host of other drivers got various fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (55 commits) Input: pixcir_i2c_ts - fix receive error MAINTAINERS: remove non existent input mt git tree Input: improve usage of gpiod API tty/vt/keyboard: define LED triggers for VT keyboard lock states tty/vt/keyboard: define LED triggers for VT LED states Input: export LEDs as class devices in sysfs Input: cyttsp4 - use swap() in cyttsp4_get_touch() Input: goodix - do not explicitly set evbits in input device Input: goodix - export id and version read from device Input: goodix - fix variable length array warning Input: goodix - fix alignment issues Input: add OnKey driver for DA9063 MFD part Input: elan_i2c - add product IDs FW names Input: elan_i2c - add support for multi IC type and iap format Input: focaltech - report finger width to userspace tty: remove platform_sysrq_reset_seq Input: synaptics_i2c - use proper boolean values Input: psmouse - use true instead of 1 for boolean values Input: cyapa - fix a few typos in comments Input: stmpe-ts - enforce device tree only mode ...
2015-06-24	Merge tag 'edac_for_4.2_2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull EDAC updates from Borislav Petkov: - New APM X-Gene SoC EDAC driver (Loc Ho) - AMD error injection module improvements (Aravind Gopalakrishnan) - Altera Arria 10 support (Thor Thayer) - misc fixes and cleanups all over the place * tag 'edac_for_4.2_2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (28 commits) EDAC: Update Documentation/edac.txt EDAC: Fix typos in Documentation/edac.txt EDAC, mce_amd_inj: Set MISCV on injection EDAC, mce_amd_inj: Move bit preparations before the injection EDAC, mce_amd_inj: Cleanup and simplify README EDAC, altera: Do not allow suspend when EDAC is enabled EDAC, mce_amd_inj: Make inj_type static arm: socfpga: dts: Add Arria10 SDRAM EDAC DTS support EDAC, altera: Add Arria10 EDAC support EDAC, altera: Refactor for Altera CycloneV SoC EDAC, altera: Generalize driver to use DT Memory size EDAC, mce_amd_inj: Add README file EDAC, mce_amd_inj: Add individual permissions field to dfs_node EDAC, mce_amd_inj: Modify flags attribute to use string arguments EDAC, mce_amd_inj: Read out number of MCE banks from the hardware EDAC, mce_amd_inj: Use MCE_INJECT_GET macro for bank node too EDAC, xgene: Fix cpuid abuse EDAC, mpc85xx: Extend error address to 64 bit EDAC, mpc8xxx: Adapt for FSL SoC EDAC, edac_stub: Drop arch-specific include ...
2015-06-24	Merge tag 'pinctrl-v4.2-1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control updates from Linus Walleij: "Here is the bulk of pin control changes for the v4.2 series: Quite a lot of new SoC subdrivers and two new main drivers this time, apart from that business as usual. Details: Core functionality: - Enable exclusive pin ownership: it is possible to flag a pin controller so that GPIO and other functions cannot use a single pin simultaneously. New drivers: - NXP LPC18xx System Control Unit pin controller - Imagination Pistachio SoC pin controller New subdrivers: - Freescale i.MX7d SoC - Intel Sunrisepoint-H PCH - Renesas PFC R8A7793 - Renesas PFC R8A7794 - Mediatek MT6397, MT8127 - SiRF Atlas 7 - Allwinner A33 - Qualcomm MSM8660 - Marvell Armada 395 - Rockchip RK3368 Cleanups: - A big cleanup of the Marvell MVEBU driver rectifying it to correspond to reality - Drop platform device probing from the SH PFC driver, we are now a DT only shop for SuperH - Drop obsolte multi-platform check for SH PFC - Various janitorial: constification, grammar etc Improvements: - The AT91 GPIO portions now supports the set_multiple() feature - Split out SPI pins on the Xilinx Zynq - Support DTs without specific function nodes in the i.MX driver" * tag 'pinctrl-v4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (99 commits) pinctrl: rockchip: add support for the rk3368 pinctrl: rockchip: generalize perpin driver-strength setting pinctrl: sh-pfc: r8a7794: add SDHI pin groups pinctrl: sh-pfc: r8a7794: add MMCIF pin groups pinctrl: sh-pfc: add R8A7794 PFC support pinctrl: make pinctrl_register() return proper error code pinctrl: mvebu: armada-39x: add support for Armada 395 variant pinctrl: mvebu: armada-39x: add missing SATA functions pinctrl: mvebu: armada-39x: add missing PCIe functions pinctrl: mvebu: armada-38x: add ptp functions pinctrl: mvebu: armada-38x: add ua1 functions pinctrl: mvebu: armada-38x: add nand functions pinctrl: mvebu: armada-38x: add sata functions pinctrl: mvebu: armada-xp: add dram functions pinctrl: mvebu: armada-xp: add nand rb function pinctrl: mvebu: armada-xp: add spi1 function pinctrl: mvebu: armada-39x: normalize ref clock naming pinctrl: mvebu: armada-xp: rename spi to spi0 pinctrl: mvebu: armada-370: align spi1 clock pin naming pinctrl: mvebu: armada-370: align VDD cpu-pd pin naming with datasheet ...
2015-06-25	drm/dp/mst: close deadlock in connector destruction.	Dave Airlie
	I've only seen this once, and I failed to capture the lockdep backtrace, but I did some investigations. If we are calling into the MST layer from EDID probing, we have the mode_config mutex held, if during that EDID probing, the MST hub goes away, then we can get a deadlock where the connector destruction function in the driver tries to retake the mode config mutex. This offloads connector destruction to a workqueue, and avoid the subsequenct lock ordering issue. Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: stable@vger.kernel.org Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-06-24	Merge tag 'backlight-for-linus-4.2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight Pull backlight updates from Lee Jones: "Changes to existing drivers: - supply MODULE_DEVICE_TABLE() to ensure probing - constify struct; da9052_bl - enable compile test; lcd_l4f00242t03, lcd_lms283fg05, backlight_gpio - suspend/resume bugfix; lp855x_bl - devm_gpiod_get_optional() API fixup; pwm_bl - error handling fixup; backlight" * tag 'backlight-for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight: backlight: Change the return type of backlight_update_status() to int backlight: pwm_bl: Simplify usage of devm_gpiod_get_optional backlight: lp855x: Don't clear level on suspend/blank backlight: Allow compile test of GPIO consumers if !GPIOLIB video: backlight: da9052: Constify platform_device_id gpio-backlight: Discover driver during boot time
2015-06-24	libnvdimm: write blk label set	Dan Williams
	After 'uuid', 'size', 'sector_size', and optionally 'alt_name' have been set to valid values the labels on the dimm can be updated. The difference with the pmem case is that blk namespaces are limited to one dimm and can cover discontiguous ranges in dpa space. Also, after allocating label slots, it is useful for userspace to know how many slots are left. Export this information in sysfs. Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Neil Brown <neilb@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24	libnvdimm: write pmem label set	Dan Williams
	After 'uuid', 'size', and optionally 'alt_name' have been set to valid values the labels on the dimms can be updated. Write procedure is: 1/ Allocate and write new labels in the "next" index 2/ Free the old labels in the working copy 3/ Write the bitmap and the label space on the dimm 4/ Write the index to make the update valid Label ranges directly mirror the dpa resource values for the given label_id of the namespace. Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Neil Brown <neilb@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24	libnvdimm: blk labels and namespace instantiation	Dan Williams
	A blk label set describes a namespace comprised of one or more discontiguous dpa ranges on a single dimm. They may alias with one or more pmem interleave sets that include the given dimm. This is the runtime/volatile configuration infrastructure for sysfs manipulation of 'alt_name', 'uuid', 'size', and 'sector_size'. A later patch will make these settings persistent by writing back the label(s). Unlike pmem namespaces, multiple blk namespaces can be created per region. Once a blk namespace has been created a new seed device (unconfigured child of a parent blk region) is instantiated. As long as a region has 'available_size' != 0 new child namespaces may be created. Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Neil Brown <neilb@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24	libnvdimm: pmem label sets and namespace instantiation.	Dan Williams
	A complete label set is a PMEM-label per-dimm per-interleave-set where all the UUIDs match and the interleave set cookie matches the hosting interleave set. Present sysfs attributes for manipulation of a PMEM-namespace's 'alt_name', 'uuid', and 'size' attributes. A later patch will make these settings persistent by writing back the label. Note that PMEM allocations grow forwards from the start of an interleave set (lowest dimm-physical-address (DPA)). BLK-namespaces that alias with a PMEM interleave set will grow allocations backward from the highest DPA. Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Neil Brown <neilb@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24	libnvdimm: namespace indices: read and validate	Dan Williams
	This on media label format [1] consists of two index blocks followed by an array of labels. None of these structures are ever updated in place. A sequence number tracks the current active index and the next one to write, while labels are written to free slots. +------------+ \| \| \| nsindex0 \| \| \| +------------+ \| \| \| nsindex1 \| \| \| +------------+ \| label0 \| +------------+ \| label1 \| +------------+ \| \| ....nslot... \| \| +------------+ \| labelN \| +------------+ After reading valid labels, store the dpa ranges they claim into per-dimm resource trees. [1]: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf Cc: Neil Brown <neilb@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24	libnvdimm, nfit: add interleave-set state-tracking infrastructure	Dan Williams
	On platforms that have firmware support for reading/writing per-dimm label space, a portion of the dimm may be accessible via an interleave set PMEM mapping in addition to the dimm's BLK (block-data-window aperture(s)) interface. A label, stored in a "configuration data region" on the dimm, disambiguates which dimm addresses are accessed through which exclusive interface. Add infrastructure that allows the kernel to block modifications to a label in the set while any member dimm is active. Note that this is meant only for enforcing "no modifications of active labels" via the coarse ioctl command. Adding/deleting namespaces from an active interleave set is always possible via sysfs. Another aspect of tracking interleave sets is tracking their integrity when DIMMs in a set are physically re-ordered. For this purpose we generate an "interleave-set cookie" that can be recorded in a label and validated against the current configuration. It is the bus provider implementation's responsibility to calculate the interleave set cookie and attach it to a given region. Cc: Neil Brown <neilb@suse.de> Cc: <linux-acpi@vger.kernel.org> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Robert Moore <robert.moore@intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24	libnvdimm, pmem: add libnvdimm support to the pmem driver	Dan Williams
	nd_pmem attaches to persistent memory regions and namespaces emitted by the libnvdimm subsystem, and, same as the original pmem driver, presents the system-physical-address range as a block device. The existing e820-type-12 to pmem setup is converted to an nvdimm_bus that emits an nd_namespace_io device. Note that the X in 'pmemX' is now derived from the parent region. This provides some stability to the pmem devices names from boot-to-boot. The minor numbers are also more predictable by passing 0 to alloc_disk(). Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boaz Harrosh <boaz@plexistor.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jens Axboe <axboe@fb.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: Christoph Hellwig <hch@lst.de> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24	libnvdimm, pmem: move pmem to drivers/nvdimm/	Dan Williams
	Prepare the pmem driver to consume PMEM namespaces emitted by regions of an nvdimm_bus instance. No functional change. Acked-by: Christoph Hellwig <hch@lst.de> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24	libnvdimm: support for legacy (non-aliasing) nvdimms	Dan Williams
	The libnvdimm region driver is an intermediary driver that translates non-volatile "region"s into "namespace" sub-devices that are surfaced by persistent memory block-device drivers (PMEM and BLK). ACPI 6 introduces the concept that a given nvdimm may simultaneously offer multiple access modes to its media through direct PMEM load/store access, or windowed BLK mode. Existing nvdimms mostly implement a PMEM interface, some offer a BLK-like mode, but never both as ACPI 6 defines. If an nvdimm is single interfaced, then there is no need for dimm metadata labels. For these devices we can take the region boundaries directly to create a child namespace device (nd_namespace_io). Acked-by: Christoph Hellwig <hch@lst.de> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24	libnvdimm, nfit: regions (block-data-window, persistent memory, volatile memory)	Dan Williams
	A "region" device represents the maximum capacity of a BLK range (mmio block-data-window(s)), or a PMEM range (DAX-capable persistent memory or volatile memory), without regard for aliasing. Aliasing, in the dimm-local address space (DPA), is resolved by metadata on a dimm to designate which exclusive interface will access the aliased DPA ranges. Support for the per-dimm metadata/label arrvies is in a subsequent patch. The name format of "region" devices is "regionN" where, like dimms, N is a global ida index assigned at discovery time. This id is not reliable across reboots nor in the presence of hotplug. Look to attributes of the region or static id-data of the sub-namespace to generate a persistent name. However, if the platform configuration does not change it is reasonable to expect the same region id to be assigned at the next boot. "region"s have 2 generic attributes "size", and "mapping"s where: - size: the BLK accessible capacity or the span of the system physical address range in the case of PMEM. - mappingN: a tuple describing a dimm's contribution to the region's capacity in the format (<nmemX>,<dpa>,<size>). For a PMEM-region there will be at least one mapping per dimm in the interleave set. For a BLK-region there is only "mapping0" listing the starting DPA of the BLK-region and the available DPA capacity of that space (matches "size" above). The max number of mappings per "region" is hard coded per the constraints of sysfs attribute groups. That said the number of mappings per region should never exceed the maximum number of possible dimms in the system. If the current number turns out to not be enough then the "mappings" attribute clarifies how many there are supposed to be. "32 should be enough for anybody...". Cc: Neil Brown <neilb@suse.de> Cc: <linux-acpi@vger.kernel.org> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Robert Moore <robert.moore@intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24	libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver infrastructure	Dan Williams
	* Implement the device-model infrastructure for loading modules and attaching drivers to nvdimm devices. This is a simple association of a nd-device-type number with a driver that has a bitmask of supported device types. To facilitate userspace bind/unbind operations 'modalias' and 'devtype', that also appear in the uevent, are added as generic sysfs attributes for all nvdimm devices. The reason for the device-type number is to support sub-types within a given parent devtype, be it a vendor-specific sub-type or otherwise. * The first consumer of this infrastructure is the driver for dimm devices. It simply uses control messages to retrieve and store the configuration-data image (label set) from each dimm. Note: nd_device_register() arranges for asynchronous registration of nvdimm bus devices by default. Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Neil Brown <neilb@suse.de> Acked-by: Christoph Hellwig <hch@lst.de> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24	libnvdimm: control (ioctl) messages for nvdimm_bus and nvdimm devices	Dan Williams
	Most discovery/configuration of the nvdimm-subsystem is done via sysfs attributes. However, some nvdimm_bus instances, particularly the ACPI.NFIT bus, define a small set of messages that can be passed to the platform. For convenience we derive the initial libnvdimm-ioctl command formats directly from the NFIT DSM Interface Example formats. ND_CMD_SMART: media health and diagnostics ND_CMD_GET_CONFIG_SIZE: size of the label space ND_CMD_GET_CONFIG_DATA: read label space ND_CMD_SET_CONFIG_DATA: write label space ND_CMD_VENDOR: vendor-specific command passthrough ND_CMD_ARS_CAP: report address-range-scrubbing capabilities ND_CMD_ARS_START: initiate scrubbing ND_CMD_ARS_STATUS: report on scrubbing state ND_CMD_SMART_THRESHOLD: configure alarm thresholds for smart events If a platform later defines different commands than this set it is straightforward to extend support to those formats. Most of the commands target a specific dimm. However, the address-range-scrubbing commands target the bus. The 'commands' attribute in sysfs of an nvdimm_bus, or nvdimm, enumerate the supported commands for that object. Cc: <linux-acpi@vger.kernel.org> Cc: Robert Moore <robert.moore@intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reported-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24	libnvdimm, nfit: dimm/memory-devices	Dan Williams
	Enable nvdimm devices to be registered on a nvdimm_bus. The kernel assigned device id for nvdimm devicesis dynamic. If userspace needs a more static identifier it should consult a provider-specific attribute. In the case where NFIT is the provider, the 'nmemX/nfit/handle' or 'nmemX/nfit/serial' attributes may be used for this purpose. Cc: Neil Brown <neilb@suse.de> Cc: <linux-acpi@vger.kernel.org> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Robert Moore <robert.moore@intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24	libnvdimm: control character device and nvdimm_bus sysfs attributes	Dan Williams
	The control device for a nvdimm_bus is registered as an "nd" class device. The expectation is that there will usually only be one "nd" bus registered under /sys/class/nd. However, we allow for the possibility of multiple buses and they will listed in discovery order as ndctl0...ndctlN. This character device hosts the ioctl for passing control messages. The initial command set has a 1:1 correlation with the commands listed in the by the "NFIT DSM Example" document [1], but this scheme is extensible to future command sets. Note, nd_ioctl() and the backing ->ndctl() implementation are defined in a subsequent patch. This is simply the initial registrations and sysfs attributes. [1]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf Cc: Neil Brown <neilb@suse.de> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: <linux-acpi@vger.kernel.org> Cc: Robert Moore <robert.moore@intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24	libnvdimm, nfit: initial libnvdimm infrastructure and NFIT support	Dan Williams
	A struct nvdimm_bus is the anchor device for registering nvdimm resources and interfaces, for example, a character control device, nvdimm devices, and I/O region devices. The ACPI NFIT (NVDIMM Firmware Interface Table) is one possible platform description for such non-volatile memory resources in a system. The nfit.ko driver attaches to the "ACPI0012" device that indicates the presence of the NFIT and parses the table to register a struct nvdimm_bus instance. Cc: <linux-acpi@vger.kernel.org> Cc: Lv Zheng <lv.zheng@intel.com> Cc: Robert Moore <robert.moore@intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24	frontswap: allow multiple backends	Dan Streetman
	Change frontswap single pointer to a singly linked list of frontswap implementations. Update Xen tmem implementation as register no longer returns anything. Frontswap only keeps track of a single implementation; any implementation that registers second (or later) will replace the previously registered implementation, and gets a pointer to the previous implementation that the new implementation is expected to pass all frontswap functions to if it can't handle the function itself. However that method doesn't really make much sense, as passing that work on to every implementation adds unnecessary work to implementations; instead, frontswap should simply keep a list of all registered implementations and try each implementation for any function. Most importantly, neither of the two currently existing frontswap implementations in the kernel actually do anything with any previous frontswap implementation that they replace when registering. This allows frontswap to successfully manage multiple implementations by keeping a list of them all. Signed-off-by: Dan Streetman <ddstreet@ieee.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24	mm: oom_kill: simplify OOM killer locking	Johannes Weiner
	The zonelist locking and the oom_sem are two overlapping locks that are used to serialize global OOM killing against different things. The historical zonelist locking serializes OOM kills from allocations with overlapping zonelists against each other to prevent killing more tasks than necessary in the same memory domain. Only when neither tasklists nor zonelists from two concurrent OOM kills overlap (tasks in separate memcgs bound to separate nodes) are OOM kills allowed to execute in parallel. The younger oom_sem is a read-write lock to serialize OOM killing against the PM code trying to disable the OOM killer altogether. However, the OOM killer is a fairly cold error path, there is really no reason to optimize for highly performant and concurrent OOM kills. And the oom_sem is just flat-out redundant. Replace both locking schemes with a single global mutex serializing OOM kills regardless of context. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>