summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-09-21i2c: mux: pca954x: retry updating the mux selection on failurePeter Rosin
The cached value of the last selected channel prevents retries on the next call, even on failure to update the selected channel. Fix that. Signed-off-by: Peter Rosin <peda@axentia.se> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org
2016-09-21i2c: octeon: Fix high-level controller status checkJan Glauber
In case the high-level controller (HLC) is used the status code is reported at a different location. Check that location after HLC write operations if the ready bit is not set and return an appropriate error code instead of always returning -EAGAIN. Signed-off-by: Jan Glauber <jglauber@cavium.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2016-09-21i2c: octeon: Avoid sending STOP during recoveryDmitry Bazhenov
Due to a bug in the ThunderX I2C hardware sending STOP during a recovery attempt could lock up the hardware. To work around this problem do not send STOP at the beginning of the recovery but use the override registers to bring the TWSI including the high-level controller out of the bad state. Signed-off-by: Dmitry Bazhenov <dmitry.bazhenov@auriga.com> Signed-off-by: Jan Glauber <jglauber@cavium.com> [Changed commit message] Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2016-09-21i2c: octeon: Fix set SCL recovery functionDmitry Bazhenov
The set SCL recovery function unconditionally pulls the SCL line low. Only pull SCL line low according to val parameter. Signed-off-by: Dmitry Bazhenov <dmitry.bazhenov@auriga.com> Signed-off-by: Jan Glauber <jglauber@cavium.com> [Changed commit message] Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2016-09-21clk: nxp: clk-lpc32xx: Unmap region obtained by of_iomapArvind Yadav
Free memory mapping, if lpc32xx_clk_init is not successful. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Acked-by: Vladimir Zapolskiy <vz@mleia.com> Acked-by: Sylvain Lemieux <slemieux.tyco@gmail.com> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2016-09-21ARM: select PCI_DOMAINS config from ARCH_MULTIPLATFORMKishon Vijay Abraham I
PCI_DOMAINS config should be selected for any SoCs having more than a single PCIe controller. Without PCI_DOMAINS config, only one PCIe controller gets registered. Select PCI_DOMAINS in ARCH_MULTIPLATFORM if PCI is selected, since it doesn't harm even if a platform has a single PCIe port. Also remove PCI_DOMAINS being selected from other platform specific configs. Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Acked-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2016-09-21ARM: stop *MIGHT_HAVE_PCI* config from being selected redundantlyKishon Vijay Abraham I
*MIGHT_HAVE_PCI* config is already selected in ARCH_MULTIPLATFORM. Don't select it redundantly in all ARCH_MULTIPLATFORM based machines. Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Acked-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2016-09-21scsi: hpsa: correct call to hpsa_do_resetDon Brace
calling fill_cmd() using a MACRO definition not handled in switch statement causes BUG() to be called. Signed-off-by: Don Brace <don.brace@microsemi.com> Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-09-21Merge tag 'qcom-ebi2-arm-soc' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-integrator into next/drivers Pull "Qualcomm EBI2 bindings and bus driver" from Linus Walleij * tag 'qcom-ebi2-arm-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-integrator: bus: qcom: add EBI2 driver bus: qcom: add EBI2 device tree bindings Acked-by: Andy Gross <andy.gross@linaro.org>
2016-09-21Merge tag 'sunxi-dt-for-4.9-3' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux into next/late Pull "Allwinner DT changes for 4.9, late edition" from Maxime Ripard: Here is a bunch of late changes for the 4.9 merge window, mostly: - Added a bunch of touchscreens nodes to tablets - Added support for the AXP806 PMIC found in the A80 boards - Enabled a few pinmux options for the H3 * tag 'sunxi-dt-for-4.9-3' of https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux: ARM: dts: sun8i: Add accelerometer to polaroid-mid2407pxe03 ARM: dts: sun8i: enable UART1 for iNet D978 Rev2 board ARM: dts: sun8i: add pinmux for UART1 at PG dts: sun8i-h3: add I2C0-2 peripherals to H3 SOC dts: sun8i-h3: add pinmux definitions for I2C0-2 dts: sun8i-h3: associate exposed UARTs on Orange Pi Boards dts: sun8i-h3: split off RTS/CTS for UART1 in seperate pinmux dts: sun8i-h3: add pinmux definitions for UART2-3 ARM: dts: sun9i: a80-optimus: Disable EHCI1 ARM: dts: sun9i: cubieboard4: Add AXP806 PMIC device node and regulators ARM: dts: sun9i: a80-optimus: Add AXP806 PMIC device node and regulators ARM: dts: sun9i: cubieboard4: Declare AXP809 SW regulator as unused ARM: dts: sun9i: a80-optimus: Declare AXP809 SW regulator as unused ARM: dts: sun8i: Add touchscreen node for sun8i-a33-ga10h ARM: dts: sun8i: Add touchscreen node for sun8i-a23-polaroid-mid2809pxe04 ARM: dts: sun8i: Add touchscreen node for sun8i-a23-polaroid-mid2407pxe03 ARM: dts: sun8i: Add touchscreen node for sun8i-a23-inet86dz ARM: dts: sun8i: Add touchscreen node for sun8i-a23-gt90h
2016-09-21Merge tag 'mvebu-drivers-4.9-1' of git://git.infradead.org/linux-mvebu into ↵Arnd Bergmann
next/drivers Pull "mvebu drivers for 4.9 (part 1)" from Gregory CLEMENT: - Add pinctrl and clk support for the Orion5x SoC mv88f5181 variant * tag 'mvebu-drivers-4.9-1' of git://git.infradead.org/linux-mvebu: pinctrl: mvebu: orion5x: Generalise mv88f5181l support for 88f5181 clk: mvebu: Add clk support for the orion5x SoC mv88f5181
2016-09-21Merge tag 'mvebu-soc-4.9-1' of git://git.infradead.org/linux-mvebu into next/socArnd Bergmann
Pull "mvebu soc for 4.9 (part 1)" from Gregory CLEMENT: - irq cleanup for old mvebu SoC * tag 'mvebu-soc-4.9-1' of git://git.infradead.org/linux-mvebu: ARM: orion5x: remove extraneous NO_IRQ ARM: orion: simplify orion_ge00_switch_init ARM: mvebu/orion: remove NO_IRQ check from device init ARM: mv78xx0: simplify ethernet device creation
2016-09-21Merge tag 'imx-legacy-4.9' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into next/soc Pull "i.MX legacy board file changes for 4.9" from Shawn Guo: It includes a patch series that moves registrations and initializations of all peripherals which are GPIO line consumers for all legacy boards from .init_machine to .init_late init level. This is needed to proactively prevent boot time issues on the legacy boards due to the deprioritized init level of the GPIO controller driver (set lower than IOMUX controller driver init level), which is shared among all i.MX SoCs. * tag 'imx-legacy-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: ARM: imx legacy: pca100: move peripheral initialization to .init_late ARM: imx legacy: mx27ads: move peripheral initialization to .init_late ARM: imx legacy: mx21ads: move peripheral initialization to .init_late ARM: imx legacy: pcm043: move peripheral initialization to .init_late ARM: imx legacy: mx35-3ds: move peripheral initialization to .init_late ARM: imx legacy: mx27-3ds: move peripheral initialization to .init_late ARM: imx legacy: imx27-visstrim-m10: move peripheral initialization to .init_late ARM: imx legacy: vpr200: move peripheral initialization to .init_late ARM: imx legacy: mx31moboard: move peripheral initialization to .init_late ARM: imx legacy: armadillo5x0: move peripheral initialization to .init_late ARM: imx legacy: qong: move peripheral initialization to .init_late ARM: imx legacy: mx31-3ds: move peripheral initialization to .init_late ARM: imx legacy: pcm037: move peripheral initialization to .init_late ARM: imx legacy: mx31lilly: move peripheral initialization to .init_late ARM: imx legacy: mx31ads: move peripheral initialization to .init_late ARM: imx legacy: mx31lite: move peripheral initialization to .init_late ARM: imx legacy: kzm: move peripheral initialization to .init_late
2016-09-21scsi: ufs: Get a TM service response from the correct offsetKiwoong Kim
When any UFS host controller receives a TM(Task Management) response from a UFS device, UFS driver has been recognize like receiving a message of "Task Management Function Complete"(00h) in all cases, so far. That means there is no pending task for a tag of the TM request sent before in the UFS device. That's because the byte offset 6 in TM response which has been used to get a TM service response so far represents just whether or not a TM transmission passes. Regarding UFS spec, the correct byte offset to get TM service response is 15, not 6. I tested that UFS driver responds properly for the TM response from a UFS device with an reference board with exynos8890, as follow: No pending task -> Task Management Function Complete (00h) Pending task -> Task Management Function Succeeded (08h) [mkp: applied by hand] Signed-off-by: Kiwoong Kim <kwmad.kim@samsung.com> Signed-off-by: HeonGwang Chu <hg.chu@samsung.com> Tested-by: : Kiwoong Kim <kwmad.kim@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-09-21rtc: cmos: Restore alarm after resumeGabriele Mazzotta
Some platform firmware may interfere with the RTC alarm over suspend, resulting in the kernel and hardware having different ideas about system state but also potentially causing problems with firmware that assumes the OS will clean this case up. This patch restores the RTC alarm on resume to ensure that kernel and hardware are in sync. The case we've seen is Intel Rapid Start, which is a firmware-mediated feature that automatically transitions systems from suspend-to-RAM to suspend-to-disk without OS involvement. It does this by setting the RTC alarm and a flag that indicates that on wake it should perform the transition rather than re-starting the OS. However, if the OS has set a wakeup alarm that would wake the machine earlier, it refuses to overwrite it and allows the system to wake instead. This fails in the following situation: 1) User configures Intel Rapid Start to transition after (say) 15 minutes 2) User suspends to RAM. Firmware sets the wakeup alarm for 15 minutes in the future 3) User resumes after 5 minutes. Firmware does not reset the alarm, and as such it is still set for 10 minutes in the future 4) User suspends after 5 minutes. Firmware notices that the alarm is set for 5 minutes in the future, which is less than the 15 minute transition threshold. It therefore assumes that the user wants the machine to wake in 5 minutes 5) System resumes after 5 minutes The worst case scenario here is that the user may have put the system in a bag between (4) and (5), resulting in it running in a confined space and potentially overheating. This seems reasonably important. The Rapid Start support code got added in 3.11, but it can be configured in the firmware regardless of kernel support. Signed-off-by: Gabriele Mazzotta <gabriele.mzt@gmail.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-09-21rtc: cmos: Clear ACPI-driven alarms upon resumeGabriele Mazzotta
Currently ACPI-driven alarms are not cleared when they wake the system. As consequence, expired alarms must be manually cleared to program a new alarm. Fix this by correctly handling ACPI-driven alarms. More specifically, the ACPI specification [1] provides for two alternative implementations of the RTC. Depending on the implementation, the driver either clear the alarm from the resume callback or from ACPI interrupt handler: - The platform has the RTC wakeup status fixed in hardware (ACPI_FADT_FIXED_RTC is 0). In this case the driver can determine if the RTC was the reason of the wakeup from the resume callback by reading the RTC status register. - The platform has no fixed hardware feature event bits. In this case a GPE is used to wake the system and the driver clears the alarm from its handler. [1] http://www.acpi.info/DOWNLOADS/ACPI_5_Errata%20A.pdf Signed-off-by: Gabriele Mazzotta <gabriele.mzt@gmail.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-09-21rtc: omap: Support ext_wakeup configurationMarcin Niestroj
Support configuration of ext_wakeup sources. This patch makes it possible to enable ext_wakeup and set it's polarity, depending on board configuration. AM335x's dedicated PMIC (tps65217) uses ext_wakeup to notify about power-button presses. Handling power-button presses enables to recover from RTC-only power states correctly. Signed-off-by: Marcin Niestroj <m.niestroj@grinn-global.com> Acked-by: Tony Lindgren <tony@atomide.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2016-09-21acpi: Validate processor id when mapping the processorDou Liyang
When we want to identify whether the proc_id is unreasonable or not, we can call the "acpi_processor_validate_proc_id" function. It will search in the duplicate IDs. If we find the proc_id in the IDs, we return true to the call function. Conversely, the false represents available. When we establish all possible cpuid <-> nodeid mapping to handle the cpu hotplugs, we will use the proc_id from ACPI table. We do validation when we get the proc_id. If the result is true, we will stop the mapping. [ tglx: Mark the new function __init ] Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: mika.j.penttila@gmail.com Cc: len.brown@intel.com Cc: rafael@kernel.org Cc: rjw@rjwysocki.net Cc: yasu.isimatu@gmail.com Cc: linux-mm@kvack.org Cc: linux-acpi@vger.kernel.org Cc: isimatu.yasuaki@jp.fujitsu.com Cc: gongzhaogang@inspur.com Cc: tj@kernel.org Cc: izumi.taku@jp.fujitsu.com Cc: cl@linux.com Cc: chen.tang@easystack.cn Cc: akpm@linux-foundation.org Cc: kamezawa.hiroyu@jp.fujitsu.com Cc: lenb@kernel.org Link: http://lkml.kernel.org/r/1472114120-3281-8-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-21acpi: Provide mechanism to validate processors in the ACPI tablesDou Liyang
[Problem] When we set cpuid <-> nodeid mapping to be persistent, it will use the DSDT As we know, the ACPI tables are just like user's input in that respect, and we don't crash if user's input is unreasonable. Such as, the mapping of the proc_id and pxm in some machine's ACPI table is like this: proc_id | pxm -------------------- 0 <-> 0 1 <-> 0 2 <-> 1 3 <-> 1 89 <-> 0 89 <-> 0 89 <-> 0 89 <-> 1 89 <-> 1 89 <-> 2 89 <-> 3 ..... We can't be sure which one is correct to the proc_id 89. We may map a wrong node to a cpu. When pages are allocated, this may cause a kernal panic. So, we should provide mechanisms to validate the ACPI tables, just like we do validation to check user's input in web project. The mechanism is that the processor objects which have the duplicate IDs are not valid. [Solution] We add a validation function, like this: foreach Processor in DSDT proc_id = get_ACPI_Processor_number(Processor) if (proc_id exists ) mark both of them as being unreasonable; The function will record the unique or duplicate processor IDs. The duplicate processor IDs such as 89 are regarded as the unreasonable IDs which mean that the processor objects in question are not valid. [ tglx: Add __init[data] annotations ] Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: mika.j.penttila@gmail.com Cc: len.brown@intel.com Cc: rafael@kernel.org Cc: rjw@rjwysocki.net Cc: yasu.isimatu@gmail.com Cc: linux-mm@kvack.org Cc: linux-acpi@vger.kernel.org Cc: isimatu.yasuaki@jp.fujitsu.com Cc: gongzhaogang@inspur.com Cc: tj@kernel.org Cc: izumi.taku@jp.fujitsu.com Cc: cl@linux.com Cc: chen.tang@easystack.cn Cc: akpm@linux-foundation.org Cc: kamezawa.hiroyu@jp.fujitsu.com Cc: lenb@kernel.org Link: http://lkml.kernel.org/r/1472114120-3281-7-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-21x86/acpi: Set persistent cpuid <-> nodeid mapping when bootingGu Zheng
The whole patch-set aims at making cpuid <-> nodeid mapping persistent. So that, when node online/offline happens, cache based on cpuid <-> nodeid mapping such as wq_numa_possible_cpumask will not cause any problem. It contains 4 steps: 1. Enable apic registeration flow to handle both enabled and disabled cpus. 2. Introduce a new array storing all possible cpuid <-> apicid mapping. 3. Enable _MAT and MADT relative apis to return non-present or disabled cpus' apicid. 4. Establish all possible cpuid <-> nodeid mapping. This patch finishes step 4. This patch set the persistent cpuid <-> nodeid mapping for all enabled/disabled processors at boot time via an additional acpi namespace walk for processors. [ tglx: Remove the unneeded exports ] Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: mika.j.penttila@gmail.com Cc: len.brown@intel.com Cc: rafael@kernel.org Cc: rjw@rjwysocki.net Cc: yasu.isimatu@gmail.com Cc: linux-mm@kvack.org Cc: linux-acpi@vger.kernel.org Cc: isimatu.yasuaki@jp.fujitsu.com Cc: gongzhaogang@inspur.com Cc: tj@kernel.org Cc: izumi.taku@jp.fujitsu.com Cc: cl@linux.com Cc: chen.tang@easystack.cn Cc: akpm@linux-foundation.org Cc: kamezawa.hiroyu@jp.fujitsu.com Cc: lenb@kernel.org Link: http://lkml.kernel.org/r/1472114120-3281-6-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-21x86/acpi: Enable MADT APIs to return disabled apicidsGu Zheng
The whole patch-set aims at making cpuid <-> nodeid mapping persistent. So that, when node online/offline happens, cache based on cpuid <-> nodeid mapping such as wq_numa_possible_cpumask will not cause any problem. It contains 4 steps: 1. Enable apic registeration flow to handle both enabled and disabled cpus. 2. Introduce a new array storing all possible cpuid <-> apicid mapping. 3. Enable _MAT and MADT relative apis to return non-present or disabled cpus' apicid. 4. Establish all possible cpuid <-> nodeid mapping. This patch finishes step 3. There are four mappings in the kernel: 1. nodeid (logical node id) <-> pxm (persistent) 2. apicid (physical cpu id) <-> nodeid (persistent) 3. cpuid (logical cpu id) <-> apicid (not persistent, now persistent by step 2) 4. cpuid (logical cpu id) <-> nodeid (not persistent) So, in order to setup persistent cpuid <-> nodeid mapping for all possible CPUs, we should: 1. Setup cpuid <-> apicid mapping for all possible CPUs, which has been done in step 1, 2. 2. Setup cpuid <-> nodeid mapping for all possible CPUs. But before that, we should obtain all apicids from MADT. All processors' apicids can be obtained by _MAT method or from MADT in ACPI. The current code ignores disabled processors and returns -ENODEV. After this patch, a new parameter will be added to MADT APIs so that caller is able to control if disabled processors are ignored. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: mika.j.penttila@gmail.com Cc: len.brown@intel.com Cc: rafael@kernel.org Cc: rjw@rjwysocki.net Cc: yasu.isimatu@gmail.com Cc: linux-mm@kvack.org Cc: linux-acpi@vger.kernel.org Cc: isimatu.yasuaki@jp.fujitsu.com Cc: gongzhaogang@inspur.com Cc: tj@kernel.org Cc: izumi.taku@jp.fujitsu.com Cc: cl@linux.com Cc: chen.tang@easystack.cn Cc: akpm@linux-foundation.org Cc: kamezawa.hiroyu@jp.fujitsu.com Cc: lenb@kernel.org Link: http://lkml.kernel.org/r/1472114120-3281-5-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-21x86/acpi: Introduce persistent storage for cpuid <-> apicid mappingGu Zheng
The whole patch-set aims at making cpuid <-> nodeid mapping persistent. So that, when node online/offline happens, cache based on cpuid <-> nodeid mapping such as wq_numa_possible_cpumask will not cause any problem. It contains 4 steps: 1. Enable apic registeration flow to handle both enabled and disabled cpus. 2. Introduce a new array storing all possible cpuid <-> apicid mapping. 3. Enable _MAT and MADT relative apis to return non-present or disabled cpus' apicid. 4. Establish all possible cpuid <-> nodeid mapping. This patch finishes step 2. In this patch, we introduce a new static array named cpuid_to_apicid[], which is large enough to store info for all possible cpus. And then, we modify the cpuid calculation. In generic_processor_info(), it simply finds the next unused cpuid. And it is also why the cpuid <-> nodeid mapping changes with node hotplug. After this patch, we find the next unused cpuid, map it to an apicid, and store the mapping in cpuid_to_apicid[], so that cpuid <-> apicid mapping will be persistent. And finally we will use this array to make cpuid <-> nodeid persistent. cpuid <-> apicid mapping is established at local apic registeration time. But non-present or disabled cpus are ignored. In this patch, we establish all possible cpuid <-> apicid mapping when registering local apic. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: mika.j.penttila@gmail.com Cc: len.brown@intel.com Cc: rafael@kernel.org Cc: rjw@rjwysocki.net Cc: yasu.isimatu@gmail.com Cc: linux-mm@kvack.org Cc: linux-acpi@vger.kernel.org Cc: isimatu.yasuaki@jp.fujitsu.com Cc: gongzhaogang@inspur.com Cc: tj@kernel.org Cc: izumi.taku@jp.fujitsu.com Cc: cl@linux.com Cc: chen.tang@easystack.cn Cc: akpm@linux-foundation.org Cc: kamezawa.hiroyu@jp.fujitsu.com Cc: lenb@kernel.org Link: http://lkml.kernel.org/r/1472114120-3281-4-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-21x86/acpi: Enable acpi to register all possible cpus at boot timeGu Zheng
cpuid <-> nodeid mapping is firstly established at boot time. And workqueue caches the mapping in wq_numa_possible_cpumask in wq_numa_init() at boot time. When doing node online/offline, cpuid <-> nodeid mapping is established/destroyed, which means, cpuid <-> nodeid mapping will change if node hotplug happens. But workqueue does not update wq_numa_possible_cpumask. So here is the problem: Assume we have the following cpuid <-> nodeid in the beginning: Node | CPU ------------------------ node 0 | 0-14, 60-74 node 1 | 15-29, 75-89 node 2 | 30-44, 90-104 node 3 | 45-59, 105-119 and we hot-remove node2 and node3, it becomes: Node | CPU ------------------------ node 0 | 0-14, 60-74 node 1 | 15-29, 75-89 and we hot-add node4 and node5, it becomes: Node | CPU ------------------------ node 0 | 0-14, 60-74 node 1 | 15-29, 75-89 node 4 | 30-59 node 5 | 90-119 But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and the like. When a pool workqueue is initialized, if its cpumask belongs to a node, its pool->node will be mapped to that node. And memory used by this workqueue will also be allocated on that node. static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs){ ... /* if cpumask is contained inside a NUMA node, we belong to that node */ if (wq_numa_enabled) { for_each_node(node) { if (cpumask_subset(pool->attrs->cpumask, wq_numa_possible_cpumask[node])) { pool->node = node; break; } } } Since wq_numa_possible_cpumask is not updated, it could be mapped to an offline node, which will lead to memory allocation failure: SLUB: Unable to allocate memory on node 2 (gfp=0x80d0) cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0 node 0: slabs: 6172, objs: 259224, free: 245741 node 1: slabs: 3261, objs: 136962, free: 127656 It happens here: create_worker(struct worker_pool *pool) |--> worker = alloc_worker(pool->node); static struct worker *alloc_worker(int node) { struct worker *worker; worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); --> Here, useing the wrong node. ...... return worker; } [Solution] There are four mappings in the kernel: 1. nodeid (logical node id) <-> pxm 2. apicid (physical cpu id) <-> nodeid 3. cpuid (logical cpu id) <-> apicid 4. cpuid (logical cpu id) <-> nodeid 1. pxm (proximity domain) is provided by ACPI firmware in SRAT, and nodeid <-> pxm mapping is setup at boot time. This mapping is persistent, won't change. 2. apicid <-> nodeid mapping is setup using info in 1. The mapping is setup at boot time and CPU hotadd time, and cleared at CPU hotremove time. This mapping is also persistent. 3. cpuid <-> apicid mapping is setup at boot time and CPU hotadd time. cpuid is allocated, lower ids first, and released at CPU hotremove time, reused for other hotadded CPUs. So this mapping is not persistent. 4. cpuid <-> nodeid mapping is also setup at boot time and CPU hotadd time, and cleared at CPU hotremove time. As a result of 3, this mapping is not persistent. To fix this problem, we establish cpuid <-> nodeid mapping for all the possible cpus at boot time, and make it persistent. And according to init_cpu_to_node(), cpuid <-> nodeid mapping is based on apicid <-> nodeid mapping and cpuid <-> apicid mapping. So the key point is obtaining all cpus' apicid. apicid can be obtained by _MAT (Multiple APIC Table Entry) method or found in MADT (Multiple APIC Description Table). So we finish the job in the following steps: 1. Enable apic registeration flow to handle both enabled and disabled cpus. This is done by introducing an extra parameter to generic_processor_info to let the caller control if disabled cpus are ignored. 2. Introduce a new array storing all possible cpuid <-> apicid mapping. And also modify the way cpuid is calculated. Establish all possible cpuid <-> apicid mapping when registering local apic. Store the mapping in this array. 3. Enable _MAT and MADT relative apis to return non-present or disabled cpus' apicid. This is also done by introducing an extra parameter to these apis to let the caller control if disabled cpus are ignored. 4. Establish all possible cpuid <-> nodeid mapping. This is done via an additional acpi namespace walk for processors. This patch finished step 1. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: mika.j.penttila@gmail.com Cc: len.brown@intel.com Cc: rafael@kernel.org Cc: rjw@rjwysocki.net Cc: yasu.isimatu@gmail.com Cc: linux-mm@kvack.org Cc: linux-acpi@vger.kernel.org Cc: isimatu.yasuaki@jp.fujitsu.com Cc: gongzhaogang@inspur.com Cc: tj@kernel.org Cc: izumi.taku@jp.fujitsu.com Cc: cl@linux.com Cc: chen.tang@easystack.cn Cc: akpm@linux-foundation.org Cc: kamezawa.hiroyu@jp.fujitsu.com Cc: lenb@kernel.org Link: http://lkml.kernel.org/r/1472114120-3281-3-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-21x86/numa: Online memory-less nodes at boot timeTang Chen
For now, x86 does not support memory-less node. A node without memory will not be onlined, and the cpus on it will be mapped to the other online nodes with memory in init_cpu_to_node(). The reason of doing this is to ensure each cpu has mapped to a node with memory, so that it will be able to allocate local memory for that cpu. But we don't have to do it in this way. In this series of patches, we are going to construct cpu <-> node mapping for all possible cpus at boot time, which is a persistent mapping. It means that the cpu will be mapped to the node which it belongs to, and will never be changed. If a node has only cpus but no memory, the cpus on it will be mapped to a memory-less node. And the memory-less node should be onlined. Allocate pgdats for all memory-less nodes and online them at boot time. Then build zonelists for these nodes. As a result, when cpus on these memory-less nodes try to allocate memory from local node, it will automatically fall back to the proper zones in the zonelists. Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: mika.j.penttila@gmail.com Cc: len.brown@intel.com Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: rafael@kernel.org Cc: rjw@rjwysocki.net Cc: yasu.isimatu@gmail.com Cc: linux-mm@kvack.org Cc: linux-acpi@vger.kernel.org Cc: isimatu.yasuaki@jp.fujitsu.com Cc: gongzhaogang@inspur.com Cc: tj@kernel.org Cc: izumi.taku@jp.fujitsu.com Cc: cl@linux.com Cc: chen.tang@easystack.cn Cc: akpm@linux-foundation.org Cc: kamezawa.hiroyu@jp.fujitsu.com Cc: lenb@kernel.org Link: http://lkml.kernel.org/r/1472114120-3281-2-git-send-email-douly.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-21ftrace/scripts: Add helper script to bisect function tracing problem functionsSteven Rostedt (Red Hat)
Every so often, with a special config or a architecture change, running function or function_graph tracing can cause the machien to hard reboot, crash, or simply hard lockup. There's some functions in the function graph tracer that can not be traced otherwise it causes the function tracer to recurse before the recursion protection mechanisms are in place. When this occurs, using the dynamic ftrace featuer that allows limiting what actually gets traced can be used to bisect down to the problem function. This adds a script that helps with this process in the scripts/tracing directory, called ftrace-bisect.sh The set up is to read all the functions that can be traced from available_filter_functions into a file (full_file). Then run this script passing it the full_file and a "test_file" and "non_test_file", where the test_file will be add to set_ftrace_filter. What ftarce_bisect.sh does, is to copy half of the functions in full_file into the test_file and the other half into the non_test_file. This way, one can cat the test_file into the set_ftrace_filter functions and only test the functions that are in that file. If it works, then we run the process again after copying non_test_file to full_file and repeating the process. If the system crashed, then the bad function is in the test_file and after a reboot, the test_file becomes the new full_file in the next iteration. When we get down to a single function in the full_file, then ftrace_bisect.sh will report that as the bad function. Full documentation of how to use this simple script is within the script file itself. Link: http://lkml.kernel.org/r/20160920100716.131d3647@gandalf.local.home Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-09-21ARM: sa1111: provide to_sa1111_device() macroRussell King
Provide a nicer to_sa1111_device macro to convert a struct device to a sa1111_dev. We will need this for drivers when converting them to dev_pm_ops, or removing shutdown methods. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2016-09-21gfs2: fix to detect failure of register_shrinkerChao Yu
register_shrinker can fail after commit 1d3d4437eae1 ("vmscan: per-node deferred work"), we should detect the failure of it, otherwise we may fail to register shrinker after gfs2 module was been inited successfully. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-09-21carl9170: fix debugfs crashesChristian Lamparter
Ben Greear reported: > I see lots of instability as soon as I load up the carl9710 NIC. > My application is going to be poking at it's debugfs files... > > BUG: KASAN: slab-out-of-bounds in carl9170_debugfs_read+0xd5/0x2a0 > [carl9170] at addr 0xffff8801bc1208b0 > Read of size 8 by task btserver/5888 > ======================================================================= > BUG kmalloc-256 (Tainted: G W ): kasan: bad access detected > ----------------------------------------------------------------------- > > INFO: Allocated in seq_open+0x50/0x100 age=2690 cpu=2 pid=772 >... This breakage was caused by the introduction of intermediate fops in debugfs by commit 9fd4dcece43a ("debugfs: prevent access to possibly dead file_operations at file open") Thankfully, the original/real fops are still available in d_fsdata. Reported-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Christian Lamparter <chunkeey@gmail.com> Cc: stable <stable@vger.kernel.org> # 4.7+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-21orangefs: bump minimum userspace versionMartin Brandenburg
OrangeFS 2.9.6 was released without support for the features op. Thus OrangeFS 2.9.7 will be required to use it. Signed-off-by: Martin Brandenburg <martin@omnibond.com>
2016-09-21libnvdimm, namespace: debug invalid interleave-set-cookie valuesDan Williams
If platform firmware fails to populate unique / non-zero serial number data for each nvdimm in an interleave-set it may cause pmem region initialization to fail. Add a debug message for this case. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-09-21tools/testing/nvdimm: test get_config_size DSM failuresDan Williams
Add an nfit_test specific attribute for gating whether a get_config_size DSM, or any DSM for that matter, succeeds or fails. The get_config_size DSM is initial motivation since that is the first command libnvdimm core issues to determine the state of the namespace label area. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-09-21nfit: fail DSMs that return non-zero status by defaultDan Williams
For the DSMs where the kernel knows the format of the output buffer and originates those DSMs from within the kernel, return -EIO for any non-zero status. If the BIOS is indicating a status that we do not know how to handle, fail the DSM. Cc: <stable@vger.kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-09-21libnvdimm: fix devm_nvdimm_memremap() error pathDan Williams
The internal alloc_nvdimm_map() helper might fail, particularly if the memory region is already busy. Report request_mem_region() failures and check for the failure. Reported-by: Ryan Chen <ryan.chan105@gmail.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-09-21usb: misc: legousbtower: Fix NULL pointer deferenceGreg Kroah-Hartman
This patch fixes a NULL pointer dereference caused by a race codition in the probe function of the legousbtower driver. It re-structures the probe function to only register the interface after successfully reading the board's firmware ID. The probe function does not deregister the usb interface after an error receiving the devices firmware ID. The device file registered (/dev/usb/legousbtower%d) may be read/written globally before the probe function returns. When tower_delete is called in the probe function (after an r/w has been initiated), core dev structures are deleted while the file operation functions are still running. If the 0 address is mappable on the machine, this vulnerability can be used to create a Local Priviege Escalation exploit via a write-what-where condition by remapping dev->interrupt_out_buffer in tower_write. A forged USB device and local program execution would be required for LPE. The USB device would have to delay the control message in tower_probe and accept the control urb in tower_open whilst guest code initiated a write to the device file as tower_delete is called from the error in tower_probe. This bug has existed since 2003. Patch tested by emulated device. Reported-by: James Patrick-Evans <james@jmp-e.com> Tested-by: James Patrick-Evans <james@jmp-e.com> Signed-off-by: James Patrick-Evans <james@jmp-e.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-21raid5: handle register_shrinker failureShaohua Li
register_shrinker() now can fail. When it happens, shrinker.nr_deferred is null. We use it to determine if unregister_shrinker is required. Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21raid5: fix to detect failure of register_shrinkerChao Yu
register_shrinker can fail after commit 1d3d4437eae1 ("vmscan: per-node deferred work"), we should detect the failure of it, otherwise we may fail to register shrinker after raid5 configuration was setup successfully. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21md: fix a potential deadlockShaohua Li
lockdep reports a potential deadlock. Fix this by droping the mutex before md_import_device [ 1137.126601] ====================================================== [ 1137.127013] [ INFO: possible circular locking dependency detected ] [ 1137.127013] 4.8.0-rc4+ #538 Not tainted [ 1137.127013] ------------------------------------------------------- [ 1137.127013] mdadm/16675 is trying to acquire lock: [ 1137.127013] (&bdev->bd_mutex){+.+.+.}, at: [<ffffffff81243cf3>] __blkdev_get+0x63/0x450 [ 1137.127013] but task is already holding lock: [ 1137.127013] (detected_devices_mutex){+.+.+.}, at: [<ffffffff81a5138c>] md_ioctl+0x2ac/0x1f50 [ 1137.127013] which lock already depends on the new lock. [ 1137.127013] the existing dependency chain (in reverse order) is: [ 1137.127013] -> #1 (detected_devices_mutex){+.+.+.}: [ 1137.127013] [<ffffffff810b6f19>] lock_acquire+0xb9/0x220 [ 1137.127013] [<ffffffff81c51647>] mutex_lock_nested+0x67/0x3d0 [ 1137.127013] [<ffffffff81a4eeaf>] md_autodetect_dev+0x3f/0x90 [ 1137.127013] [<ffffffff81595be8>] rescan_partitions+0x1a8/0x2c0 [ 1137.127013] [<ffffffff81590081>] __blkdev_reread_part+0x71/0xb0 [ 1137.127013] [<ffffffff815900e5>] blkdev_reread_part+0x25/0x40 [ 1137.127013] [<ffffffff81590c4b>] blkdev_ioctl+0x51b/0xa30 [ 1137.127013] [<ffffffff81242bf1>] block_ioctl+0x41/0x50 [ 1137.127013] [<ffffffff81214c96>] do_vfs_ioctl+0x96/0x6e0 [ 1137.127013] [<ffffffff81215321>] SyS_ioctl+0x41/0x70 [ 1137.127013] [<ffffffff81c56825>] entry_SYSCALL_64_fastpath+0x18/0xa8 [ 1137.127013] -> #0 (&bdev->bd_mutex){+.+.+.}: [ 1137.127013] [<ffffffff810b6af2>] __lock_acquire+0x1662/0x1690 [ 1137.127013] [<ffffffff810b6f19>] lock_acquire+0xb9/0x220 [ 1137.127013] [<ffffffff81c51647>] mutex_lock_nested+0x67/0x3d0 [ 1137.127013] [<ffffffff81243cf3>] __blkdev_get+0x63/0x450 [ 1137.127013] [<ffffffff81244307>] blkdev_get+0x227/0x350 [ 1137.127013] [<ffffffff812444f6>] blkdev_get_by_dev+0x36/0x50 [ 1137.127013] [<ffffffff81a46d65>] lock_rdev+0x35/0x80 [ 1137.127013] [<ffffffff81a49bb4>] md_import_device+0xb4/0x1b0 [ 1137.127013] [<ffffffff81a513d6>] md_ioctl+0x2f6/0x1f50 [ 1137.127013] [<ffffffff815909b3>] blkdev_ioctl+0x283/0xa30 [ 1137.127013] [<ffffffff81242bf1>] block_ioctl+0x41/0x50 [ 1137.127013] [<ffffffff81214c96>] do_vfs_ioctl+0x96/0x6e0 [ 1137.127013] [<ffffffff81215321>] SyS_ioctl+0x41/0x70 [ 1137.127013] [<ffffffff81c56825>] entry_SYSCALL_64_fastpath+0x18/0xa8 [ 1137.127013] other info that might help us debug this: [ 1137.127013] Possible unsafe locking scenario: [ 1137.127013] CPU0 CPU1 [ 1137.127013] ---- ---- [ 1137.127013] lock(detected_devices_mutex); [ 1137.127013] lock(&bdev->bd_mutex); [ 1137.127013] lock(detected_devices_mutex); [ 1137.127013] lock(&bdev->bd_mutex); [ 1137.127013] *** DEADLOCK *** Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21md/bitmap: fix wrong cleanupShaohua Li
if bitmap_create fails, the bitmap is already cleaned up and the returned value is an error number. We can't do the cleanup again. Reported-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21raid5: allow arbitrary max_hw_sectorsShaohua Li
raid5 will split bio to proper size internally, there is no point to use underlayer disk's max_hw_sectors. In my qemu system, without the change, the raid5 only receives 128k size bio, which reduces the chance of bio merge sending to underlayer disks. Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21lib/raid6: Add AVX512 optimized xor_syndrome functionsGayatri Kammela
Optimize RAID6 xor_syndrome functions to take advantage of the 512-bit ZMM integer instructions introduced in AVX512. AVX512 optimized xor_syndrome functions, which is simply based on sse2.c written by hpa. The patch was tested and benchmarked before submission on a hardware that has AVX512 flags to support such instructions Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jim Kukunas <james.t.kukunas@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Megha Dey <megha.dey@linux.intel.com> Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21lib/raid6/test/Makefile: Add avx512 gen_syndrome and recovery functionsGayatri Kammela
Adding avx512 gen_syndrome and recovery functions so as to allow code to be compiled and tested successfully in userspace. This patch is tested in userspace and improvement in performace is observed. Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jim Kukunas <james.t.kukunas@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Megha Dey <megha.dey@linux.intel.com> Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21lib/raid6: Add AVX512 optimized recovery functionsGayatri Kammela
Optimize RAID6 recovery functions to take advantage of the 512-bit ZMM integer instructions introduced in AVX512. AVX512 optimized recovery functions, which is simply based on recov_avx2.c written by Jim Kukunas This patch was tested and benchmarked before submission on a hardware that has AVX512 flags to support such instructions Cc: Jim Kukunas <james.t.kukunas@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Megha Dey <megha.dey@linux.intel.com> Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21lib/raid6: Add AVX512 optimized gen_syndrome functionsGayatri Kammela
Optimize RAID6 gen_syndrom functions to take advantage of the 512-bit ZMM integer instructions introduced in AVX512. AVX512 optimized gen_syndrom functions, which is simply based on avx2.c written by Yuanhan Liu and sse2.c written by hpa. The patch was tested and benchmarked before submission on a hardware that has AVX512 flags to support such instructions Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jim Kukunas <james.t.kukunas@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Megha Dey <megha.dey@linux.intel.com> Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21md-cluster: make resync lock also could be interrupttedGuoqing Jiang
When one node is perform resync or recovery, other nodes can't get resync lock and could block for a while before it holds the lock, so we can't stop array immediately for this scenario. To make array could be stop quickly, we check MD_CLOSING in dlm_lock_sync_interruptible to make us can interrupt the lock request. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hangGuoqing Jiang
When some node leaves cluster, then it's bitmap need to be synced by another node, so "md*_recover" thread is triggered for the purpose. However, with below steps. we can find tasks hang happened either in B or C. 1. Node A create a resyncing cluster raid1, assemble it in other two nodes (B and C). 2. stop array in B and C. 3. stop array in A. linux44:~ # ps aux|grep md|grep D root 5938 0.0 0.1 19852 1964 pts/0 D+ 14:52 0:00 mdadm -S md0 root 5939 0.0 0.0 0 0 ? D 14:52 0:00 [md0_recover] linux44:~ # cat /proc/5939/stack [<ffffffffa04cf321>] dlm_lock_sync+0x71/0x90 [md_cluster] [<ffffffffa04d0705>] recover_bitmaps+0x125/0x220 [md_cluster] [<ffffffffa052105d>] md_thread+0x16d/0x180 [md_mod] [<ffffffff8107ad94>] kthread+0xb4/0xc0 [<ffffffff8152a518>] ret_from_fork+0x58/0x90 linux44:~ # cat /proc/5938/stack [<ffffffff8107afde>] kthread_stop+0x6e/0x120 [<ffffffffa0519da0>] md_unregister_thread+0x40/0x80 [md_mod] [<ffffffffa04cfd20>] leave+0x70/0x120 [md_cluster] [<ffffffffa0525e24>] md_cluster_stop+0x14/0x30 [md_mod] [<ffffffffa05269ab>] bitmap_free+0x14b/0x150 [md_mod] [<ffffffffa0523f3b>] do_md_stop+0x35b/0x5a0 [md_mod] [<ffffffffa0524e83>] md_ioctl+0x873/0x1590 [md_mod] [<ffffffff81288464>] blkdev_ioctl+0x214/0x7d0 [<ffffffff811dd3dd>] block_ioctl+0x3d/0x40 [<ffffffff811b92d4>] do_vfs_ioctl+0x2d4/0x4b0 [<ffffffff811b9538>] SyS_ioctl+0x88/0xa0 [<ffffffff8152a5c9>] system_call_fastpath+0x16/0x1b The problem is caused by recover_bitmaps can't reliably abort when the thread is unregistered. So dlm_lock_sync_interruptible is introduced to detect the thread's situation to fix the problem. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21md-cluster: convert the completion to wait queueGuoqing Jiang
Previously, we used completion to sync between require dlm lock and sync_ast, however we will have to expose completion.wait and completion.done in dlm_lock_sync_interruptible (introduced later), it is not a common usage for completion, so convert related things to wait queue. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21md-cluster: protect md_find_rdev_nr_rcu with rcu lockGuoqing Jiang
We need to use rcu_read_lock/unlock to avoid potential race. Reported-by: Shaohua Li <shli@fb.com> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21md-cluster: clean related infos of clusterGuoqing Jiang
cluster_info and bitmap_info.nodes also need to be cleared when array is stopped. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21md: changes for MD_STILL_CLOSED flagGuoqing Jiang
When stop clustered raid while it is pending on resync, MD_STILL_CLOSED flag could be cleared since udev rule is triggered to open the mddev. So obviously array can't be stopped soon and returns EBUSY. mdadm -Ss md-raid-arrays.rules set MD_STILL_CLOSED md_open() ... ... ... clear MD_STILL_CLOSED do_md_stop We make below changes to resolve this issue: 1. rename MD_STILL_CLOSED to MD_CLOSING since it is set when stop array and it means we are stopping array. 2. let md_open returns early if CLOSING is set, so no other threads will open array if one thread is trying to close it. 3. no need to clear CLOSING bit in md_open because 1 has ensure the bit is cleared, then we also don't need to test CLOSING bit in do_md_stop. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21md-cluster: remove some unnecessary dlm_unlock_syncGuoqing Jiang
Since DLM_LKF_FORCEUNLOCK is used in lockres_free, we don't need to call dlm_unlock_sync before free lock resource. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>