summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2015-09-05rtc: mt6397: implement suspend/resume function in rtc-mt6397 driverHenry Chen
Implement the suspend/resume function in order to control rtc's irq_wake flag and handle as wakeup source. Signed-off-by: Henry Chen <henryc.chen@mediatek.com> Acked-by: Eddie Huang <eddie.huang@mediatek.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05rtc: switch to using is_visible() to control sysfs attributesDmitry Torokhov
Instead of creating wakealarm attribute manually, after the device has been registered, let's rely on facilities provided by the attribute groups to control which attributes are visible and which are not. This allows to create all needed attributes at once, at the same time that we register RTC class device. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05rtc: switch wakealarm attribute to DEVICE_ATTR_RWDmitry Torokhov
Instead of using older style DEVICE_ATTR for wakealarm attribute let's switch to using DEVICE_ATTR_RW that ensures consistent across the kernel permissions on the attribute. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05rtc: make rtc_does_wakealarm() return booleanDmitry Torokhov
Users of rtc_does_wakealarm() return value treat it as boolean so let's change the signature accordingly. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05rtc: rx8025: remove obsolete local_irq_disable() and local_irq_enable() for ↵Henri Roosen
rtc_update_irq() Since commit e6229bec25be ("rtc: make rtc_update_irq callable with irqs enabled") rtc_update_irq() is callable with irqs enabled. Signed-off-by: Henri Roosen <henriroosen@gmail.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05rtc: fix drivers that consider 0 as a valid IRQ in client->irqOctavian Purdila
Since dab472eb931b ("i2c / ACPI: Use 0 to indicate that device does not have interrupt assigned"), 0 is not a valid i2c client irq anymore, so change all driver's checks accordingly. The same issue occurs when the device is instantiated via device tree with no IRQ, or from the i2c sysfs interface, even before the patch above. Signed-off-by: Octavian Purdila <octavian.purdila@intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05rtc: dev: properly manage lifetime of dev and cdev in rtc deviceDmitry Torokhov
struct rtc embeds both struct dev and struct cdev. Unfortunately character device structure may outlive the parent rtc structure unless we set it up as parent of character device so that it will stay pinned until character device is freed. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05rtc: class: remove unnecessary device_get() in rtc_device_unregisterDmitry Torokhov
Technically the address of rtc->dev can never be NULL, so get_device() can never fail. Also caller of rtc_device_unregister() supposed to be the owner of the device and thus have a valid reference. Therefore call to get_device() is not needed here. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05rtc: class: fix double free in rtc_register_device() error pathDmitry Torokhov
Commit 59cca865f21e ("drivers/rtc/class.c: fix device_register() error handling") correctly noted that naked kfree() should not be used after failed device_register() call, however, while it added the needed put_device() it forgot to remove the original kfree() causing double-free. Cc: Vasiliy Kulikov <segooon@gmail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05rtc: sirfsoc: move to regmap APIs from platform-specific APIsGuo Zeng
The current codes use CSR platform specific API exported by machine codes to read/write RTC registers. they are: sirfsoc_rtc_iobrg_readl() sirfsoc_rtc_iobrg_writel() commit b1999477ed91 ("ARM: prima2: move to use REGMAP APIs for rtciobrg") moves to regmap support, now we can move to use regmap APIs in RTC driver. Signed-off-by: Guo Zeng <guo.zeng@csr.com> Signed-off-by: Barry Song <Baohua.Song@csr.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05rtc: opal: Enable alarms only when opal supports tpoVaibhav Jain
rtc-opal driver provides support for rtc alarms via timed-power-on(tpo). However some Power platforms like BML use a fake rtc clock and don't support tpo. Such platforms are indicated by the missing 'has-tpo' property in the device tree. Current implementation however enables callback for rtc_class_ops.read/set alarm irrespective of the tpo support from the platform. This results in a failed opal call when kernel tries to read an existing alarms via opal_get_tpo_time during rtc device registration. This patch fixes this issue by setting opal_rtc_ops.read/set_alarm callback pointers only when tpo is supported. Acked-by: Michael Neuling <mikey@neuling.org> Acked-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com> Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05rtc: add rtc-lpc24xx driverJoachim Eastwood
Add driver for the RTC found on NXP LPC178x/18xx/408x/43xx devices. The RTC provides calendar and clock functionality together with alarm interrupt support. Signed-off-by: Joachim Eastwood <manabian@gmail.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05doc: dt: add documentation for nxp,lpc1788-rtcJoachim Eastwood
Document NXP LPC178x/18xx/408x/43xx bindings Signed-off-by: Joachim Eastwood <manabian@gmail.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05rtc: Drop owner assignment from platform_driverKrzysztof Kozlowski
platform_driver does not need to set an owner because platform_driver_register() will set it. Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05rtc: Drop owner assignment from i2c_driverKrzysztof Kozlowski
i2c_driver does not need to set an owner because i2c_register_driver() will set it. Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05rtc: pcf2127: use OFS flag to detect unreliable date and warn the userAndrea Scian
The PCF2127 datasheet states that it's wrong to say that the date in unreliable if BLF (battery low flag) is set but instead, OSF (seconds register) should be used to check if oscillator, for any reason, stopped. Battery may be low (usually below 2V5 threshold) but the date may be anyway correct (typically date is unreliable when input voltage is below 1V2). Signed-off-by: Andrea Scian <andrea.scian@dave.eu> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05rtc: use rtc_valid_tm() error code when reading date/timeAndrea Scian
There's a wrong comment in some RTC drivers that say it's better to ignore rtc_valid_tm() when reading RTC timestamp. However this is wrong and is better to return to the userspace the error if timestamp is not valid. Signed-off-by: Andrea Scian <andrea.scian@dave.eu> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05rtc: 88pm80x: add device tree supportVaibhav Hiremath
Along with DT support, this patch also cleans up the unnecessary code around 'rtc_wakeup' initialization. Signed-off-by: Chao Xie <chao.xie@marvell.com> Signed-off-by: Vaibhav Hiremath <vaibhav.hiremath@linaro.org> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05rtc: bq32k: remove redundant checkManinder Singh
removing below static analysis error: (error) Possible null pointer dereference: client if (!i2c_check_functionality(client->adapter, I2C_FUNC_I2C)) ^^^^^^^ Error comes because client is dereferenced before NULL check. So probably NULL this check is not required. Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05rtc: ds1685: Use module_platform_driverVaishali Thakkar
Use module_platform_driver for drivers whose init and exit functions only register and unregister, respectively. A simplified version of the Coccinelle semantic patch that performs this transformation is as follows: @a@ identifier f, x; @@ -static f(...) { return platform_driver_register(&x); } @b depends on a@ identifier e, a.x; @@ -static e(...) { platform_driver_unregister(&x); } @c depends on a && b@ identifier a.f; declarer name module_init; @@ -module_init(f); @d depends on a && b && c@ identifier b.e, a.x; declarer name module_exit; declarer name module_platform_driver; @@ -module_exit(e); +module_platform_driver(x); Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05rtc: ds1307: Support optional wakeup interrupt sourceNishanth Menon
With the recent pinctrl-single changes, SoCs such as Texas Instrument's OMAP processors can treat wake-up events from deeper idle states as interrupts. Let's add support for the optional second interrupt for wake-up using the generic wakeirq support added in commit 4990d4fe327b ("PM / Wakeirq: Add automated device wake IRQ handling") Finally, to pass the wake-up interrupt in the dts file, interrupts-extended property needs to be passed. This is similar in approach to commit 2a0b965cfb6e ("serial: omap: Add support for optional wake-up") + ee83bd3b6483 ("serial: omap: Switch wake-up interrupt to generic wakeirq") Signed-off-by: Nishanth Menon <nm@ti.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Tony Lindgren <tony@atomide.com> Acked-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05rtc: ds1307: Sort the headersNishanth Menon
It is always a good practice to keep the #includes sorted Signed-off-by: Nishanth Menon <nm@ti.com> Acked-by: Felipe Balbi <balbi@ti.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05rtc: ds1307: Switch to managed irq allocationNishanth Menon
Since we are not doing anything fancy in remove function that requires us to sequence IRQ free operation, we might as well switch over to devm_ equivalent of managed IRQ allocation and remove the explicit free_irq since it'd be done automatically at remove. Signed-off-by: Nishanth Menon <nm@ti.com> Acked-by: Felipe Balbi <balbi@ti.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05rtc: ds1307: Convert to threaded IRQFelipe Balbi
The driver currently emulates the concept of threaded IRQ using a workqueue, which it really does not need to. Instead, switch over to threaded_irq handlers which is meant precisely for the same purpose. Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Nishanth Menon <nm@ti.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-09-05Merge linux-block/for-4.3/core into md/for-linuxNeilBrown
There were a few conflicts that are fairly easy to resolve. Signed-off-by: NeilBrown <neilb@suse.com>
2015-09-04mm/hugetlb.c: make vma_has_reserves() return boolNicholas Krause
This makes vma_has_reserves() return bool due to this particular function only returning either one or zero as its return value. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04mm/madvise.c: make madvise_behaviour_valid() return boolNicholas Krause
This makes the madvise_bahaviour_valid() function return bool due to this particular function always returning the value of either one or zero as its return value. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04mm/memory.c: make tlb_next_batch() return boolNicholas Krause
This makes the tlb_next_batch() bool due to this particular function only ever returning either one or zero as its return value. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04mm/dmapool.c: change is_page_busy() return from int to boolNicholas Krause
This makes the function is_page_busy() return bool rather then an int now due to this particular function's single return statement only ever evaulating to either one or zero. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04mm: remove struct node_active_regionminkyung88.kim
struct node_active_region is not used anymore. Remove it. Signed-off-by: minkyung88.kim <minkyung88.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04mremap: simplify the "overlap" check in mremap_to()Oleg Nesterov
Minor, but this check is overcomplicated. Two half-intervals do NOT overlap if END1 <= START2 || END2 <= START1, mremap_to() just needs to negate this check. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Hugh Dickins <hughd@google.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04mremap: don't do uneccesary checks if new_len == old_lenOleg Nesterov
The "new_len > old_len" branch in vma_to_resize() looks very confusing. It only covers the VM_DONTEXPAND/pgoff checks but everything below is equally unneeded if new_len == old_len. Change this code to return if "new_len == old_len", new_len < old_len is not possible, otherwise the code below is wrong anyway. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Hugh Dickins <hughd@google.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04mremap: don't do mm_populate(new_addr) on failureOleg Nesterov
move_vma() sets *locked even if move_page_tables() or ->mremap() fails, change sys_mremap() to check "ret & ~PAGE_MASK". I think we should simply remove the VM_LOCKED code in move_vma(), that is why this patch doesn't change move_vma(). But this needs more cleanups. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Hugh Dickins <hughd@google.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04mm: move ->mremap() from file_operations to vm_operations_structOleg Nesterov
vma->vm_ops->mremap() looks more natural and clean in move_vma(), and this way ->mremap() can have more users. Say, vdso. While at it, s/aio_ring_remap/aio_ring_mremap/. Note: this is the minimal change before ->mremap() finds another user in file_operations; this method should have more arguments, and it can be used to kill arch_remap(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Hugh Dickins <hughd@google.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04mremap: don't leak new_vma if f_op->mremap() failsOleg Nesterov
move_vma() can't just return if f_op->mremap() fails, we should unmap the new vma like we do if move_page_tables() fails. To avoid the code duplication this patch moves the "move entries back" under the new "if (err)" branch. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Hugh Dickins <hughd@google.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04mm/hugetlb.c: make vma_shareable() return boolNicholas Krause
This makes vma_shareable() return bool now due to this particular function only ever returning either one or zero as its return value. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04mm: make GUP handle pfn mapping unless FOLL_GET is requestedKirill A. Shutemov
With DAX, pfn mapping becoming more common. The patch adjusts GUP code to cover pfn mapping for cases when we don't need struct page to proceed. To make it possible, let's change follow_page() code to return -EEXIST error code if proper page table entry exists, but no corresponding struct page. __get_user_page() would ignore the error code and move to the next page frame. The immediate effect of the change is working MAP_POPULATE and mlock() on DAX mappings. [akpm@linux-foundation.org: fix arm64 build] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Acked-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04mm: fix status code which move_pages() returns for zero pageKirill A. Shutemov
The manpage for move_pages(2) specifies that status code for zero page is supposed to be -EFAULT. Currently kernel return -ENOENT in this case. follow_page() can do it for us, if we would ask for FOLL_DUMP. The use of FOLL_DUMP also means that the upper layer page tables pages are no longer allocated. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Christoph Lameter <cl@linux.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04mm: memcontrol: bring back the VM_BUG_ON() in mem_cgroup_swapout()Sebastian Andrzej Siewior
Clark stumbled over a VM_BUG_ON() in -RT which was then was removed by Johannes in commit f371763a79d ("mm: memcontrol: fix false-positive VM_BUG_ON() on -rt"). The comment before that patch was a tiny bit better than it is now. While the patch claimed to fix a false-postive on -RT this was not the case. None of the -RT folks ACKed it and it was not a false positive report. That was a *real* problem. This patch updates the comment that is improper because it refers to "disabled preemption" as a consequence of that lock being taken. A spin_lock() disables preemption, true, but in this case the code relies on the fact that the lock _also_ disables interrupts once it is acquired. And this is the important detail (which was checked the VM_BUG_ON()) which needs to be pointed out. This is the hint one needs while looking at the code. It was explained by Johannes on the list that the per-CPU variables are protected by local_irq_save(). The BUG_ON() was helpful. This code has been workarounded in -RT in the meantime. I wouldn't mind running into more of those if the code in question uses *special* kind of locking since now there is no verification (in terms of lockdep or BUG_ON()) and therefore I bring the VM_BUG_ON() check back in. The two functions after the comment could also have a "local_irq_save()" dance around them in order to serialize access to the per-CPU variables. This has been avoided because the interrupts should be off. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Clark Williams <williams@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04genalloc: add support of multiple gen_pools per deviceVladimir Zapolskiy
This change fills devm_gen_pool_create()/gen_pool_get() "name" argument stub with contents and extends of_gen_pool_get() functionality on this basis. If there is no associated platform device with a device node passed to of_gen_pool_get(), the function attempts to get a label property or device node name (= repeats MTD OF partition standard) and seeks for a named gen_pool registered by device of the parent device node. The main idea of the change is to allow registration of independent gen_pools under the same umbrella device, say "partitions" on "storage device", the original functionality of one "partition" per "storage device" is untouched. [akpm@linux-foundation.org: fix constness in devres_find()] [dan.carpenter@oracle.com: freeing const data pointers] Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com> Cc: Philipp Zabel <p.zabel@pengutronix.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Shawn Guo <shawnguo@kernel.org> Cc: Sascha Hauer <kernel@pengutronix.de> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04genalloc: add name arg to gen_pool_get() and devm_gen_pool_create()Vladimir Zapolskiy
This change modifies gen_pool_get() and devm_gen_pool_create() client interfaces adding one more argument "name" of a gen_pool object. Due to implementation gen_pool_get() is capable to retrieve only one gen_pool associated with a device even if multiple gen_pools are created, fortunately right at the moment it is sufficient for the clients, hence provide NULL as a valid argument on both producer devm_gen_pool_create() and consumer gen_pool_get() sides. Because only one created gen_pool per device is addressable, explicitly add a restriction to devm_gen_pool_create() to create only one gen_pool per device, this implies two possible error codes returned by the function, account it on client side (only misc/sram). This completes client side changes related to genalloc updates. [akpm@linux-foundation.org: gen_pool_get() cleanup] Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com> Cc: Philipp Zabel <p.zabel@pengutronix.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Shawn Guo <shawnguo@kernel.org> Cc: Sascha Hauer <kernel@pengutronix.de> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04mm/memblock: WARN_ON when nid differs from overlap regionWei Yang
Each memblock_region has nid to indicates the Node ID of this range. For the overlap case, memblock_add_range() inserts the lower part and leave the upper part as indicated in the overlapped region. If the nid of the new range differs from the overlapped region, the information recorded is not correct. This patch adds a WARN_ON when the nid of the new range differs from the overlapped region. Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04Documentation/features/vm: add feature description and arch support status ↵Mel Gorman
for batched TLB flush after unmap Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04mm: defer flush of writable TLB entriesMel Gorman
If a PTE is unmapped and it's dirty then it was writable recently. Due to deferred TLB flushing, it's best to assume a writable TLB cache entry exists. With that assumption, the TLB must be flushed before any IO can start or the page is freed to avoid lost writes or data corruption. This patch defers flushing of potentially writable TLBs as long as possible. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04mm: send one IPI per CPU to TLB flush all entries after unmapping pagesMel Gorman
An IPI is sent to flush remote TLBs when a page is unmapped that was potentially accesssed by other CPUs. There are many circumstances where this happens but the obvious one is kswapd reclaiming pages belonging to a running process as kswapd and the task are likely running on separate CPUs. On small machines, this is not a significant problem but as machine gets larger with more cores and more memory, the cost of these IPIs can be high. This patch uses a simple structure that tracks CPUs that potentially have TLB entries for pages being unmapped. When the unmapping is complete, the full TLB is flushed on the assumption that a refill cost is lower than flushing individual entries. Architectures wishing to do this must give the following guarantee. If a clean page is unmapped and not immediately flushed, the architecture must guarantee that a write to that linear address from a CPU with a cached TLB entry will trap a page fault. This is essentially what the kernel already depends on but the window is much larger with this patch applied and is worth highlighting. The architecture should consider whether the cost of the full TLB flush is higher than sending an IPI to flush each individual entry. An additional architecture helper called flush_tlb_local is required. It's a trivial wrapper with some accounting in the x86 case. The impact of this patch depends on the workload as measuring any benefit requires both mapped pages co-located on the LRU and memory pressure. The case with the biggest impact is multiple processes reading mapped pages taken from the vm-scalability test suite. The test case uses NR_CPU readers of mapped files that consume 10*RAM. Linear mapped reader on a 4-node machine with 64G RAM and 48 CPUs 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 Ops lru-file-mmap-read-elapsed 159.62 ( 0.00%) 120.68 ( 24.40%) Ops lru-file-mmap-read-time_range 30.59 ( 0.00%) 2.80 ( 90.85%) Ops lru-file-mmap-read-time_stddv 6.70 ( 0.00%) 0.64 ( 90.38%) 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 User 581.00 611.43 System 5804.93 4111.76 Elapsed 161.03 122.12 This is showing that the readers completed 24.40% faster with 29% less system CPU time. From vmstats, it is known that the vanilla kernel was interrupted roughly 900K times per second during the steady phase of the test and the patched kernel was interrupts 180K times per second. The impact is lower on a single socket machine. 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 Ops lru-file-mmap-read-elapsed 25.33 ( 0.00%) 20.38 ( 19.54%) Ops lru-file-mmap-read-time_range 0.91 ( 0.00%) 1.44 (-58.24%) Ops lru-file-mmap-read-time_stddv 0.28 ( 0.00%) 0.47 (-65.34%) 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 User 58.09 57.64 System 111.82 76.56 Elapsed 27.29 22.55 It's still a noticeable improvement with vmstat showing interrupts went from roughly 500K per second to 45K per second. The patch will have no impact on workloads with no memory pressure or have relatively few mapped pages. It will have an unpredictable impact on the workload running on the CPU being flushed as it'll depend on how many TLB entries need to be refilled and how long that takes. Worst case, the TLB will be completely cleared of active entries when the target PFNs were not resident at all. [sasha.levin@oracle.com: trace tlb flush after disabling preemption in try_to_unmap_flush] Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04x86, mm: trace when an IPI is about to be sentMel Gorman
When unmapping pages it is necessary to flush the TLB. If that page was accessed by another CPU then an IPI is used to flush the remote CPU. That is a lot of IPIs if kswapd is scanning and unmapping >100K pages per second. There already is a window between when a page is unmapped and when it is TLB flushed. This series increases the window so multiple pages can be flushed using a single IPI. This should be safe or the kernel is hosed already. Patch 1 simply made the rest of the series easier to write as ftrace could identify all the senders of TLB flush IPIS. Patch 2 tracks what CPUs potentially map a PFN and then sends an IPI to flush the entire TLB. Patch 3 tracks when there potentially are writable TLB entries that need to be batched differently Patch 4 increases SWAP_CLUSTER_MAX to further batch flushes The performance impact is documented in the changelogs but in the optimistic case on a 4-socket machine the full series reduces interrupts from 900K interrupts/second to 60K interrupts/second. This patch (of 4): It is easy to trace when an IPI is received to flush a TLB but harder to detect what event sent it. This patch makes it easy to identify the source of IPIs being transmitted for TLB flushes on x86. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04userfaultfd: selftestAndrea Arcangeli
This test allocates two virtual areas and bounces the physical memory across the two virtual areas using only userfaultfd. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Shuah Khan <shuah.kh@samsung.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04userfaultfd: avoid missing wakeups during refile in userfaultfd_readAndrea Arcangeli
During the refile in userfaultfd_read both waitqueues could look empty to the lockless wake_userfault(). Use a seqcount to prevent this false negative that could leave an userfault blocked. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04userfaultfd: propagate the full address in THP faultsAndrea Arcangeli
The THP faults were not propagating the original fault address. The latest version of the API with uffd.arg.pagefault.address is supposed to propagate the full address through THP faults. This was not a kernel crashing bug and it wouldn't risk to corrupt user memory, but it would cause a SIGBUS failure because the wrong page was being copied. For various reasons this wasn't easily reproducible in the qemu workload, but the strestest exposed the problem immediately. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04userfaultfd: allow signals to interrupt a userfaultAndrea Arcangeli
This is only simple to achieve if the userfault is going to return to userland (not to the kernel) because we can avoid returning VM_FAULT_RETRY despite we temporarily released the mmap_sem. The fault would just be retried by userland then. This is safe at least on x86 and powerpc (the two archs with the syscall implemented so far). Hint to verify for which archs this is safe: after handle_mm_fault returns, no access to data structures protected by the mmap_sem must be done by the fault code in arch/*/mm/fault.c until up_read(&mm->mmap_sem) is called. This has two main benefits: signals can run with lower latency in production (signals aren't blocked by userfaults and userfaults are immediately repeated after signal processing) and gdb can then trivially debug the threads blocked in this kind of userfaults coming directly from userland. On a side note: while gdb has a need to get signal processed, coredumps always worked perfectly with userfaults, no matter if the userfault is triggered by GUP a kernel copy_user or directly from userland. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>