summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2013-04-29Merge branch 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds
Pull workqueue updates from Tejun Heo: "A lot of activities on workqueue side this time. The changes achieve the followings. - WQ_UNBOUND workqueues - the workqueues which are per-cpu - are updated to be able to interface with multiple backend worker pools. This involved a lot of churning but the end result seems actually neater as unbound workqueues are now a lot closer to per-cpu ones. - The ability to interface with multiple backend worker pools are used to implement unbound workqueues with custom attributes. Currently the supported attributes are the nice level and CPU affinity. It may be expanded to include cgroup association in future. The attributes can be specified either by calling apply_workqueue_attrs() or through /sys/bus/workqueue/WQ_NAME/* if the workqueue in question is exported through sysfs. The backend worker pools are keyed by the actual attributes and shared by any workqueues which share the same attributes. When attributes of a workqueue are changed, the workqueue binds to the worker pool with the specified attributes while leaving the work items which are already executing in its previous worker pools alone. This allows converting custom worker pool implementations which want worker attribute tuning to use workqueues. The writeback pool is already converted in block tree and there are a couple others are likely to follow including btrfs io workers. - WQ_UNBOUND's ability to bind to multiple worker pools is also used to make it NUMA-aware. Because there's no association between work item issuer and the specific worker assigned to execute it, before this change, using unbound workqueue led to unnecessary cross-node bouncing and it couldn't be helped by autonuma as it requires tasks to have implicit node affinity and workers are assigned randomly. After these changes, an unbound workqueue now binds to multiple NUMA-affine worker pools so that queued work items are executed in the same node. This is turned on by default but can be disabled system-wide or for individual workqueues. Crypto was requesting NUMA affinity as encrypting data across different nodes can contribute noticeable overhead and doing it per-cpu was too limiting for certain cases and IO throughput could be bottlenecked by one CPU being fully occupied while others have idle cycles. While the new features required a lot of changes including restructuring locking, it didn't complicate the execution paths much. The unbound workqueue handling is now closer to per-cpu ones and the new features are implemented by simply associating a workqueue with different sets of backend worker pools without changing queue, execution or flush paths. As such, even though the amount of change is very high, I feel relatively safe in that it isn't likely to cause subtle issues with basic correctness of work item execution and handling. If something is wrong, it's likely to show up as being associated with worker pools with the wrong attributes or OOPS while workqueue attributes are being changed or during CPU hotplug. While this creates more backend worker pools, it doesn't add too many more workers unless, of course, there are many workqueues with unique combinations of attributes. Assuming everything else is the same, NUMA awareness costs an extra worker pool per NUMA node with online CPUs. There are also a couple things which are being routed outside the workqueue tree. - block tree pulled in workqueue for-3.10 so that writeback worker pool can be converted to unbound workqueue with sysfs control exposed. This simplifies the code, makes writeback workers NUMA-aware and allows tuning nice level and CPU affinity via sysfs. - The conversion to workqueue means that there's no 1:1 association between a specific worker, which makes writeback folks unhappy as they want to be able to tell which filesystem caused a problem from backtrace on systems with many filesystems mounted. This is resolved by allowing work items to set debug info string which is printed when the task is dumped. As this change involves unifying implementations of dump_stack() and friends in arch codes, it's being routed through Andrew's -mm tree." * 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (84 commits) workqueue: use kmem_cache_free() instead of kfree() workqueue: avoid false negative WARN_ON() in destroy_workqueue() workqueue: update sysfs interface to reflect NUMA awareness and a kernel param to disable NUMA affinity workqueue: implement NUMA affinity for unbound workqueues workqueue: introduce put_pwq_unlocked() workqueue: introduce numa_pwq_tbl_install() workqueue: use NUMA-aware allocation for pool_workqueues workqueue: break init_and_link_pwq() into two functions and introduce alloc_unbound_pwq() workqueue: map an unbound workqueues to multiple per-node pool_workqueues workqueue: move hot fields of workqueue_struct to the end workqueue: make workqueue->name[] fixed len workqueue: add workqueue->unbound_attrs workqueue: determine NUMA node of workers accourding to the allowed cpumask workqueue: drop 'H' from kworker names of unbound worker pools workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[] workqueue: move pwq_pool_locking outside of get/put_unbound_pool() workqueue: fix memory leak in apply_workqueue_attrs() workqueue: fix unbound workqueue attrs hashing / comparison workqueue: fix race condition in unbound workqueue free path workqueue: remove pwq_lock which is no longer used ...
2013-04-29Merge branch 'for-3.10-async' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull async update from Tejun Heo: "This contains three cleanup patches for async from Lai. All three patches are essentially cosmetic." * 'for-3.10-async' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: async: rename and redefine async_func_ptr async: remove unused @node from struct async_domain async: simplify lowest_in_progress()
2013-04-29Merge branch 'akpm' (incoming from Andrew)Linus Torvalds
Merge first batch of fixes from Andrew Morton: - A couple of kthread changes - A few minor audit patches - A number of fbdev patches. Florian remains AWOL so I'm picking up some of these. - A few kbuild things - ocfs2 updates - Almost all of the MM queue (And in the meantime, I already have the second big batch from Andrew pending in my mailbox ;^) * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (149 commits) memcg: take reference before releasing rcu_read_lock mem hotunplug: fix kfree() of bootmem memory mmKconfig: add an option to disable bounce mm, nobootmem: do memset() after memblock_reserve() mm, nobootmem: clean-up of free_low_memory_core_early() fs/buffer.c: remove unnecessary init operation after allocating buffer_head. numa, cpu hotplug: change links of CPU and node when changing node number by onlining CPU mm: fix memory_hotplug.c printk format warning mm: swap: mark swap pages writeback before queueing for direct IO swap: redirty page if page write fails on swap file mm, memcg: give exiting processes access to memory reserves thp: fix huge zero page logic for page with pfn == 0 memcg: avoid accessing memcg after releasing reference fs: fix fsync() error reporting memblock: fix missing comment of memblock_insert_region() mm: Remove unused parameter of pages_correctly_reserved() firmware, memmap: fix firmware_map_entry leak mm/vmstat: add note on safety of drain_zonestat mm: thp: add split tail pages to shrink page list in page reclaim mm: allow for outstanding swap writeback accounting ...
2013-04-29Merge tag 'clk-for-linus-3.10' of git://git.linaro.org/people/mturquette/linuxLinus Torvalds
Pull clock framework update from Michael Turquette: "The common clock framework changes for 3.10 include many fixes for existing platforms, as well as adoption of the framework by new platforms and devices. Some long-needed fixes to the core framework are here as well as new features such as improved initialization of clocks from DT as well as framework reentrancy for nested clock operations." * tag 'clk-for-linus-3.10' of git://git.linaro.org/people/mturquette/linux: (44 commits) clk: add clk_ignore_unused option to keep boot clocks on clk: ux500: fix mismatched types clk: vexpress: Add separate SP810 driver clk: si5351: make clk-si5351 depend on CONFIG_OF clk: export __clk_get_flags for modular clock providers clk: vt8500: Missing breaks in vtwm_pll_round_rate/_set_rate. clk: sunxi: Unify oscillator clock clk: composite: allow fixed rates & fixed dividers clk: composite: rename 'div' references to 'rate' clk: add si5351 i2c common clock driver clk: add device tree fixed-factor-clock binding support clk: Properly handle notifier return values clk: ux500: abx500: Define clock tree for ab850x clk: ux500: Add support for sysctrl clocks clk: mvebu: Fix valid value range checking for cpu_freq_select clk: Fixup locking issues for clk_set_parent clk: Fixup errorhandling for clk_set_parent clk: Restructure code for __clk_reparent clk: sunxi: drop an unnecesary kmalloc clk: sunxi: drop CLK_IGNORE_UNUSED ...
2013-04-29Merge tag 'spi-v3.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi updates from Mark Brown: "A fairly quiet release for SPI, mainly driver work. A few highlights: - Supports bits per word compatibility checking in the core. - Allow use of the IP used in Freescale SPI controllers outside Freescale SoCs. - DMA support for the Atmel SPI driver. - New drivers for the BCM2835 and Tegra114" * tag 'spi-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (68 commits) spi-topcliff-pch: fix to use list_for_each_entry_safe() when delete list items spi-topcliff-pch: missing platform_driver_unregister() on error in pch_spi_init() ARM: dts: add pinctrl property for spi node for atmel SoC ARM: dts: add spi nodes for the atmel boards ARM: dts: add spi nodes for atmel SoC ARM: at91: add clocks for spi dt entries spi/spi-atmel: add dmaengine support spi/spi-atmel: add flag to controller data for lock operations spi/spi-atmel: add physical base address spi/sirf: fix MODULE_DEVICE_TABLE MAINTAINERS: Add git repository and update my address spi/s3c64xx: Check for errors in dmaengine prepare_transfer() spi/s3c64xx: Fix non-dmaengine usage spi: omap2-mcspi: fix error return code in omap2_mcspi_probe() spi/s3c64xx: let device core setup the default pin configuration MAINTAINERS: Update Grant's email address and maintainership spi: omap2-mcspi: Fix transfers if DMADEVICES is not set spi: s3c64xx: move to generic dmaengine API spi-gpio: init CS before spi_bitbang_setup() spi: spi-mpc512x-psc: let transmiter/receiver enabled when in xfer loop ...
2013-04-29Merge tag 'regulator-v3.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator updates from Mark Brown: "The diffstat and changelog here is dominated by Lee Jones' heroic efforts to sync the ab8500 driver that's been maintained out of tree with mainline (plus Axel's cleanup work on the results) but there's a few other things here: - Axel Lin added regulator_map_voltage_ascend() optimising a common pattern for drivers using the core code. - Milo Kim tought the regulator core to handle regulators sharing an enable GPIO, avoiding the need to do hacks to support such systems. - Andrew Bresticker added code to handle missing supplies for regulators more sensibly for device tree systems, reducing the need for stubbing there. plus the usual batch of driver specific updates and fixes" * tag 'regulator-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (152 commits) regulator: mc13892: Fix MC13892_SWITCHERS0_SWxHI bit in set_voltage_sel regulator: Remove NULL test before calling regulator_unregister() regulator: mc13783: Add device tree probe support regulator: mc13xxx: Add warning of incorrect names of regulators regulator: max77686: Don't update max77686->opmode if update register fails regulator: max8952: Add missing config.of_node setting for regulator register regulator: ab3100: Fix regulator register error handling regulator: tps6524x: Use regulator_map_voltage_ascend regulator: lp8788-buck: Use regulator_map_voltage_ascend regulator: lp872x: Use regulator_map_voltage_ascend regulator: mc13892: Use regulator_map_voltage_ascend for mc13892_sw_regulator_ops regulator: tps65023: Use regulator_map_voltage_ascend regulator: tps65023: Merge tps65020 ldo1 and ldo2 vsel table regulator: tps6507x: Use regulator_map_voltage_ascend regulator: mc13892: Fix MC13892_SWITCHERS0_SWxHI bit in set_voltage_sel regulator: ab3100: device tree support regulator: ab3100: refactor probe to use IDs regulator: max8973: Don't override control1 variable when set ramp delay bits regulator: tps80031: Convert tps80031_dcdc_ops to [get|set]_voltage_sel_regmap regulator: tps80031: Fix LDO2 track mode for TPS80031 or TPS80032-ES1.0 ...
2013-04-29Merge tag 'regmap-v3.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap updates from Mark Brown: "In user visible terms just a couple of enhancements here, though there was a moderate amount of refactoring required in order to support the register cache sync performance improvements. - Support for block and asynchronous I/O during register cache syncing; this provides a use case dependant performance improvement. - Additional debugfs information on the memory consuption and register set" * tag 'regmap-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: (23 commits) regmap: don't corrupt work buffer in _regmap_raw_write() regmap: cache: Fix format specifier in dev_dbg regmap: cache: Make regcache_sync_block_raw static regmap: cache: Write consecutive registers in a single block write regmap: cache: Split raw and non-raw syncs regmap: cache: Factor out block sync regmap: cache: Factor out reg_present support from rbtree cache regmap: cache: Use raw I/O to sync rbtrees if we can regmap: core: Provide regmap_can_raw_write() operation regmap: cache: Provide a get address of value operation regmap: Cut down on the average # of nodes in the rbtree cache regmap: core: Make raw write available to regcache regmap: core: Warn on invalid operation combinations regmap: irq: Clarify error message when we fail to request primary IRQ regmap: rbtree Expose total memory consumption in the rbtree debugfs entry regmap: debugfs: Add a registers `range' file regmap: debugfs: Simplify calculation of `c->max_reg' regmap: cache: Store caches in native register format where possible regmap: core: Split out in place value parsing regmap: cache: Use regcache_get_value() to check if we updated ...
2013-04-29mm, nobootmem: clean-up of free_low_memory_core_early()Joonsoo Kim
Remove unused argument and make function static, because there is no user outside of nobootmem.c Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Yinghai Lu <yinghai@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Jiang Liu <liuj97@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm: thp: add split tail pages to shrink page list in page reclaimShaohua Li
In page reclaim, huge page is split. split_huge_page() adds tail pages to LRU list. Since we are reclaiming a huge page, it's better we reclaim all subpages of the huge page instead of just the head page. This patch adds split tail pages to shrink page list so the tail pages can be reclaimed soon. Before this patch, run a swap workload: thp_fault_alloc 3492 thp_fault_fallback 608 thp_collapse_alloc 6 thp_collapse_alloc_failed 0 thp_split 916 With this patch: thp_fault_alloc 4085 thp_fault_fallback 16 thp_collapse_alloc 90 thp_collapse_alloc_failed 0 thp_split 1272 fallback allocation is reduced a lot. [akpm@linux-foundation.org: fix CONFIG_SWAP=n build] Signed-off-by: Shaohua Li <shli@fusionio.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm: allow for outstanding swap writeback accountingSeth Jennings
To prevent flooding the swap device with writebacks, frontswap backends need to count and limit the number of outstanding writebacks. The incrementing of the counter can be done before the call to __swap_writepage(). However, the caller must receive a notification when the writeback completes in order to decrement the counter. To achieve this functionality, this patch modifies __swap_writepage() to take the bio completion callback function as an argument. end_swap_bio_write(), the normal bio completion function, is also made non-static so that code doing the accounting can call it after the accounting is done. There should be no behavioural change to existing code. Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Acked-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm: break up swap_writepage() for frontswap backendsSeth Jennings
swap_writepage() is currently where frontswap hooks into the swap write path to capture pages with the frontswap_store() function. However, if a frontswap backend wants to "resume" the writeback of a page to the swap device, it can't call swap_writepage() as the page will simply reenter the backend. This patch separates swap_writepage() into a top and bottom half, the bottom half named __swap_writepage() to allow a frontswap backend, like zswap, to resume writeback beyond the frontswap_store() hook. __add_to_swap_cache() is also made non-static so that the page for which writeback is to be resumed can be added to the swap cache. Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Acked-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29memcg: add memory.pressure_level eventsAnton Vorontsov
With this patch userland applications that want to maintain the interactivity/memory allocation cost can use the pressure level notifications. The levels are defined like this: The "low" level means that the system is reclaiming memory for new allocations. Monitoring this reclaiming activity might be useful for maintaining cache level. Upon notification, the program (typically "Activity Manager") might analyze vmstat and act in advance (i.e. prematurely shutdown unimportant services). The "medium" level means that the system is experiencing medium memory pressure, the system might be making swap, paging out active file caches, etc. Upon this event applications may decide to further analyze vmstat/zoneinfo/memcg or internal memory usage statistics and free any resources that can be easily reconstructed or re-read from a disk. The "critical" level means that the system is actively thrashing, it is about to out of memory (OOM) or even the in-kernel OOM killer is on its way to trigger. Applications should do whatever they can to help the system. It might be too late to consult with vmstat or any other statistics, so it's advisable to take an immediate action. The events are propagated upward until the event is handled, i.e. the events are not pass-through. Here is what this means: for example you have three cgroups: A->B->C. Now you set up an event listener on cgroups A, B and C, and suppose group C experiences some pressure. In this situation, only group C will receive the notification, i.e. groups A and B will not receive it. This is done to avoid excessive "broadcasting" of messages, which disturbs the system and which is especially bad if we are low on memory or thrashing. So, organize the cgroups wisely, or propagate the events manually (or, ask us to implement the pass-through events, explaining why would you need them.) Performance wise, the memory pressure notifications feature itself is lightweight and does not require much of bookkeeping, in contrast to the rest of memcg features. Unfortunately, as of current memcg implementation, pages accounting is an inseparable part and cannot be turned off. The good news is that there are some efforts[1] to improve the situation; plus, implementing the same, fully API-compatible[2] interface for CONFIG_MEMCG=n case (e.g. embedded) is also a viable option, so it will not require any changes on the userland side. [1] http://permalink.gmane.org/gmane.linux.kernel.cgroups/6291 [2] http://lkml.org/lkml/2013/2/21/454 [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix CONFIG_CGROPUPS=n warnings] Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Acked-by: Kirill A. Shutemov <kirill@shutemov.name> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Glauber Costa <glommer@parallels.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Leonid Moiseichuk <leonid.moiseichuk@nokia.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: John Stultz <john.stultz@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm, hotplug: avoid compiling memory hotremove functions when disabledDavid Rientjes
__remove_pages() is only necessary for CONFIG_MEMORY_HOTREMOVE. PowerPC pseries will return -EOPNOTSUPP if unsupported. Adding an #ifdef causes several other functions it depends on to also become unnecessary, which saves in .text when disabled (it's disabled in most defconfigs besides powerpc, including x86). remove_memory_block() becomes static since it is not referenced outside of drivers/base/memory.c. Build tested on x86 and powerpc with CONFIG_MEMORY_HOTREMOVE both enabled and disabled. Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Toshi Kani <toshi.kani@hp.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29resource: add release_mem_region_adjustable()Toshi Kani
Add release_mem_region_adjustable(), which releases a requested region from a currently busy memory resource. This interface adjusts the matched memory resource accordingly even if the requested region does not match exactly but still fits into. This new interface is intended for memory hot-delete. During bootup, memory resources are inserted from the boot descriptor table, such as EFI Memory Table and e820. Each memory resource entry usually covers the whole contigous memory range. Memory hot-delete request, on the other hand, may target to a particular range of memory resource, and its size can be much smaller than the whole contiguous memory. Since the existing release interfaces like __release_region() require a requested region to be exactly matched to a resource entry, they do not allow a partial resource to be released. This new interface is restrictive (i.e. release under certain conditions), which is consistent with other release interfaces, __release_region() and __release_resource(). Additional release conditions, such as an overlapping region to a resource entry, can be supported after they are confirmed as valid cases. There is no change to the existing interfaces since their restriction is valid for I/O resources. [akpm@linux-foundation.org: use GFP_ATOMIC under write_lock()] [akpm@linux-foundation.org: switch back to GFP_KERNEL, less buggily] [akpm@linux-foundation.org: remove unneeded and wrong kfree(), per Toshi] Signed-off-by: Toshi Kani <toshi.kani@hp.com> Reviewed-by : Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Reviewed-by: Ram Pai <linuxram@us.ibm.com> Cc: T Makphaibulchoke <tmac@hp.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm: remove CONFIG_HOTPLUG ifdefsYijing Wang
CONFIG_HOTPLUG is going away as an option, cleanup CONFIG_HOTPLUG ifdefs in mm files. Signed-off-by: Yijing Wang <wangyijing@huawei.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm: replace hardcoded 3% with admin_reserve_pages knobAndrew Shewmaker
Add an admin_reserve_kbytes knob to allow admins to change the hardcoded memory reserve to something other than 3%, which may be multiple gigabytes on large memory systems. Only about 8MB is necessary to enable recovery in the default mode, and only a few hundred MB are required even when overcommit is disabled. This affects OVERCOMMIT_GUESS and OVERCOMMIT_NEVER. admin_reserve_kbytes is initialized to min(3% free pages, 8MB) I arrived at 8MB by summing the RSS of sshd or login, bash, and top. Please see first patch in this series for full background, motivation, testing, and full changelog. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: make init_admin_reserve() static] Signed-off-by: Andrew Shewmaker <agshew@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm: limit growth of 3% hardcoded other user reserveAndrew Shewmaker
Add user_reserve_kbytes knob. Limit the growth of the memory reserved for other user processes to min(3% current process size, user_reserve_pages). Only about 8MB is necessary to enable recovery in the default mode, and only a few hundred MB are required even when overcommit is disabled. user_reserve_pages defaults to min(3% free pages, 128MB) I arrived at 128MB by taking the max VSZ of sshd, login, bash, and top ... then adding the RSS of each. This only affects OVERCOMMIT_NEVER mode. Background 1. user reserve __vm_enough_memory reserves a hardcoded 3% of the current process size for other applications when overcommit is disabled. This was done so that a user could recover if they launched a memory hogging process. Without the reserve, a user would easily run into a message such as: bash: fork: Cannot allocate memory 2. admin reserve Additionally, a hardcoded 3% of free memory is reserved for root in both overcommit 'guess' and 'never' modes. This was intended to prevent a scenario where root-cant-log-in and perform recovery operations. Note that this reserve shrinks, and doesn't guarantee a useful reserve. Motivation The two hardcoded memory reserves should be updated to account for current memory sizes. Also, the admin reserve would be more useful if it didn't shrink too much. When the current code was originally written, 1GB was considered "enterprise". Now the 3% reserve can grow to multiple GB on large memory systems, and it only needs to be a few hundred MB at most to enable a user or admin to recover a system with an unwanted memory hogging process. I've found that reducing these reserves is especially beneficial for a specific type of application load: * single application system * one or few processes (e.g. one per core) * allocating all available memory * not initializing every page immediately * long running I've run scientific clusters with this sort of load. A long running job sometimes failed many hours (weeks of CPU time) into a calculation. They weren't initializing all of their memory immediately, and they weren't using calloc, so I put systems into overcommit 'never' mode. These clusters run diskless and have no swap. However, with the current reserves, a user wishing to allocate as much memory as possible to one process may be prevented from using, for example, almost 2GB out of 32GB. The effect is less, but still significant when a user starts a job with one process per core. I have repeatedly seen a set of processes requesting the same amount of memory fail because one of them could not allocate the amount of memory a user would expect to be able to allocate. For example, Message Passing Interfce (MPI) processes, one per core. And it is similar for other parallel programming frameworks. Changing this reserve code will make the overcommit never mode more useful by allowing applications to allocate nearly all of the available memory. Also, the new admin_reserve_kbytes will be safer than the current behavior since the hardcoded 3% of available memory reserve can shrink to something useless in the case where applications have grabbed all available memory. Risks * "bash: fork: Cannot allocate memory" The downside of the first patch-- which creates a tunable user reserve that is only used in overcommit 'never' mode--is that an admin can set it so low that a user may not be able to kill their process, even if they already have a shell prompt. Of course, a user can get in the same predicament with the current 3% reserve--they just have to launch processes until 3% becomes negligible. * root-cant-log-in problem The second patch, adding the tunable rootuser_reserve_pages, allows the admin to shoot themselves in the foot by setting it too small. They can easily get the system into a state where root-can't-log-in. However, the new admin_reserve_kbytes will be safer than the current behavior since the hardcoded 3% of available memory reserve can shrink to something useless in the case where applications have grabbed all available memory. Alternatives * Memory cgroups provide a more flexible way to limit application memory. Not everyone wants to set up cgroups or deal with their overhead. * We could create a fourth overcommit mode which provides smaller reserves. The size of useful reserves may be drastically different depending on the whether the system is embedded or enterprise. * Force users to initialize all of their memory or use calloc. Some users don't want/expect the system to overcommit when they malloc. Overcommit 'never' mode is for this scenario, and it should work well. The new user and admin reserve tunables are simple to use, with low overhead compared to cgroups. The patches preserve current behavior where 3% of memory is less than 128MB, except that the admin reserve doesn't shrink to an unusable size under pressure. The code allows admins to tune for embedded and enterprise usage. FAQ * How is the root-cant-login problem addressed? What happens if admin_reserve_pages is set to 0? Root is free to shoot themselves in the foot by setting admin_reserve_kbytes too low. On x86_64, the minimum useful reserve is: 8MB for overcommit 'guess' 128MB for overcommit 'never' admin_reserve_pages defaults to min(3% free memory, 8MB) So, anyone switching to 'never' mode needs to adjust admin_reserve_pages. * How do you calculate a minimum useful reserve? A user or the admin needs enough memory to login and perform recovery operations, which includes, at a minimum: sshd or login + bash (or some other shell) + top (or ps, kill, etc.) For overcommit 'guess', we can sum resident set sizes (RSS) because we only need enough memory to handle what the recovery programs will typically use. On x86_64 this is about 8MB. For overcommit 'never', we can take the max of their virtual sizes (VSZ) and add the sum of their RSS. We use VSZ instead of RSS because mode forces us to ensure we can fulfill all of the requested memory allocations-- even if the programs only use a fraction of what they ask for. On x86_64 this is about 128MB. When swap is enabled, reserves are useful even when they are as small as 10MB, regardless of overcommit mode. When both swap and overcommit are disabled, then the admin should tune the reserves higher to be absolutley safe. Over 230MB each was safest in my testing. * What happens if user_reserve_pages is set to 0? Note, this only affects overcomitt 'never' mode. Then a user will be able to allocate all available memory minus admin_reserve_kbytes. However, they will easily see a message such as: "bash: fork: Cannot allocate memory" And they won't be able to recover/kill their application. The admin should be able to recover the system if admin_reserve_kbytes is set appropriately. * What's the difference between overcommit 'guess' and 'never'? "Guess" allows an allocation if there are enough free + reclaimable pages. It has a hardcoded 3% of free pages reserved for root. "Never" allows an allocation if there is enough swap + a configurable percentage (default is 50) of physical RAM. It has a hardcoded 3% of free pages reserved for root, like "Guess" mode. It also has a hardcoded 3% of the current process size reserved for additional applications. * Why is overcommit 'guess' not suitable even when an app eventually writes to every page? It takes free pages, file pages, available swap pages, reclaimable slab pages into consideration. In other words, these are all pages available, then why isn't overcommit suitable? Because it only looks at the present state of the system. It does not take into account the memory that other applications have malloced, but haven't initialized yet. It overcommits the system. Test Summary There was little change in behavior in the default overcommit 'guess' mode with swap enabled before and after the patch. This was expected. Systems run most predictably (i.e. no oom kills) in overcommit 'never' mode with swap enabled. This also allowed the most memory to be allocated to a user application. Overcommit 'guess' mode without swap is a bad idea. It is easy to crash the system. None of the other tested combinations crashed. This matches my experience on the Roadrunner supercomputer. Without the tunable user reserve, a system in overcommit 'never' mode and without swap does not allow the admin to recover, although the admin can. With the new tunable reserves, a system in overcommit 'never' mode and without swap can be configured to: 1. maximize user-allocatable memory, running close to the edge of recoverability 2. maximize recoverability, sacrificing allocatable memory to ensure that a user cannot take down a system Test Description Fedora 18 VM - 4 x86_64 cores, 5725MB RAM, 4GB Swap System is booted into multiuser console mode, with unnecessary services turned off. Caches were dropped before each test. Hogs are user memtester processes that attempt to allocate all free memory as reported by /proc/meminfo In overcommit 'never' mode, memory_ratio=100 Test Results 3.9.0-rc1-mm1 Overcommit | Swap | Hogs | MB Got/Wanted | OOMs | User Recovery | Admin Recovery ---------- ---- ---- ------------- ---- ------------- -------------- guess yes 1 5432/5432 no yes yes guess yes 4 5444/5444 1 yes yes guess no 1 5302/5449 no yes yes guess no 4 - crash no no never yes 1 5460/5460 1 yes yes never yes 4 5460/5460 1 yes yes never no 1 5218/5432 no no yes never no 4 5203/5448 no no yes 3.9.0-rc1-mm1-tunablereserves User and Admin Recovery show their respective reserves, if applicable. Overcommit | Swap | Hogs | MB Got/Wanted | OOMs | User Recovery | Admin Recovery ---------- ---- ---- ------------- ---- ------------- -------------- guess yes 1 5419/5419 no - yes 8MB yes guess yes 4 5436/5436 1 - yes 8MB yes guess no 1 5440/5440 * - yes 8MB yes guess no 4 - crash - no 8MB no * process would successfully mlock, then the oom killer would pick it never yes 1 5446/5446 no 10MB yes 20MB yes never yes 4 5456/5456 no 10MB yes 20MB yes never no 1 5387/5429 no 128MB no 8MB barely never no 1 5323/5428 no 226MB barely 8MB barely never no 1 5323/5428 no 226MB barely 8MB barely never no 1 5359/5448 no 10MB no 10MB barely never no 1 5323/5428 no 0MB no 10MB barely never no 1 5332/5428 no 0MB no 50MB yes never no 1 5293/5429 no 0MB no 90MB yes never no 1 5001/5427 no 230MB yes 338MB yes never no 4* 4998/5424 no 230MB yes 338MB yes * more memtesters were launched, able to allocate approximately another 100MB Future Work - Test larger memory systems. - Test an embedded image. - Test other architectures. - Time malloc microbenchmarks. - Would it be useful to be able to set overcommit policy for each memory cgroup? - Some lines are slightly above 80 chars. Perhaps define a macro to convert between pages and kb? Other places in the kernel do this. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: make init_user_reserve() static] Signed-off-by: Andrew Shewmaker <agshew@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29include/linux/memory.h: implement register_hotmemory_notifier()Andrew Morton
When CONFIG_MEMORY_HOTPLUG=n, we don't want the memory-hotplug notifier handlers to be included in the .o files, for space reasons. The existing hotplug_memory_notifier() tries to handle this but testing with gcc-4.4.4 shows that it doesn't work - the hotplug functions are still present in the .o files. So implement a new register_hotmemory_notifier() which is a copy of register_hotcpu_notifier(), and which actually works as desired. hotplug_memory_notifier() and register_memory_notifier() callsites should be converted to use this new register_hotmemory_notifier(). While we're there, let's repair the existing hotplug_memory_notifier(): it simply stomps on the register_memory_notifier() return value, so well-behaved code cannot check for errors. Apparently non of the existing callers were well-behaved :( Cc: Andrew Shewmaker <agshew@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29page_alloc: make setup_nr_node_ids() usable for arch init codeCody P Schafer
powerpc and x86 were opencoding copies of setup_nr_node_ids(), which page_alloc provides but makes static. Make it avaliable to the archs in linux/mm.h. Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29sparse-vmemmap: specify vmemmap population range in bytesJohannes Weiner
The sparse code, when asking the architecture to populate the vmemmap, specifies the section range as a starting page and a number of pages. This is an awkward interface, because none of the arch-specific code actually thinks of the range in terms of 'struct page' units and always translates it to bytes first. In addition, later patches mix huge page and regular page backing for the vmemmap. For this, they need to call vmemmap_populate_basepages() on sub-section ranges with PAGE_SIZE and PMD_SIZE in mind. But these are not necessarily multiples of the 'struct page' size and so this unit is too coarse. Just translate the section range into bytes once in the generic sparse code, then pass byte ranges down the stack. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Bernhard Schmidt <Bernhard.Schmidt@lrz.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: David S. Miller <davem@davemloft.net> Tested-by: David S. Miller <davem@davemloft.net> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm, hugetlb: include hugepages in meminfoDavid Rientjes
Particularly in oom conditions, it's troublesome that hugetlb memory is not displayed. All other meminfo that is emitted will not add up to what is expected, and there is no artifact left in the kernel log to show that a potentially significant amount of memory is actually allocated as hugepages which are not available to be reclaimed. Booting with hugepages=8192 on the command line, this memory is now shown in oom conditions. For example, with echo m > /proc/sysrq-trigger: Node 0 hugepages_total=2048 hugepages_free=2048 hugepages_surp=0 hugepages_size=2048kB Node 1 hugepages_total=2048 hugepages_free=2048 hugepages_surp=0 hugepages_size=2048kB Node 2 hugepages_total=2048 hugepages_free=2048 hugepages_surp=0 hugepages_size=2048kB Node 3 hugepages_total=2048 hugepages_free=2048 hugepages_surp=0 hugepages_size=2048kB [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm: allow arch code to control the user page table ceilingHugh Dickins
On architectures where a pgd entry may be shared between user and kernel (e.g. ARM+LPAE), freeing page tables needs a ceiling other than 0. This patch introduces a generic USER_PGTABLES_CEILING that arch code can override. It is the responsibility of the arch code setting the ceiling to ensure the complete freeing of the page tables (usually in pgd_free()). [catalin.marinas@arm.com: commit log; shift_arg_pages(), asm-generic/pgtables.h changes] Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: <stable@vger.kernel.org> [3.3+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29kexec, vmalloc: export additional vmalloc layer informationAtsushi Kumagai
Now, vmap_area_list is exported as VMCOREINFO for makedumpfile to get the start address of vmalloc region (vmalloc_start). The address which contains vmalloc_start value is represented as below: vmap_area_list.next - OFFSET(vmap_area.list) + OFFSET(vmap_area.va_start) However, both OFFSET(vmap_area.va_start) and OFFSET(vmap_area.list) aren't exported as VMCOREINFO. So this patch exports them externally with small cleanup. [akpm@linux-foundation.org: vmalloc.h should include list.h for list_head] Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Dave Anderson <anderson@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ingo Molnar <mingo@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm, vmalloc: export vmap_area_list, instead of vmlistJoonsoo Kim
Although our intention is to unexport internal structure entirely, but there is one exception for kexec. kexec dumps address of vmlist and makedumpfile uses this information. We are about to remove vmlist, then another way to retrieve information of vmalloc layer is needed for makedumpfile. For this purpose, we export vmap_area_list, instead of vmlist. Signed-off-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Anderson <anderson@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm, vmalloc: move get_vmalloc_info() to vmalloc.cJoonsoo Kim
Now get_vmalloc_info() is in fs/proc/mmu.c. There is no reason that this code must be here and it's implementation needs vmlist_lock and it iterate a vmlist which may be internal data structure for vmalloc. It is preferable that vmlist_lock and vmlist is only used in vmalloc.c for maintainability. So move the code to vmalloc.c Signed-off-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Dave Anderson <anderson@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ingo Molnar <mingo@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm: make snapshotting pages for stable writes a per-bio operationDarrick J. Wong
Walking a bio's page mappings has proved problematic, so create a new bio flag to indicate that a bio's data needs to be snapshotted in order to guarantee stable pages during writeback. Next, for the one user (ext3/jbd) of snapshotting, hook all the places where writes can be initiated without PG_writeback set, and set BIO_SNAP_STABLE there. We must also flag journal "metadata" bios for stable writeout, since file data can be written through the journal. Finally, the MS_SNAP_STABLE mount flag (only used by ext3) is now superfluous, so get rid of it. [akpm@linux-foundation.org: rename _submit_bh()'s `flags' to `bio_flags', delobotomize the _submit_bh declaration] [akpm@linux-foundation.org: teeny cleanup] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Artem Bityutskiy <dedekind1@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/hugetlb: add more arch-defined huge_pte functionsGerald Schaefer
Commit abf09bed3cce ("s390/mm: implement software dirty bits") introduced another difference in the pte layout vs. the pmd layout on s390, thoroughly breaking the s390 support for hugetlbfs. This requires replacing some more pte_xxx functions in mm/hugetlbfs.c with a huge_pte_xxx version. This patch introduces those huge_pte_xxx functions and their generic implementation in asm-generic/hugetlb.h, which will now be included on all architectures supporting hugetlbfs apart from s390. This change will be a no-op for those architectures. [akpm@linux-foundation.org: fix warning] Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Hillf Danton <dhillf@gmail.com> Acked-by: Michal Hocko <mhocko@suse.cz> [for !s390 parts] Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29fs: don't compile in drop_caches.c when CONFIG_SYSCTL=nJosh Triplett
drop_caches.c provides code only invokable via sysctl, so don't compile it in when CONFIG_SYSCTL=n. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29cgroup: remove css_get_nextMichal Hocko
Now that we have generic and well ordered cgroup tree walkers there is no need to keep css_get_next in the place. Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Ying Han <yinghan@google.com> Cc: Tejun Heo <htejun@gmail.com> Cc: Glauber Costa <glommer@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm: introduce free_highmem_page() helper to free highmem pages into buddy systemJiang Liu
The original goal of this patchset is to fix the bug reported by https://bugzilla.kernel.org/show_bug.cgi?id=53501 Now it has also been expanded to reduce common code used by memory initializion. This is the second part, which applies to the previous part at: http://marc.info/?l=linux-mm&m=136289696323825&w=2 It introduces a helper function free_highmem_page() to free highmem pages into the buddy system when initializing mm subsystem. Introduction of free_highmem_page() is one step forward to clean up accesses and modificaitons of totalhigh_pages, totalram_pages and zone->managed_pages etc. I hope we could remove all references to totalhigh_pages from the arch/ subdirectory. We have only tested these patchset on x86 platforms, and have done basic compliation tests using cross-compilers from ftp.kernel.org. That means some code may not pass compilation on some architectures. So any help to test this patchset are welcomed! There are several other parts still under development: Part3: refine code to manage totalram_pages, totalhigh_pages and zone->managed_pages Part4: introduce helper functions to simplify mem_init() and remove the global variable num_physpages. This patch: Introduce helper function free_highmem_page(), which will be used by architectures with HIGHMEM enabled to free highmem pages into the buddy system. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Suzuki K. Poulose" <suzuki@in.ibm.com> Cc: Alexander Graf <agraf@suse.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Attilio Rao <attilio.rao@citrix.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Cong Wang <amwang@redhat.com> Cc: David Daney <david.daney@cavium.com> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Michel Lespinasse <walken@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rik van Riel <riel@redhat.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm: introduce common help functions to deal with reserved/managed pagesJiang Liu
The original goal of this patchset is to fix the bug reported by https://bugzilla.kernel.org/show_bug.cgi?id=53501 Now it has also been expanded to reduce common code used by memory initializion. This is the first part, which applies to v3.9-rc1. It introduces following common helper functions to simplify free_initmem() and free_initrd_mem() on different architectures: adjust_managed_page_count(): will be used to adjust totalram_pages, totalhigh_pages, zone->managed_pages when reserving/unresering a page. __free_reserved_page(): free a reserved page into the buddy system without adjusting page statistics info free_reserved_page(): free a reserved page into the buddy system and adjust page statistics info mark_page_reserved(): mark a page as reserved and adjust page statistics info free_reserved_area(): free a continous ranges of pages by calling free_reserved_page() free_initmem_default(): default method to free __init pages. We have only tested these patchset on x86 platforms, and have done basic compliation tests using cross-compilers from ftp.kernel.org. That means some code may not pass compilation on some architectures. So any help to test this patchset are welcomed! There are several other parts still under development: Part2: introduce free_highmem_page() to simplify freeing highmem pages Part3: refine code to manage totalram_pages, totalhigh_pages and zone->managed_pages Part4: introduce helper functions to simplify mem_init() and remove the global variable num_physpages. This patch: Code to deal with reserved/managed pages are duplicated by many architectures, so introduce common help functions to reduce duplicated code. These common help functions will also be used to concentrate code to modify totalram_pages and zone->managed_pages, which makes the code much more clear. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Anatolij Gustschin <agust@denx.de> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Liqin <liqin.chen@sunplusct.com> Cc: Chris Zankel <chris@zankel.net> Cc: David Howells <dhowells@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Hogan <james.hogan@imgtec.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: Lennox Wu <lennox.wu@gmail.com> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Mikael Starvik <starvik@axis.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Russell King <linux@arm.linux.org.uk> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29vm: adjust ifdef for TINY_RCUPaul E. McKenney
There is an ifdef in page_cache_get_speculative() that checks for !SMP and TREE_RCU, which has been an impossible combination since the advent of TINY_RCU. The ifdef enables a fastpath that is valid when preemption is disabled by rcu_read_lock() in UP systems, which is the case when TINY_RCU is enabled. This commit therefore adjusts the ifdef to generate the fastpath when TINY_RCU is enabled. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reported-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm/shmem.c: remove an ifdefAndrew Morton
Create a CONFIG_MMU=y stub for ramfs_nommu_expand_for_mapping() in the usual fashion. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Wolfram Sang <wsa@the-dreams.de> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm, show_mem: suppress page counts in non-blockable contextsDavid Rientjes
On large systems with a lot of memory, walking all RAM to determine page types may take a half second or even more. In non-blockable contexts, the page allocator will emit a page allocation failure warning unless __GFP_NOWARN is specified. In such contexts, irqs are typically disabled and such a lengthy delay may even result in NMI watchdog timeouts. To fix this, suppress the page walk in such contexts when printing the page allocation failure warning. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mgorman@suse.de> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29mm: trace filemap add and delRobert Jarzmik
Use the events API to trace filemap loading and unloading of file pieces into the page cache. This patch aims at tracing the eviction reload cycle of executable and shared libraries pages in a memory constrained environment. The typical usage is to spot a specific device and inode (for example /lib/libc.so) to see the eviction cycles, and find out if frequently used code is rather spread across many pages (bad) or coallesced (good). Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Cc: Dave Chinner <david@fromorbit.com> Cc: Hugh Dickins <hughd@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29debug_locks.h: make warning more verboseJames Hogan
The WARN_ON(1) in DEBUG_LOCKS_WARN_ON is surprisingly awkward to track down when it's hit, as it's usually buried in macros, causing multiple instances to land on the same line number. This patch makes it more useful by switching to: WARN(1, "DEBUG_LOCKS_WARN_ON(%s)", #c); so that the particular DEBUG_LOCKS_WARN_ON is more easily identified and grep'd for. For example: WARNING: at kernel/mutex.c:198 _mutex_lock_nested+0x31c/0x380() DEBUG_LOCKS_WARN_ON(l->magic != l) Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: David Howells <dhowells@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29drivers/video: add Hyper-V Synthetic Video Frame Buffer DriverHaiyang Zhang
This is the driver for the Hyper-V Synthetic Video, which supports screen resolution up to Full HD 1920x1080 on Windows Server 2012 host, and 1600x1200 on Windows Server 2008 R2 or earlier. It also solves the double mouse cursor issue of the emulated video mode. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Cc: Olaf Hering <olaf@aepfle.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29Merge tag 'trace-3.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "Along with the usual minor fixes and clean ups there are a few major changes with this pull request. 1) Multiple buffers for the ftrace facility This feature has been requested by many people over the last few years. I even heard that Google was about to implement it themselves. I finally had time and cleaned up the code such that you can now create multiple instances of the ftrace buffer and have different events go to different buffers. This way, a low frequency event will not be lost in the noise of a high frequency event. Note, currently only events can go to different buffers, the tracers (ie function, function_graph and the latency tracers) still can only be written to the main buffer. 2) The function tracer triggers have now been extended. The function tracer had two triggers. One to enable tracing when a function is hit, and one to disable tracing. Now you can record a stack trace on a single (or many) function(s), take a snapshot of the buffer (copy it to the snapshot buffer), and you can enable or disable an event to be traced when a function is hit. 3) A perf clock has been added. A "perf" clock can be chosen to be used when tracing. This will cause ftrace to use the same clock as perf uses, and hopefully this will make it easier to interleave the perf and ftrace data for analysis." * tag 'trace-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (82 commits) tracepoints: Prevent null probe from being added tracing: Compare to 1 instead of zero for is_signed_type() tracing: Remove obsolete macro guard _TRACE_PROFILE_INIT ftrace: Get rid of ftrace_profile_bits tracing: Check return value of tracing_init_dentry() tracing: Get rid of unneeded key calculation in ftrace_hash_move() tracing: Reset ftrace_graph_filter_enabled if count is zero tracing: Fix off-by-one on allocating stat->pages kernel: tracing: Use strlcpy instead of strncpy tracing: Update debugfs README file tracing: Fix ftrace_dump() tracing: Rename trace_event_mutex to trace_event_sem tracing: Fix comment about prefix in arch_syscall_match_sym_name() tracing: Convert trace_destroy_fields() to static tracing: Move find_event_field() into trace_events.c tracing: Use TRACE_MAX_PRINT instead of constant tracing: Use pr_warn_once instead of open coded implementation ring-buffer: Add ring buffer startup selftest tracing: Bring Documentation/trace/ftrace.txt up to date tracing: Add "perf" trace_clock ... Conflicts: kernel/trace/ftrace.c kernel/trace/trace.c
2013-04-29Merge tag 'usb-3.10-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB patches from Greg Kroah-Hartman: "Here's the big USB pull request for 3.10-rc1. Lots of USB patches here, the majority being USB gadget changes and USB-serial driver cleanups, the rest being ARM build fixes / cleanups, and individual driver updates. We also finally got some chipidea fixes, which have been delayed for a number of kernel releases, as the maintainer has now reappeared. All of these have been in linux-next for a while" * tag 'usb-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (568 commits) USB: ehci-msm: USB_MSM_OTG needs USB_PHY USB: OHCI: avoid conflicting platform drivers USB: OMAP: ISP1301 needs USB_PHY USB: lpc32xx: ISP1301 needs USB_PHY USB: ftdi_sio: enable two UART ports on ST Microconnect Lite usb: phy: tegra: don't call into tegra-ehci directly usb: phy: phy core cannot yet be a module USB: Fix initconst in ehci driver usb-storage: CY7C68300A chips do not support Cypress ATACB USB: serial: option: Added support Olivetti Olicard 145 USB: ftdi_sio: correct ST Micro Connect Lite PIDs ARM: mxs_defconfig: add CONFIG_USB_PHY ARM: imx_v6_v7_defconfig: add CONFIG_USB_PHY usb: phy: remove exported function from __init section usb: gadget: zero: put function instances on unbind usb: gadget: f_sourcesink.c: correct a copy-paste misnomer usb: gadget: cdc2: fix error return code in cdc_do_config() usb: gadget: multi: fix error return code in rndis_do_config() usb: gadget: f_obex: fix error return code in obex_bind() USB: storage: convert to use module_usb_driver() ...
2013-04-29Merge tag 'tty-3.10-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver update from Greg Kroah-Hartman: "Here's the big tty/serial driver merge request for 3.10-rc1 Once again, Jiri has a number of TTY driver fixes and cleanups, and Peter Hurley came through with a bunch of ldisc fixes that resolve a number of reported issues. There are some other serial driver cleanups as well. All of these have been in the linux-next tree for a while" * tag 'tty-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (117 commits) tty/serial/sirf: fix MODULE_DEVICE_TABLE serial: mxs: drop superfluous {get|put}_device serial: mxs: fix buffer overflow ARM: PL011: add support for extended FIFO-size of PL011-r1p5 serial_core.c: add put_device() after device_find_child() tty: Fix unsafe bit ops in tty_throttle_safe/unthrottle_safe serial: sccnxp: Replace pdata.init/exit with regulator API serial: sccnxp: Do not override device name TTY: pty, fix compilation warning TTY: rocket, fix compilation warning TTY: ircomm: fix DTR being raised on hang up TTY: synclinkmp: fix DTR being raised on hang up TTY: synclink_gt: fix DTR being raised on hang up TTY: synclink: fix DTR being raised on hang up serial: 8250_dw: Fix the stub for dw8250_probe_acpi() serial: 8250_dw: Convert to devm_ioremap() serial: 8250_dw: Set port capabilities based on CPR register serial: 8250_dw: Let ACPI code extract the DMA client info serial: 8250_dw: Support clk framework also with ACPI serial: 8250_dw: Enable runtime PM ...
2013-04-29Merge tag 'staging-3.10-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging driver tree update from Greg Kroah-Hartman: "Here's the big staging driver tree update for 3.10-rc1 This update contains loads of comedi driver cleanups and fixes in here, iio updates, android driver changes, and other various staging driver cleanups. Thanks to some drivers being removed, and the comedi driver cleanups, we have removed more code than we added: 627 files changed, 65145 insertions(+), 76321 deletions(-) which is always nice to see. All of these have been in linux-next for a while." * tag 'staging-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (940 commits) staging: comedi: ni_labpc: fix legacy driver build staging: comedi: das800: cleanup the cio-das802/16 fifo comments staging: comedi: das800: rename CamelCase vars in das800_ai_do_cmd() staging: comedi: das800: tidy up the private data staging: comedi: das800: tidy up das800_interrupt() staging: comedi: das800: tidy up das800_ai_insn_read() staging: comedi: das800: tidy up das800_di_insn_bits() staging: comedi: das800: tidy up das800_do_insn_bits() staging: comedi: das800: remove extra divisor calculation call staging: comedi: das800: rename {enable,disable}_das800 staging: comedi: das800: tidy up subdevice init staging: comedi: das800: allow attaching without interrupt support staging: comedi: das800: interrupts are required for async command support staging: comedi: das800: tidy up das800_ai_do_cmdtest() staging: comedi: das800: remove 'volatile' on private data variables staging: comedi: das800: cleanup the boardinfo staging: comedi: das800: cleanup range table declarations staging: comedi: das800: introduce das800_ind_{write, read}() staging: comedi: das800: remove forward declarations staging: comedi: das800: move das800_set_frequency() ...
2013-04-29Merge tag 'driver-core-3.10-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core update from Greg Kroah-Hartman: "Here's the merge request for the driver core tree for 3.10-rc1 It's pretty small, just a number of driver core and sysfs updates and fixes, all of which have been in linux-next for a while now. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" Fixed conflict in kernel/rtmutex-tester.c, the locking tree had a better fix for the same sysfs file mode problem. * tag 'driver-core-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: PM / Runtime: Idle devices asynchronously after probe|release driver core: handle user namespaces properly with the uid/gid devtmpfs change driver core: devtmpfs: fix compile failure with CONFIG_UIDGID_STRICT_TYPE_CHECKS devtmpfs: add base.h include driver core: add uid and gid to devtmpfs sysfs: check if one entry has been removed before freeing sysfs: fix crash_notes_size build warning sysfs: fix use after free in case of concurrent read/write and readdir rtmutex-tester: fix mode of sysfs files Documentation: Add ABI entry for crash_notes and crash_notes_size sysfs: Add crash_notes_size to export percpu note size driver core: platform_device.h: fix checkpatch errors and warnings driver core: platform.c: fix checkpatch errors and warnings driver core: warn that platform_driver_probe can not use deferred probing sysfs: use atomic_inc_unless_negative in sysfs_get_active base: core: WARN() about bogus permissions on device attributes device: separate all subsys mutexes
2013-04-29Merge tag 'char-misc-3.10-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver update from Greg Kroah-Hartman: "Here's the big char / misc driver update for 3.10-rc1 A number of various driver updates, the majority being new functionality in the MEI driver subsystem (it's now a subsystem, it started out just a single driver), extcon updates, memory updates, hyper-v updates, and a bunch of other small stuff that doesn't fit in any other tree. All of these have been in linux-next for a while" * tag 'char-misc-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (148 commits) Tools: hv: Fix a checkpatch warning tools: hv: skip iso9660 mounts in hv_vss_daemon tools: hv: use FIFREEZE/FITHAW in hv_vss_daemon tools: hv: use getmntent in hv_vss_daemon Tools: hv: Fix a checkpatch warning tools: hv: fix checks for origin of netlink message in hv_vss_daemon Tools: hv: fix warnings in hv_vss_daemon misc: mark spear13xx-pcie-gadget as broken mei: fix krealloc() misuse in in mei_cl_irq_read_msg() mei: reduce flow control only for completed messages mei: reseting -> resetting mei: fix reading large reposnes mei: revamp mei_irq_read_client_message function mei: revamp mei_amthif_irq_read_message mei: revamp hbm state machine Revert "drivers/scsi: use module_pcmcia_driver() in pcmcia drivers" Revert "scsi: pcmcia: nsp_cs: remove module init/exit function prototypes" scsi: pcmcia: nsp_cs: remove module init/exit function prototypes mei: wd: fix line over 80 characters misc: tsl2550: Use dev_pm_ops ...
2013-04-29Merge tag 'hwmon-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon update from Guenter Roeck: - New drivers for NCT6775, NCT6776, NCT6779, and LM95234. - Added support for LTC2974, LTC3883, LM25056, TMP431, TMP432, ADT7310, and ADT7320 to existing drivers. - Various code cleanups and minor improvements. * tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (54 commits) hwmon: (nct6775) Fix coding style problems hwmon: (nct6775) Constify strings hwmon: (tmp401) Add support for TMP432 hwmon: (tmp401) Add support for update_interval attribute hwmon: (tmp401) Reset valid flag when resetting temperature history hwmon: (tmp401) Simplification and cleanup hwmon: (tmp401) Use sysfs_create_group / sysfs_remove_group hwmon: (tmp401) Drop unused defines, use BIT for bit masks hwmon: (nct6775) Use ARRAY_SIZE for loops where possible documentation: hwmon: Fix typo in documentation/hwmon hwmon: (nct6775) Enable both AUXTIN and VIN3 on NCT6776 hwmon: (ad7314) use spi_get_drvdata() and spi_set_drvdata() MAINTAINERS: Add myself as maintainer for the NCT6775 driver hwmon: (nct6775) Expand scope of supported chips hwmon: (gpio-fan) Use is_visible to determine if attributes should be created hwmon: (tmp401) Fix device detection for TMP411B and TMP411C hwmon: Add driver for LM95234 hwmon: (tmp401) Add support for TMP431 hwmon: (pmbus/lm25066) Add support for LM25056 hwmon: (pmbus/lm25066) Refactor device specific coefficients ...
2013-04-29Merge tag 'pinctrl-for-v3.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pinctrl update from Linus Walleij: "These are the pinctrl changes for v3.10: - Patrice Chotard contributed a new configuration debugfs interface and reintroduced fine-grained locking into the core: instead of having a "big pinctrl lock" we have a per-controller lock and specialized locks for the global controller and pinctrl handle lists. - Haoijan Zhuang deleted all the PXA and MMP2 pinctrl drivers and replaced them with pinctrl-single (which is also used by other SoCs) so we are gaining consolidation. The platform particulars now come in through the device tree. - Haoijan also added support for generic pin config into the pinctrl-single driver which is another big consolidation win. - Finally also GPIO ranges are now supported by the pinctrl-single driver. - Tomasz Figa contributed a new Samsung S3C pinctrl driver, bringing more of the older Samsung platforms under the pinctrl umbrella and out of arch/arm. - Maxime Ripard contributed new Allwinner A10/A13 drivers. - Sachin Kamat, Wei Yongjun and Axel Lin did a lot of cleanups." * tag 'pinctrl-for-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (66 commits) pinctrl: move subsystem mutex to pinctrl_dev struct pinctrl/pinconfig: fix misplaced goto pinctrl: s3c64xx: Fix build error caused by undefined chained_irq_enter pinctrl/pinconfig: add debug interface pinctrl: abx500: fix issue when no pdata pinctrl: pinctrl-single: add missing double quote pinctrl: sunxi: Rename wemac functions to emac pinctrl: exynos5440: add gpio interrupt support pinctrl: exynos5440: fix probe failure due to missing pin-list in config nodes pinctrl: ab8505: Staticize some symbols pinctrl: ab8540: Staticize some symbols pinctrl: ab9540: Staticize some symbols pinctrl: ab8500: Staticize some symbols pinctrl: abx500: Staticize some symbols pinctrl: Add pinctrl-s3c64xx driver pinctrl: samsung: Handle banks with two configuration registers pinctrl: samsung: Remove hardcoded register offsets pinctrl: samsung: Split pin bank description into two structures pinctrl: samsung: Include pinctrl-exynos driver data conditionally pinctrl: samsung: Protect bank registers with a spinlock ...
2013-04-29Merge tag 'fbdev-for-3.10' of git://gitorious.org/linux-omap-dss2/linuxLinus Torvalds
Pull fbdev updates from Tomi Valkeinen: - use vm_iomap_memory() in various fb drivers to map the fb memory to userspace - Cleanups for the videomode and display_timing features - Updates to vt8500, wm8505 and auo-k190x fb drivers * tag 'fbdev-for-3.10' of git://gitorious.org/linux-omap-dss2/linux: (36 commits) fbdev: fix check for fb_mmap's mmio availability fbdev: improve fb_mmap bounds checks fbdev/ps3fb: use vm_iomap_memory() fbdev/sgivwfb: use vm_iomap_memory() fbdev/vermillion: use vm_iomap_memory() fbdev/sa1100fb: use vm_iomap_memory() fbdev/fb-puv3: use vm_iomap_memory() fbdev/controlfb: use vm_iomap_memory() fbdev/omapfb: use vm_iomap_memory() video: vt8500: fix Kconfig for videomode video/s3c: move platform_data out of arch/arm video/exynos: remove unnecessary header inclusions drivers/video: fsl-diu-fb: add hardware cursor support drivers: video: use module_platform_driver_probe() ARM: OMAP: remove "config FB_OMAP_CONSISTENT_DMA_SIZE" video: wm8505fb: Convert to devm_ioremap_resource() AUO-K190x: Add resolutions for portrait displays AUO-K190x: add framebuffer rotation support AUO-K190x: add a 16bit truecolor mode AUO-K190x: make color handling more flexible ...
2013-04-29Merge tag 'pci-v3.10-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "PCI changes for the v3.10 merge window: PCI device hotplug - Remove ACPI PCI subdrivers (Jiang Liu, Myron Stowe) - Make acpiphp builtin only, not modular (Jiang Liu) - Add acpiphp mutual exclusion (Jiang Liu) Power management - Skip "PME enabled/disabled" messages when not supported (Rafael Wysocki) - Fix fallback to PCI_D0 (Rafael Wysocki) Miscellaneous - Factor quirk_io_region (Yinghai Lu) - Cache MSI capability offsets & cleanup (Gavin Shan, Bjorn Helgaas) - Clean up EISA resource initialization and logging (Bjorn Helgaas) - Fix prototype warnings (Andy Shevchenko, Bjorn Helgaas) - MIPS: Initialize of_node before scanning bus (Gabor Juhos) - Fix pcibios_get_phb_of_node() declaration "weak" annotation (Gabor Juhos) - Add MSI INTX_DISABLE quirks for AR8161/AR8162/etc (Xiong Huang) - Fix aer_inject return values (Prarit Bhargava) - Remove PME/ACPI dependency (Andrew Murray) - Use shared PCI_BUS_NUM() and PCI_DEVID() (Shuah Khan)" * tag 'pci-v3.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (63 commits) vfio-pci: Use cached MSI/MSI-X capabilities vfio-pci: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK PCI: Remove "extern" from function declarations PCI: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK PCI: Drop msi_mask_reg() and remove drivers/pci/msi.h PCI: Use msix_table_size() directly, drop multi_msix_capable() PCI: Drop msix_table_offset_reg() and msix_pba_offset_reg() macros PCI: Drop is_64bit_address() and is_mask_bit_support() macros PCI: Drop msi_data_reg() macro PCI: Drop msi_lower_address_reg() and msi_upper_address_reg() macros PCI: Drop msi_control_reg() macro and use PCI_MSI_FLAGS directly PCI: Use cached MSI/MSI-X offsets from dev, not from msi_desc PCI: Clean up MSI/MSI-X capability #defines PCI: Use cached MSI-X cap while enabling MSI-X PCI: Use cached MSI cap while enabling MSI interrupts PCI: Remove MSI/MSI-X cap check in pci_msi_check_device() PCI: Cache MSI/MSI-X capability offsets in struct pci_dev PCI: Use u8, not int, for PM capability offset [SCSI] megaraid_sas: Use correct #define for MSI-X capability PCI: Remove "extern" from function declarations ...
2013-04-29Merge branch 'core-locking-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking changes from Ingo Molnar: "The most noticeable change are mutex speedups from Waiman Long, for higher loads. These scalability changes should be most noticeable on larger server systems. There are also cleanups, fixes and debuggability improvements." * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: lockdep: Consolidate bug messages into a single print_lockdep_off() function lockdep: Print out additional debugging advice when we hit lockdep BUGs mutex: Back out architecture specific check for negative mutex count mutex: Queue mutex spinners with MCS lock to reduce cacheline contention mutex: Make more scalable by doing less atomic operations mutex: Move mutex spinning code from sched/core.c back to mutex.c locking/rtmutex/tester: Set correct permissions on sysfs files lockdep: Remove unnecessary 'hlock_next' variable
2013-04-29Merge tag 'stable/for-linus-3.10-rc0-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen updates from Konrad Rzeszutek Wilk: "Features: - Populate the boot_params with EDD data. - Cleanups in the IRQ code. Bug-fixes: - CPU hotplug offline/online in PVHVM mode. - Re-upload processor PM data after ACPI S3 suspend/resume cycle." And Konrad gets a gold star for sending the pull request early when he thought he'd be away for the first week of the merge window (but because of 3.9 dragging out to -rc8 he then re-sent the reminder on the first day of the merge window anyway) * tag 'stable/for-linus-3.10-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: resolve section mismatch warnings in xen-acpi-processor xen: Re-upload processor PM data to hypervisor after S3 resume (v2) xen/smp: Unifiy some of the PVs and PVHVM offline CPU path xen/smp/pvhvm: Don't initialize IRQ_WORKER as we are using the native one. xen/spinlock: Disable IRQ spinlock (PV) allocation on PVHVM xen/spinlock: Check against default value of -1 for IRQ line. xen/time: Add default value of -1 for IRQ and check for that. xen/events: Check that IRQ value passed in is valid. xen/time: Fix kasprintf splat when allocating timer%d IRQ line. xen/smp/spinlock: Fix leakage of the spinlock interrupt line for every CPU online/offline xen/smp: Fix leakage of timer interrupt line for every CPU online/offline. xen kconfig: fix select INPUT_XEN_KBDDEV_FRONTEND xen: drop tracking of IRQ vector x86/xen: populate boot_params with EDD data
2013-04-28Merge remote-tracking branch 'regulator/topic/palmas' into v3.9-rc8Mark Brown