summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2010-10-26memory hotplug: unify is_removable and offline detection codeKAMEZAWA Hiroyuki
Now, sysfs interface of memory hotplug shows whether the section is removable or not. But it checks only migrateype of pages and doesn't check details of cluster of pages. Next, memory hotplug's set_migratetype_isolate() has the same kind of check, too. This patch adds the function __count_unmovable_pages() and makes above 2 checks to use the same logic. Then, is_removable and hotremove code uses the same logic. No changes in the hotremove logic itself. TODO: need to find a way to check RECLAMABLE. But, considering bit, calling shrink_slab() against a range before starting memory hotremove sounds better. If so, this patch's logic doesn't need to be changed. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reported-by: Michal Hocko <mhocko@suse.cz> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26mm: only build per-node scan_unevictable functions when NUMA is enabledThadeu Lima de Souza Cascardo
Non-NUMA systems do never create these files anyway, since they are only created by driver subsystem when NUMA is configured. [akpm@linux-foundation.org: cleanup] Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26include/linux/pageblock-flags.h: fix set_pageblock_flags() macro definitonzeal
The presently-unused macro was missing one parameter. Signed-off-by: zeal <zealcook@gmail.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26writeback: remove nonblocking/encountered_congestion referencesWu Fengguang
This removes more dead code that was somehow missed by commit 0d99519efef (writeback: remove unused nonblocking and congestion checks). There are no behavior change except for the removal of two entries from one of the ext4 tracing interface. The nonblocking checks in ->writepages are no longer used because the flusher now prefer to block on get_request_wait() than to skip inodes on IO congestion. The latter will lead to more seeky IO. The nonblocking checks in ->writepage are no longer used because it's redundant with the WB_SYNC_NONE check. We no long set ->nonblocking in VM page out and page migration, because a) it's effectively redundant with WB_SYNC_NONE in current code b) it's old semantic of "Don't get stuck on request queues" is mis-behavior: that would skip some dirty inodes on congestion and page out others, which is unfair in terms of LRU age. Inspired by Christoph Hellwig. Thanks! Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: David Howells <dhowells@redhat.com> Cc: Sage Weil <sage@newdream.net> Cc: Steve French <sfrench@samba.org> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26oom: add per-mm oom disable countYing Han
It's pointless to kill a task if another thread sharing its mm cannot be killed to allow future memory freeing. A subsequent patch will prevent kills in such cases, but first it's necessary to have a way to flag a task that shares memory with an OOM_DISABLE task that doesn't incur an additional tasklist scan, which would make select_bad_process() an O(n^2) function. This patch adds an atomic counter to struct mm_struct that follows how many threads attached to it have an oom_score_adj of OOM_SCORE_ADJ_MIN. They cannot be killed by the kernel, so their memory cannot be freed in oom conditions. This only requires task_lock() on the task that we're operating on, it does not require mm->mmap_sem since task_lock() pins the mm and the operation is atomic. [rientjes@google.com: changelog and sys_unshare() code] [rientjes@google.com: protect oom_disable_count with task_lock in fork] [rientjes@google.com: use old_mm for oom_disable_count in exec] Signed-off-by: Ying Han <yinghan@google.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26kfifo: disable __kfifo_must_check_helper()Andrew Morton
This helper is wrong: it coerces signed values into unsigned ones, so code such as if (kfifo_alloc(...) < 0) { error } will fail to detect the error. So let's disable __kfifo_must_check_helper() for 2.6.36. Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Stefani Seibold <stefani@seibold.net> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26types.h: move misplaced commentAndrew Morton
This comment landed in the wrong place. Cc: Andi Kleen <andi@firstfloor.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Miller <davem@davemloft.net> Cc: Eric Paris <eparis@redhat.com> Cc: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26Merge branches 'amso1100', 'cma', 'cxgb3', 'cxgb4', 'ehca', 'iboe', 'ipoib', ↵Roland Dreier
'misc', 'mlx4', 'nes', 'qib' and 'srp' into for-next
2010-10-26resources: support allocating space within a region from the top downBjorn Helgaas
Allocate space from the top of a region first, then work downward, if an architecture desires this. When we allocate space from a resource, we look for gaps between children of the resource. Previously, we always looked at gaps from the bottom up. For example, given this: [mem 0xbff00000-0xf7ffffff] PCI Bus 0000:00 [mem 0xbff00000-0xbfffffff] gap -- available [mem 0xc0000000-0xdfffffff] PCI Bus 0000:02 [mem 0xe0000000-0xf7ffffff] gap -- available we attempted to allocate from the [mem 0xbff00000-0xbfffffff] gap first, then the [mem 0xe0000000-0xf7ffffff] gap. With this patch an architecture can choose to allocate from the top gap [mem 0xe0000000-0xf7ffffff] first. We can't do this across the board because iomem_resource.end is initialized to 0xffffffff_ffffffff on 64-bit architectures, and most machines can't address the entire 64-bit physical address space. Therefore, we only allocate top-down if the arch requests it by clearing "resource_alloc_from_bottom". Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-10-26Merge branch 'misc' into releaseLen Brown
2010-10-26Merge branch 'acpi-mmio' into releaseLen Brown
Conflicts: drivers/acpi/osl.c Signed-off-by: Len Brown <len.brown@intel.com>
2010-10-26Merge branch 'ima-memory-use-fixes'Linus Torvalds
* ima-memory-use-fixes: IMA: fix the ToMToU logic IMA: explicit IMA i_flag to remove global lock on inode_delete IMA: drop refcnt from ima_iint_cache since it isn't needed IMA: only allocate iint when needed IMA: move read counter into struct inode IMA: use i_writecount rather than a private counter IMA: use inode->i_lock to protect read and write counters IMA: convert internal flags from long to char IMA: use unsigned int instead of long for counters IMA: drop the inode opencount since it isn't needed for operation IMA: use rbtree instead of radix tree for inode information cache
2010-10-26IMA: explicit IMA i_flag to remove global lock on inode_deleteEric Paris
Currently for every removed inode IMA must take a global lock and search the IMA rbtree looking for an associated integrity structure. Instead we explicitly mark an inode when we add an integrity structure so we only have to take the global lock and do the removal if it exists. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26IMA: move read counter into struct inodeEric Paris
IMA currently allocated an inode integrity structure for every inode in core. This stucture is about 120 bytes long. Most files however (especially on a system which doesn't make use of IMA) will never need any of this space. The problem is that if IMA is enabled we need to know information about the number of readers and the number of writers for every inode on the box. At the moment we collect that information in the per inode iint structure and waste the rest of the space. This patch moves those counters into the struct inode so we can eventually stop allocating an IMA integrity structure except when absolutely needed. This patch does the minimum needed to move the location of the data. Further cleanups, especially the location of counter updates, may still be possible. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26Merge git://git.infradead.org/battery-2.6Linus Torvalds
* git://git.infradead.org/battery-2.6: power_supply: Makefile cleanup bq27x00_battery: Add missing kfree(di->bus) in bq27x00_battery_remove() power_supply: Introduce maximum current property power_supply: Add types for USB chargers ds2782_battery: Fix units power_supply: Add driver for TWL4030/TPS65950 BCI charger bq20z75: Add support for more power supply properties wm831x_power: Add missing kfree(wm831x_power) in wm831x_power_remove() jz4740-battery: Add missing kfree(jz_battery) in jz_battery_remove() ds2760_battery: Add missing kfree(di) in ds2760_battery_remove() olpc_battery: Fix endian neutral breakage for s16 values ds2760_battery: Fix W1 and W1_SLAVE_DS2760 dependency pcf50633-charger: Add missing sysfs_remove_group() power_supply: Add driver for TI BQ20Z75 gas gauge IC wm831x_power: Remove duplicate chg mask omap: rx51: Add support for USB chargers power_supply: Add isp1704 charger detection driver
2010-10-26Merge branch 'hwpoison' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 * 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (22 commits) Add _addr_lsb field to ia64 siginfo Fix migration.c compilation on s390 HWPOISON: Remove retry loop for try_to_unmap HWPOISON: Turn addr_valid from bitfield into char HWPOISON: Disable DEBUG by default HWPOISON: Convert pr_debugs to pr_info HWPOISON: Improve comments in memory-failure.c x86: HWPOISON: Report correct address granuality for huge hwpoison faults Encode huge page size for VM_FAULT_HWPOISON errors Fix build error with !CONFIG_MIGRATION hugepage: move is_hugepage_on_freelist inside ifdef to avoid warning Clean up __page_set_anon_rmap HWPOISON, hugetlb: fix unpoison for hugepage HWPOISON, hugetlb: soft offlining for hugepage HWPOSION, hugetlb: recover from free hugepage error when !MF_COUNT_INCREASED hugetlb: move refcounting in hugepage allocation inside hugetlb_lock HWPOISON, hugetlb: add free check to dequeue_hwpoison_huge_page() hugetlb: hugepage migration core hugetlb: redefine hugepage copy functions hugetlb: add allocate function for hugepage migration ...
2010-10-26Merge branch 'for-2.6.37' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
* 'for-2.6.37' of git://linux-nfs.org/~bfields/linux: (99 commits) svcrpc: svc_tcp_sendto XPT_DEAD check is redundant svcrpc: no need for XPT_DEAD check in svc_xprt_enqueue svcrpc: assume svc_delete_xprt() called only once svcrpc: never clear XPT_BUSY on dead xprt nfsd4: fix connection allocation in sequence() nfsd4: only require krb5 principal for NFSv4.0 callbacks nfsd4: move minorversion to client nfsd4: delay session removal till free_client nfsd4: separate callback change and callback probe nfsd4: callback program number is per-session nfsd4: track backchannel connections nfsd4: confirm only on succesful create_session nfsd4: make backchannel sequence number per-session nfsd4: use client pointer to backchannel session nfsd4: move callback setup into session init code nfsd4: don't cache seq_misordered replies SUNRPC: Properly initialize sock_xprt.srcaddr in all cases SUNRPC: Use conventional switch statement when reclassifying sockets sunrpc/xprtrdma: clean up workqueue usage sunrpc: Turn list_for_each-s into the ..._entry-s ... Fix up trivial conflicts (two different deprecation notices added in separate branches) in Documentation/feature-removal-schedule.txt
2010-10-26Merge branch 'nfs-for-2.6.37' of ↵Linus Torvalds
git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: net/sunrpc: Use static const char arrays nfs4: fix channel attribute sanity-checks NFSv4.1: Use more sensible names for 'initialize_mountpoint' NFSv4.1: pnfs: filelayout: add driver's LAYOUTGET and GETDEVICEINFO infrastructure NFSv4.1: pnfs: add LAYOUTGET and GETDEVICEINFO infrastructure NFS: client needs to maintain list of inodes with active layouts NFS: create and destroy inode's layout cache NFSv4.1: pnfs: filelayout: introduce minimal file layout driver NFSv4.1: pnfs: full mount/umount infrastructure NFS: set layout driver NFS: ask for layouttypes during v4 fsinfo call NFS: change stateid to be a union NFSv4.1: pnfsd, pnfs: protocol level pnfs constants SUNRPC: define xdr_decode_opaque_fixed NFSD: remove duplicate NFS4_STATEID_SIZE
2010-10-26[SCSI] libosd: write/read_sg_kern APIBoaz Harrosh
This is a trivial addition to the SG API that can receive kernel pointers. It is only used by the out-of-tree test module. So it's immediate need is questionable. For maintenance ease it might just get in, as it's very small. John. do you need this in the Kernel, or is it only for osd_ktest.ko? Signed-off-by: John A. Chandy <john.chandy@uconn.edu> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-10-26[SCSI] libosd: Support for scatter gather write/read commandsBoaz Harrosh
This patch adds the Scatter-Gather (sg) API to libosd. Scatter-gather enables a write/read of multiple none-contiguous areas of an object, in a single call. The extents may overlap and/or be in any order. The Scatter-Gather list is sent to the target in what is called a "cdb continuation segment". This is yet another possible segment in the osd-out-buffer. It is unlike all other segments in that it sits before the actual "data" segment (which until now was always first), and that it is signed by itself and not part of the data buffer. This is because the cdb-continuation-segment is considered a spill-over of the CDB data, and is therefor signed under OSD_SEC_CAPKEY and higher. TODO: A new osd_finalize_request_ex version should be supplied so the @caps received on the network also contains a size parameter and can be spilled over into the "cdb continuation segment". Thanks to John Chandy <john.chandy@uconn.edu> for the original code, and investigations. And the implementation of SG support in the osd-target. Original-coded-by: John Chandy <john.chandy@uconn.edu> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-10-26Merge branch 'tip/perf/ringbuffer-2' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent
2010-10-26mtd: cfi_cmdset_0002: make sector erase command variableGuillaume LECERF
Some old SST chips use 0x50 as sector erase command, instead of 0x30. Make this value variable to handle such chips. Signed-off-by: Guillaume LECERF <glecerf@gmail.com> Acked-by: Wolfram Sang <w.sang@pengutronix.de> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2010-10-26genirq: Add single IRQ reservation helperPaul Mundt
For cases that wish to reserve a single IRQ at a given place simply provide a wrapper in to the ranged reservation routine. Signed-off-by: Paul Mundt <lethal@linux-sh.org> LKML-Reference: <20101026071912.GD4733@linux-sh.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-10-26sh: Switch dynamic IRQ creation to generic irq allocator.Paul Mundt
Now that the genirq code provides an IRQ bitmap of its own and the necessary API to manipulate it, there's no need to keep our own version around anymore. In the process we kill off some unused IRQ reservation code, with future users now having to tie in to the genirq API as normal. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-10-25Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin: Blackfin: fix inverted anomaly 05000481 logic Blackfin: drop unused irq_panic()/DEBUG_ICACHE_CHECK Blackfin: ppi/spi/twi headers: add missing __BFP undef Blackfin: update defconfigs Blackfin: bfin_twi.h: start a common TWI header netdev: bfin_mac: push settings to platform resources
2010-10-25fs: inode split IO and LRU listsNick Piggin
The use of the same inode list structure (inode->i_list) for two different list constructs with different lifecycles and purposes makes it impossible to separate the locking of the different operations. Therefore, to enable the separation of the locking of the writeback and reclaim lists, split the inode->i_list into two separate lists dedicated to their specific tracking functions. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25fs: use percpu counter for nr_dentry and nr_dentry_unusedChristoph Hellwig
The nr_dentry stat is a globally touched cacheline and atomic operation twice over the lifetime of a dentry. It is used for the benfit of userspace only. Turn it into a per-cpu counter and always decrement it in d_free instead of doing various batching operations to reduce lock hold times in the callers. Based on an earlier patch from Nick Piggin <npiggin@suse.de>. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25fs: do not assign default i_ino in new_inodeChristoph Hellwig
Instead of always assigning an increasing inode number in new_inode move the call to assign it into those callers that actually need it. For now callers that need it is estimated conservatively, that is the call is added to all filesystems that do not assign an i_ino by themselves. For a few more filesystems we can avoid assigning any inode number given that they aren't user visible, and for others it could be done lazily when an inode number is actually needed, but that's left for later patches. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25new helper: ihold()Al Viro
Clones an existing reference to inode; caller must already hold one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25fs: remove inode_add_to_list/__inode_add_to_listChristoph Hellwig
Split up inode_add_to_list/__inode_add_to_list. Locking for the two lists will be split soon so these helpers really don't buy us much anymore. The __ prefixes for the sb list helpers will go away soon, but until inode_lock is gone we'll need them to distinguish between the locked and unlocked variants. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25fs: Implement lazy LRU updates for inodesNick Piggin
Convert the inode LRU to use lazy updates to reduce lock and cacheline traffic. We avoid moving inodes around in the LRU list during iget/iput operations so these frequent operations don't need to access the LRUs. Instead, we defer the refcount checks to reclaim-time and use a per-inode state flag, I_REFERENCED, to tell reclaim that iget has touched the inode in the past. This means that only reclaim should be touching the LRU with any frequency, hence significantly reducing lock acquisitions and the amount contention on LRU updates. This also removes the inode_in_use list, which means we now only have one list for tracking the inode LRU status. This makes it much simpler to split out the LRU list operations under it's own lock. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25fs: Convert nr_inodes and nr_unused to per-cpu countersDave Chinner
The number of inodes allocated does not need to be tied to the addition or removal of an inode to/from a list. If we are not tied to a list lock, we could update the counters when inodes are initialised or destroyed, but to do that we need to convert the counters to be per-cpu (i.e. independent of a lock). This means that we have the freedom to change the list/locking implementation without needing to care about the counters. Based on a patch originally from Eric Dumazet. [AV: cleaned up a bit, fixed build breakage on weird configs Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25list.h: new helper - hlist_add_fake()Al Viro
Make node look as if it was on hlist, with hlist_del() working correctly. Usable without any locking... Convert a couple of places where we want to do that to inode->i_hash. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25new helper: inode_unhashed()Al Viro
note: for race-free uses you inode_lock held Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25unexport invalidate_inodesAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25vfs: introduce FMODE_UNSIGNED_OFFSET for allowing negative f_posKAMEZAWA Hiroyuki
Now, rw_verify_area() checsk f_pos is negative or not. And if negative, returns -EINVAL. But, some special files as /dev/(k)mem and /proc/<pid>/mem etc.. has negative offsets. And we can't do any access via read/write to the file(device). So introduce FMODE_UNSIGNED_OFFSET to allow negative file offsets. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25fs: allow for more than 2^31 filesEric Dumazet
Andrew, Could you please review this patch, you probably are the right guy to take it, because it crosses fs and net trees. Note : /proc/sys/fs/file-nr is a read-only file, so this patch doesnt depend on previous patch (sysctl: fix min/max handling in __do_proc_doulongvec_minmax()) Thanks ! [PATCH V4] fs: allow for more than 2^31 files Robin Holt tried to boot a 16TB system and found af_unix was overflowing a 32bit value : <quote> We were seeing a failure which prevented boot. The kernel was incapable of creating either a named pipe or unix domain socket. This comes down to a common kernel function called unix_create1() which does: atomic_inc(&unix_nr_socks); if (atomic_read(&unix_nr_socks) > 2 * get_max_files()) goto out; The function get_max_files() is a simple return of files_stat.max_files. files_stat.max_files is a signed integer and is computed in fs/file_table.c's files_init(). n = (mempages * (PAGE_SIZE / 1024)) / 10; files_stat.max_files = n; In our case, mempages (total_ram_pages) is approx 3,758,096,384 (0xe0000000). That leaves max_files at approximately 1,503,238,553. This causes 2 * get_max_files() to integer overflow. </quote> Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long integers, and change af_unix to use an atomic_long_t instead of atomic_t. get_max_files() is changed to return an unsigned long. get_nr_files() is changed to return a long. unix_nr_socks is changed from atomic_t to atomic_long_t, while not strictly needed to address Robin problem. Before patch (on a 64bit kernel) : # echo 2147483648 >/proc/sys/fs/file-max # cat /proc/sys/fs/file-max -18446744071562067968 After patch: # echo 2147483648 >/proc/sys/fs/file-max # cat /proc/sys/fs/file-max 2147483648 # cat /proc/sys/fs/file-nr 704 0 2147483648 Reported-by: Robin Holt <holt@sgi.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David Miller <davem@davemloft.net> Reviewed-by: Robin Holt <holt@sgi.com> Tested-by: Robin Holt <holt@sgi.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25fs: kill block_prepare_writeChristoph Hellwig
__block_write_begin and block_prepare_write are identical except for slightly different calling conventions. Convert all callers to the __block_write_begin calling conventions and drop block_prepare_write. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25fs: mark destroy_inode staticChristoph Hellwig
Hugetlbfs used to need it, but after the destroy_inode and evict_inode changes it's not required anymore. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25fs: add sync_inode_metadataChristoph Hellwig
Add a new helper to write out the inode using the writeback code, that is including the correct dirty bit and list manipulation. A few of filesystems already opencode this, and a lot of others should be using it instead of using write_inode_now which also writes out the data. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25Merge branch 'hwmon-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/staging * 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/staging: (24 commits) hwmon: lis3: Release resources in case of failure hwmon: lis3: Short explanations of platform data fields hwmon: lis3: Enhance lis3 selftest with IRQ line test hwmon: lis3: use block read to access data registers hwmon: lis3: Adjust fuzziness for 8 bit device hwmon: lis3: New parameters to platform data hwmon: lis3: restore axis enabled bits hwmon: lis3: Power on corrections hwmon: lis3: Update coordinates at polled device open hwmon: lis3: Cleanup interrupt handling hwmon: lis3: regulator control hwmon: lis3: pm_runtime support Kirkwood: add fan support for Network Space Max v2 hwmon: add generic GPIO fan driver hwmon: (coretemp) fix reading of microcode revision (v2) hwmon: ({core, pkg, via-cpu}temp) remove unnecessary CONFIG_HOTPLUG_CPU ifdefs hwmon: (pkgtemp) align driver initialization style with coretemp hwmon: LTC4261 Hardware monitoring driver hwmon: (lis3) add axes module parameter for custom axis-mapping hwmon: (hp_accel) Add HP Mini 510x family support ...
2010-10-26Merge remote branch 'intel/drm-intel-next' of ../drm-next into drm-core-nextDave Airlie
* 'intel/drm-intel-next' of ../drm-next: (63 commits) drm/i915: Move gpu_write_list to per-ring drm/i915: Invalidate the to-ring, flush the old-ring when updating domains drm/i915/ringbuffer: Write the value passed in to the tail register agp/intel: Restore valid PTE bit for Sandybridge after bdd3072 drm/i915: Fix flushing regression from 9af90d19f drm/i915/sdvo: Remove unused encoding member i915: enable AVI infoframe for intel_hdmi.c [v4] drm/i915: Fix current fb blocking for page flip drm/i915: IS_IRONLAKE is synonymous with gen == 5 drm/i915: Enable SandyBridge blitter ring drm/i915/ringbuffer: Remove broken intel_fill_struct() drm/i915/ringbuffer: Fix emit batch buffer regression from 8187a2b drm/i915: Copy the updated reloc->presumed_offset back to the user drm/i915: Track objects in global active list (as well as per-ring) drm/i915: Simplify most HAS_BSD() checks drm/i915: cache the last object lookup during pin_and_relocate() drm/i915: Do interrupible mutex lock first to avoid locking for unreference drivers: gpu: drm: i915: Fix a typo. agp/intel: Also add B43.1 to list of supported devices drm/i915: rearrange mutex acquisition for pread ...
2010-10-25ipv4: add __rcu annotations to ip_ra_chainEric Dumazet
Add __rcu annotations to : (struct ip_ra_chain)->next struct ip_ra_chain *ip_ra_chain; And use appropriate rcu primitives. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-25net: add __rcu annotation to sk_filterEric Dumazet
Add __rcu annotation to : (struct sock)->sk_filter And use appropriate rcu primitives to reduce sparse warnings if CONFIG_SPARSE_RCU_POINTER=y Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-25net_ns: add __rcu annotationsEric Dumazet
add __rcu annotation to (struct net)->gen, and use rcu_dereference_protected() in net_assign_generic() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-25rps: add __rcu annotationsEric Dumazet
Add __rcu annotations to : (struct netdev_rx_queue)->rps_map (struct netdev_rx_queue)->rps_flow_table struct rps_sock_flow_table *rps_sock_flow_table; And use appropriate rcu primitives. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-25hwmon: lis3: Short explanations of platform data fieldsSamu Onkalo
Short documentation at kernel doc format. Signed-off-by: Samu Onkalo <samu.p.onkalo@nokia.com> Acked-by: Jonathan Cameron <jic23@cam.ac.uk> Acked-by: Eric Piel <eric.piel@tremplin-utc.net> Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
2010-10-25hwmon: lis3: use block read to access data registersSamu Onkalo
Add optional blockread function to interface driver. If available the chip driver uses it for data register access. For 12 bit device it reads 6 bytes to get 3*16bit data. For 8 bit device it reads out 5 bytes since every second byte is dummy. This optimizes bus usage and reduces number of operations and interrupts needed for one data update. Signed-off-by: Samu Onkalo <samu.p.onkalo@nokia.com> Acked-by: Jonathan Cameron <jic23@cam.ac.uk> Acked-by: Eric Piel <eric.piel@tremplin-utc.net> Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
2010-10-25hwmon: lis3: New parameters to platform dataSamu Onkalo
Added default output data rate setting to platform data. If default rate is 0, reset default value is used. Added control for duration via platform data. Added possibility to configure interrupts to trig on both rising and falling edge. The lis3 WU unit can be configured quite many ways and with some configurations it is quite handy to get coordinate refresh when some event trigs and when it reason goes away. Signed-off-by: Samu Onkalo <samu.p.onkalo@nokia.com> Acked-by: Jonathan Cameron <jic23@cam.ac.uk> Acked-by: Eric Piel <eric.piel@tremplin-utc.net> Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
2010-10-25hwmon: lis3: regulator controlSamu Onkalo
Based on pm_runtime control, turn lis3 regulators on and off. Perform context save and restore on transitions. Feature is optional and must be enabled in platform data. Signed-off-by: Samu Onkalo <samu.p.onkalo@nokia.com> Acked-by: Jonathan Cameron <jic23@cam.ac.uk> Acked-by: Eric Piel <eric.piel@tremplin-utc.net> Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>