summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2012-01-12mm: vmscan: distinguish between memcg triggering reclaim and memcg being scannedJohannes Weiner
Memory cgroup hierarchies are currently handled completely outside of the traditional reclaim code, which is invoked with a single memory cgroup as an argument for the whole call stack. Subsequent patches will switch this code to do hierarchical reclaim, so there needs to be a distinction between a) the memory cgroup that is triggering reclaim due to hitting its limit and b) the memory cgroup that is being scanned as a child of a). This patch introduces a struct mem_cgroup_zone that contains the combination of the memory cgroup and the zone being scanned, which is then passed down the stack instead of the zone argument. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Johannes Weiner <jweiner@redhat.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Ying Han <yinghan@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12mm: vmscan: distinguish global reclaim from global LRU scanningJohannes Weiner
The traditional zone reclaim code is scanning the per-zone LRU lists during direct reclaim and kswapd, and the per-zone per-memory cgroup LRU lists when reclaiming on behalf of a memory cgroup limit. Subsequent patches will convert the traditional reclaim code to reclaim exclusively from the per-memory cgroup LRU lists. As a result, using the predicate for which LRU list is scanned will no longer be appropriate to tell global reclaim from limit reclaim. This patch adds a global_reclaim() predicate to tell direct/kswapd reclaim from memory cgroup limit reclaim and substitutes it in all places where currently scanning_global_lru() is used for that. Signed-off-by: Johannes Weiner <jweiner@redhat.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Ying Han <yinghan@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12mm: memcg: consolidate hierarchy iteration primitivesJohannes Weiner
The memcg naturalization series: Memory control groups are currently bolted onto the side of traditional memory management in places where better integration would be preferrable. To reclaim memory, for example, memory control groups maintain their own LRU list and reclaim strategy aside from the global per-zone LRU list reclaim. But an extra list head for each existing page frame is expensive and maintaining it requires additional code. This patchset disables the global per-zone LRU lists on memory cgroup configurations and converts all its users to operate on the per-memory cgroup lists instead. As LRU pages are then exclusively on one list, this saves two list pointers for each page frame in the system: page_cgroup array size with 4G physical memory vanilla: allocated 31457280 bytes of page_cgroup patched: allocated 15728640 bytes of page_cgroup At the same time, system performance for various workloads is unaffected: 100G sparse file cat, 4G physical memory, 10 runs, to test for code bloat in the traditional LRU handling and kswapd & direct reclaim paths, without/with the memory controller configured in vanilla: 71.603(0.207) seconds patched: 71.640(0.156) seconds vanilla: 79.558(0.288) seconds patched: 77.233(0.147) seconds 100G sparse file cat in 1G memory cgroup, 10 runs, to test for code bloat in the traditional memory cgroup LRU handling and reclaim path vanilla: 96.844(0.281) seconds patched: 94.454(0.311) seconds 4 unlimited memcgs running kbuild -j32 each, 4G physical memory, 500M swap on SSD, 10 runs, to test for regressions in kswapd & direct reclaim using per-memcg LRU lists with multiple memcgs and multiple allocators within each memcg vanilla: 717.722(1.440) seconds [ 69720.100(11600.835) majfaults ] patched: 714.106(2.313) seconds [ 71109.300(14886.186) majfaults ] 16 unlimited memcgs running kbuild, 1900M hierarchical limit, 500M swap on SSD, 10 runs, to test for regressions in hierarchical memcg setups vanilla: 2742.058(1.992) seconds [ 26479.600(1736.737) majfaults ] patched: 2743.267(1.214) seconds [ 27240.700(1076.063) majfaults ] This patch: There are currently two different implementations of iterating over a memory cgroup hierarchy tree. Consolidate them into one worker function and base the convenience looping-macros on top of it. Signed-off-by: Johannes Weiner <jweiner@redhat.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Ying Han <yinghan@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12memcg: add mem_cgroup_replace_page_cache() to fix LRU issueKAMEZAWA Hiroyuki
Commit ef6a3c6311 ("mm: add replace_page_cache_page() function") added a function replace_page_cache_page(). This function replaces a page in the radix-tree with a new page. WHen doing this, memory cgroup needs to fix up the accounting information. memcg need to check PCG_USED bit etc. In some(many?) cases, 'newpage' is on LRU before calling replace_page_cache(). So, memcg's LRU accounting information should be fixed, too. This patch adds mem_cgroup_replace_page_cache() and removes the old hooks. In that function, old pages will be unaccounted without touching res_counter and new page will be accounted to the memcg (of old page). WHen overwriting pc->mem_cgroup of newpage, take zone->lru_lock and avoid races with LRU handling. Background: replace_page_cache_page() is called by FUSE code in its splice() handling. Here, 'newpage' is replacing oldpage but this newpage is not a newly allocated page and may be on LRU. LRU mis-accounting will be critical for memory cgroup because rmdir() checks the whole LRU is empty and there is no account leak. If a page is on the other LRU than it should be, rmdir() will fail. This bug was added in March 2011, but no bug report yet. I guess there are not many people who use memcg and FUSE at the same time with upstream kernels. The result of this bug is that admin cannot destroy a memcg because of account leak. So, no panic, no deadlock. And, even if an active cgroup exist, umount can succseed. So no problem at shutdown. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Miklos Szeredi <mszeredi@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12epoll: limit pathsJason Baron
The current epoll code can be tickled to run basically indefinitely in both loop detection path check (on ep_insert()), and in the wakeup paths. The programs that tickle this behavior set up deeply linked networks of epoll file descriptors that cause the epoll algorithms to traverse them indefinitely. A couple of these sample programs have been previously posted in this thread: https://lkml.org/lkml/2011/2/25/297. To fix the loop detection path check algorithms, I simply keep track of the epoll nodes that have been already visited. Thus, the loop detection becomes proportional to the number of epoll file descriptor and links. This dramatically decreases the run-time of the loop check algorithm. In one diabolical case I tried it reduced the run-time from 15 mintues (all in kernel time) to .3 seconds. Fixing the wakeup paths could be done at wakeup time in a similar manner by keeping track of nodes that have already been visited, but the complexity is harder, since there can be multiple wakeups on different cpus...Thus, I've opted to limit the number of possible wakeup paths when the paths are created. This is accomplished, by noting that the end file descriptor points that are found during the loop detection pass (from the newly added link), are actually the sources for wakeup events. I keep a list of these file descriptors and limit the number and length of these paths that emanate from these 'source file descriptors'. In the current implemetation I allow 1000 paths of length 1, 500 of length 2, 100 of length 3, 50 of length 4 and 10 of length 5. Note that it is sufficient to check the 'source file descriptors' reachable from the newly added link, since no other 'source file descriptors' will have newly added links. This allows us to check only the wakeup paths that may have gotten too long, and not re-check all possible wakeup paths on the system. In terms of the path limit selection, I think its first worth noting that the most common case for epoll, is probably the model where you have 1 epoll file descriptor that is monitoring n number of 'source file descriptors'. In this case, each 'source file descriptor' has a 1 path of length 1. Thus, I believe that the limits I'm proposing are quite reasonable and in fact may be too generous. Thus, I'm hoping that the proposed limits will not prevent any workloads that currently work to fail. In terms of locking, I have extended the use of the 'epmutex' to all epoll_ctl add and remove operations. Currently its only used in a subset of the add paths. I need to hold the epmutex, so that we can correctly traverse a coherent graph, to check the number of paths. I believe that this additional locking is probably ok, since its in the setup/teardown paths, and doesn't affect the running paths, but it certainly is going to add some extra overhead. Also, worth noting is that the epmuex was recently added to the ep_ctl add operations in the initial path loop detection code using the argument that it was not on a critical path. Another thing to note here, is the length of epoll chains that is allowed. Currently, eventpoll.c defines: /* Maximum number of nesting allowed inside epoll sets */ #define EP_MAX_NESTS 4 This basically means that I am limited to a graph depth of 5 (EP_MAX_NESTS + 1). However, this limit is currently only enforced during the loop check detection code, and only when the epoll file descriptors are added in a certain order. Thus, this limit is currently easily bypassed. The newly added check for wakeup paths, stricly limits the wakeup paths to a length of 5, regardless of the order in which ep's are linked together. Thus, a side-effect of the new code is a more consistent enforcement of the graph depth. Thus far, I've tested this, using the sample programs previously mentioned, which now either return quickly or return -EINVAL. I've also testing using the piptest.c epoll tester, which showed no difference in performance. I've also created a number of different epoll networks and tested that they behave as expectded. I believe this solves the original diabolical test cases, while still preserving the sane epoll nesting. Signed-off-by: Jason Baron <jbaron@redhat.com> Cc: Nelson Elhage <nelhage@ksplice.com> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12pipe: fail cleanly when root tries F_SETPIPE_SZ with big sizeSasha Levin
When a user with the CAP_SYS_RESOURCE cap tries to F_SETPIPE_SZ a pipe with size bigger than kmalloc() can alloc it spits out an ugly warning: ------------[ cut here ]------------ WARNING: at mm/page_alloc.c:2095 __alloc_pages_nodemask+0x5d3/0x7a0() Pid: 733, comm: a.out Not tainted 3.2.0-rc1+ #4 Call Trace: warn_slowpath_common+0x75/0xb0 warn_slowpath_null+0x15/0x20 __alloc_pages_nodemask+0x5d3/0x7a0 __get_free_pages+0x12/0x50 __kmalloc+0x12b/0x150 pipe_set_size+0x75/0x120 pipe_fcntl+0xf8/0x140 do_fcntl+0x2d4/0x410 sys_fcntl+0x66/0xa0 system_call_fastpath+0x16/0x1b ---[ end trace 432f702e6db7b5ee ]--- Instead, make kcalloc() handle the overflow case and fail quietly. [akpm@linux-foundation.org: switch to sizeof(*bufs) for 80-column niceness] Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Acked-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12slub: document setting min order with debug_guardpage_minorder > 0Stanislaw Gruszka
Acked-by: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12parisc, exec: remove redundant set_fs(USER_DS)Mathias Krause
The address limit is already set in flush_old_exec() so those calls to set_fs(USER_DS) are redundant. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Helge Deller <deller@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12ia64, exec: remove redundant set_fs(USER_DS)Mathias Krause
The address limit is already set in flush_old_exec() so this set_fs(USER_DS) is redundant. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12drivers/video/nvidia/nvidia.c: fix warningAndrew Morton
Fix the int/bool confusion in there. drivers/video/nvidia/nvidia.c:1602: warning: return from incompatible pointer type Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12mm,x86,um: move CMPXCHG_DOUBLE config optionHeiko Carstens
Move CMPXCHG_DOUBLE and rename it to HAVE_CMPXCHG_DOUBLE so architectures can simply select the option if it is supported. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12mm,x86,um: move CMPXCHG_LOCAL config optionHeiko Carstens
Move CMPXCHG_LOCAL and rename it to HAVE_CMPXCHG_LOCAL so architectures can simply select the option if it is supported. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12mm,slub,x86: decouple size of struct page from CONFIG_CMPXCHG_LOCALHeiko Carstens
While implementing cmpxchg_double() on s390 I realized that we don't set CONFIG_CMPXCHG_LOCAL despite the fact that we have support for it. However setting that option will increase the size of struct page by eight bytes on 64 bit, which we certainly do not want. Also, it doesn't make sense that a present cpu feature should increase the size of struct page. Besides that it looks like the dependency to CMPXCHG_LOCAL is wrong and that it should depend on CMPXCHG_DOUBLE instead. This patch: If an architecture supports CMPXCHG_LOCAL this shouldn't result automatically in larger struct pages if the SLUB allocator is used. Instead introduce a new config option "HAVE_ALIGNED_STRUCT_PAGE" which can be selected if a double word aligned struct page is required. Also update x86 Kconfig so that it should work as before. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12include/linux/linkage.h: remove unused ATTRIB_NORET macroJoe Perches
The uses have been renamed so delete the unused macro. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12treewide: convert uses of ATTRIB_NORETURN to __noreturnJoe Perches
Use the more commonly used __noreturn instead of ATTRIB_NORETURN. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Joe Perches <joe@perches.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12treewide: remove useless NORET_TYPE macro and usesJoe Perches
It's a very old and now unused prototype marking so just delete it. Neaten panic pointer argument style to keep checkpatch quiet. Signed-off-by: Joe Perches <joe@perches.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12include/linux/linkage.h: remove unused NORET_AND macroJoe Perches
The only use in kernel.h is gone so remove the macro. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12kernel.h: neaten panic prototypeJoe Perches
Use __printf macro. Convert NORET_AND to ATTRIB_NORET. Use the normal kernel style for pointer arguments. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12kprobes: silence DEBUG_STRICT_USER_COPY_CHECKS=y warningStephen Boyd
Enabling DEBUG_STRICT_USER_COPY_CHECKS causes the following warning: In file included from arch/x86/include/asm/uaccess.h:573, from kernel/kprobes.c:55: In function 'copy_from_user', inlined from 'write_enabled_file_bool' at kernel/kprobes.c:2191: arch/x86/include/asm/uaccess_64.h:65: warning: call to 'copy_from_user_overflow' declared with attribute warning: copy_from_user() buffer size is not provably correct presumably due to buf_size being signed causing GCC to fail to see that buf_size can't become negative. Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S. Miller <davem@davemloft.net> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12proc: fix null pointer deref in proc_pid_permission()Xiaotian Feng
get_proc_task() can fail to search the task and return NULL, put_task_struct() will then bomb the kernel with following oops: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 IP: [<ffffffff81217d34>] proc_pid_permission+0x64/0xe0 PGD 112075067 PUD 112814067 PMD 0 Oops: 0002 [#1] PREEMPT SMP This is a regression introduced by commit 0499680a ("procfs: add hidepid= and gid= mount options"). The kernel should return -ESRCH if get_proc_task() failed. Signed-off-by: Xiaotian Feng <dannyfeng@tencent.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: Stephen Wilson <wilsons@start.ca> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12pptp: Accept packet with seq zeroBradley Peterson
Initialize the PPTP "seq received" value to 0xffffffff, so we don't ignore packets with seq zero. Signed-off-by: Bradley Peterson <despite@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-12RDS: Remove some unused iWARP codeRoland Dreier
rds_iw_flush_goal() just returns a count, but it is only called in one place and its return value is ignored there. So delete all the dead code. Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-12net: fsl: fec: handle 10Mbps speed in RMII modeEric Benard
when the link is 10 Mbps and the mode is RMII, it's necessary to set FRCONT to 1 in MIIGSK_CFGR to divide the RMII source clock by 10 in order to support 10 Mbps operations. Signed-off-by: Eric Bénard <eric@eukrea.com> Acked-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-12drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c: add missing iounmapJulia Lawall
Add missing iounmap in error handling code, in a case where the function already preforms iounmap on some other execution path. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression e; statement S,S1; int ret; @@ e = \(ioremap\|ioremap_nocache\)(...) ... when != iounmap(e) if (<+...e...+>) S ... when any when != iounmap(e) *if (...) { ... when != iounmap(e) return ...; } ... when any iounmap(e); // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-12drivers/net/ethernet/tundra/tsi108_eth.c: add missing iounmapJulia Lawall
Add missing iounmap in error handling code, in a case where the function already preforms iounmap on some other execution path. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression e; statement S,S1; int ret; @@ e = \(ioremap\|ioremap_nocache\)(...) ... when != iounmap(e) if (<+...e...+>) S ... when any when != iounmap(e) *if (...) { ... when != iounmap(e) return ...; } ... when any iounmap(e); // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-12ksz884x: fix mtu for VLANDoug Kehn
The Ethernet header does not account for the addition of a VLAN header. Full size Ethernet frames containing VLAN header are not processed because the frame is larger than the resulting hw mtu. Signed-off-by: Doug Kehn <rdkehn@yahoo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-12net_sched: sfq: add optional RED on top of SFQEric Dumazet
Adds an optional Random Early Detection on each SFQ flow queue. Traditional SFQ limits count of packets, while RED permits to also control number of bytes per flow, and adds ECN capability as well. 1) We dont handle the idle time management in this RED implementation, since each 'new flow' begins with a null qavg. We really want to address backlogged flows. 2) if headdrop is selected, we try to ecn mark first packet instead of currently enqueued packet. This gives faster feedback for tcp flows compared to traditional RED [ marking the last packet in queue ] Example of use : tc qdisc add dev $DEV parent 1:1 handle 10: est 1sec 4sec sfq \ limit 3000 headdrop flows 512 divisor 16384 \ redflowlimit 100000 min 8000 max 60000 probability 0.20 ecn qdisc sfq 10: parent 1:1 limit 3000p quantum 1514b depth 127 headdrop flows 512/16384 divisor 16384 ewma 6 min 8000b max 60000b probability 0.2 ecn prob_mark 0 prob_mark_head 4876 prob_drop 6131 forced_mark 0 forced_mark_head 0 forced_drop 0 Sent 1175211782 bytes 777537 pkt (dropped 6131, overlimits 11007 requeues 0) rate 99483Kbit 8219pps backlog 689392b 456p requeues 0 In this test, with 64 netperf TCP_STREAM sessions, 50% using ECN enabled flows, we can see number of packets CE marked is smaller than number of drops (for non ECN flows) If same test is run, without RED, we can check backlog is much bigger. qdisc sfq 10: parent 1:1 limit 3000p quantum 1514b depth 127 headdrop flows 512/16384 divisor 16384 Sent 1148683617 bytes 795006 pkt (dropped 0, overlimits 0 requeues 0) rate 98429Kbit 8521pps backlog 1221290b 841p requeues 0 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Stephen Hemminger <shemminger@vyatta.com> CC: Dave Taht <dave.taht@gmail.com> Tested-by: Dave Taht <dave.taht@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-12module_param: make bool parameters really bool (drivers/video/i810)Rusty Russell
module_param(bool) used to counter-intuitively take an int. In fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy trick. It's time to remove the int/unsigned int option. For this version it'll simply give a warning, but it'll break next kernel version. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
2012-01-12dp83640: Fix NOHZ local_softirq_pending 08 warningManfred Rudigier
Similar problem as in 481a8199142c050b72bff8a1956a49fd0a75bbe0 ("can: fix NOHZ local_softirq_pending 08 warning"). This fix replaces netif_rx() with netif_rx_ni() which has to be used from process/softirq context. Signed-off-by: Manfred Rudigier <manfred.rudigier@omicron.at> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-12gianfar: Fix invalid TX frames returned on error queue when time stampingManfred Rudigier
When TX time stamping for PTP messages is enabled on a socket, a time stamp is returned on the socket error queue to the user space application after the frame was transmitted. The transmitted frame is also returned on the error queue so that an application knows to which frame the time stamp belongs. In the current implementation the TxFCB is immediately followed by the frame. Since the eTSEC inserts the TX time stamp 8 bytes after the TxFCB, parts of the frame have been overwritten and an invalid frame was returned on the socket error queue. This patch fixes the described problem by adding additional 16 padding bytes between the TxFCB and the frame for all messages sent from a time stamping enabled socket (other sockets are not affected). Signed-off-by: Manfred Rudigier <manfred.rudigier@omicron.at> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-12gianfar: Fix missing sock reference when processing TX time stampsManfred Rudigier
When there is not enough headroom in the skb a private copy will be made. However, the private copy had no reference to the socket and consequently no time stamp could be queued on the socket error queue during the skb_tstamp_tx function. This patch fixes this issue by also stealing the sock reference from the original skb after making the private copy. Signed-off-by: Manfred Rudigier <manfred.rudigier@omicron.at> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-12phylib: introduce mdiobus_alloc_size()Timur Tabi
Introduce function mdiobus_alloc_size() as an alternative to mdiobus_alloc(). Most callers of mdiobus_alloc() also allocate a private data structure, and then manually point bus->priv to this object. mdiobus_alloc_size() combines the two operations into one, which simplifies memory management. The original mdiobus_alloc() now just calls mdiobus_alloc_size(0). Signed-off-by: Timur Tabi <timur@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-13module_param: check that bool parameters really are bool.Rusty Russell
module_param(bool) used to counter-intuitively take an int. In fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy trick. This tightens the check (you'll get a warning about incompatible return type) but still allows it. Next kernel version, we'll remove it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13intelfbdrv.c: bailearly is an int module_paramRusty Russell
Dan Carpenter points out that it's an int, not a bool: intelfbdrv.c:818: if (bailearly == 1) intelfbdrv.c:828: if (bailearly == 2) intelfbdrv.c:836: if (bailearly == 3) intelfbdrv.c:842: if (bailearly == 4) intelfbdrv.c:851: if (bailearly == 5) intelfbdrv.c:859: if (bailearly == 6) intelfbdrv.c:866: bailearly > 6 ? bailearly - 6 : 0); intelfbdrv.c:874: if (bailearly == 18) intelfbdrv.c:886: if (bailearly == 19) intelfbdrv.c:893: if (bailearly == 20) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Dan Carpenter <dan.carpenter@oracle.com>
2012-01-13paride/pcd: fix bool verbose module parameter.Rusty Russell
Dan Carpenter points out that it's an int, not a bool: pcd.c:427: if (verbose > 1) pcd.c:433: if (verbose > 1) pcd.c:437: if (verbose < 2) pcd.c:506:#define DBMSG(msg) ((verbose>1)?(msg):NULL) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Dan Carpenter <dan.carpenter@oracle.com>
2012-01-13module_param: make bool parameters really bool (drivers & misc)Rusty Russell
module_param(bool) used to counter-intuitively take an int. In fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy trick. It's time to remove the int/unsigned int option. For this version it'll simply give a warning, but it'll break next kernel version. Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13module_param: make bool parameters really bool (arch)Rusty Russell
module_param(bool) used to counter-intuitively take an int. In fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy trick. It's time to remove the int/unsigned int option. For this version it'll simply give a warning, but it'll break next kernel version. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13module_param: make bool parameters really bool (core code)Rusty Russell
module_param(bool) used to counter-intuitively take an int. In fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy trick. It's time to remove the int/unsigned int option. For this version it'll simply give a warning, but it'll break next kernel version. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13kernel/async: remove redundant declaration.Rusty Russell
It's in linux/init.h, and I'm about to change it to a bool. Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13printk: fix unnecessary module_param_name.Rusty Russell
You don't need module_param_name if the name is the same! Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13lirc_parallel: fix module parameter description.Rusty Russell
Cut and paste bug. Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: devel@driverdev.osuosl.org Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13module_param: avoid bool abuse, add bint for special cases.Rusty Russell
For historical reasons, we allow module_param(bool) to take an int (or an unsigned int). That's going away. A few drivers really want an int: they set it to -1 and a parameter will set it to 0 or 1. This sucks: reading them from sysfs will give 'Y' for both -1 and 1, but if we change it to an int, then the users might be broken (if they did "param" instead of "param=1"). Use a new 'bint' parser for them. (ntfs has a different problem: it needs an int for debug_msgs because it's also exposed via sysctl.) Cc: Steve Glendinning <steve.glendinning@smsc.com> Cc: Jean Delvare <khali@linux-fr.org> Cc: Guenter Roeck <guenter.roeck@ericsson.com> Cc: Hoang-Nam Nguyen <hnguyen@de.ibm.com> Cc: Christoph Raisch <raisch@de.ibm.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: linux390@de.ibm.com Cc: Anton Altaparmakov <anton@tuxera.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Takashi Iwai <tiwai@suse.de> Cc: lm-sensors@lm-sensors.org Cc: linux-rdma@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: linux-ntfs-dev@lists.sourceforge.net Cc: alsa-devel@alsa-project.org Acked-by: Takashi Iwai <tiwai@suse.de> (For the sound part) Acked-by: Guenter Roeck <guenter.roeck@ericsson.com> (For the hwmon driver) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13module_param: check type correctness for module_param_arrayRusty Russell
module_param_array(), unlike its non-array cousins, didn't check the type of the variable. Fixing this found two bugs. Cc: Luca Risolia <luca.risolia@studio.unibo.it> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Eric Piel <eric.piel@tremplin-utc.net> Cc: linux-media@vger.kernel.org Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13modpost: use linker section to generate table.Rusty Russell
This means (most) future busses need only have one hunk in their patch. Also took the opportunity to check that function matches the type. Again, inspired by Alessandro's patch series. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Alessandro Rubini <rubini@gnudd.com>
2012-01-13modpost: use a table rather than a giant if/else statement.Rusty Russell
We look for symbols of form __mod_<busname>_device_table, and for all but three cases we use a standard interation function (do_table) to walk over the contents and dump out the aliases. Alessandro Rubini did this first, I just repainted the bikeshed a bit. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Alessandro Rubini <rubini@gnudd.com>
2012-01-13modules: sysfs - export: taint, coresize, initsizeKay Sievers
Recent tools do not want to use /proc to retrieve module information. A few values are currently missing from sysfs to replace the information available in /proc/modules. This adds /sys/module/*/{coresize,initsize,taint} attributes. TAINT_PROPRIETARY_MODULE (P) and TAINT_OOT_MODULE (O) flags are both always shown now, and do no longer exclude each other, also in /proc/modules. Replace the open-coded sysfs attribute initializers with the __ATTR() macro. Add the new attributes to Documentation/ABI. Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13kernel/params: replace DEBUGP with pr_debugJim Cromie
Use more flexible pr_debug. This allows: echo "module params +p" > /dbg/dynamic_debug/control to turn on debug messages when needed. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13module: replace DEBUGP with pr_debugJim Cromie
Use more flexible pr_debug. This allows: echo "module module +p" > /dbg/dynamic_debug/control to turn on debug messages when needed. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13module: struct module_ref should contains long fieldsEric Dumazet
module_ref contains two "unsigned int" fields. Thats now too small, since some machines can open more than 2^32 files. Check commit 518de9b39e8 (fs: allow for more than 2^31 files) for reference. We can add an aligned(2 * sizeof(unsigned long)) attribute to force alloc_percpu() allocating module_ref areas in single cache lines. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Rusty Russell <rusty@rustcorp.com.au> CC: Tejun Heo <tj@kernel.org> CC: Robin Holt <holt@sgi.com> CC: David Miller <davem@davemloft.net> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13module: Fix performance regression on modules with large symbol tablesKevin Cernekee
Looking at /proc/kallsyms, one starts to ponder whether all of the extra strtab-related complexity in module.c is worth the memory savings. Instead of making the add_kallsyms() loop even more complex, I tried the other route of deleting the strmap logic and naively copying each string into core_strtab with no consideration for consolidating duplicates. Performance on an "already exists" insmod of nvidia.ko (runs add_kallsyms() but does not actually initialize the module): Original scheme: 1.230s With naive copying: 0.058s Extra space used: 35k (of a 408k module). Signed-off-by: Kevin Cernekee <cernekee@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> LKML-Reference: <73defb5e4bca04a6431392cc341112b1@localhost>