linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2009-01-05	CRED: Fix regression in cap_capable() as shown up by sys_faccessat() [ver #2]	David Howells
	Fix a regression in cap_capable() due to: commit 5ff7711e635b32f0a1e558227d030c7e45b4a465 Author: David Howells <dhowells@redhat.com> Date: Wed Dec 31 02:52:28 2008 +0000 CRED: Differentiate objective and effective subjective credentials on a task The problem is that the above patch allows a process to have two sets of credentials, and for the most part uses the subjective credentials when accessing current's creds. There is, however, one exception: cap_capable(), and thus capable(), uses the real/objective credentials of the target task, whether or not it is the current task. Ordinarily this doesn't matter, since usually the two cred pointers in current point to the same set of creds. However, sys_faccessat() makes use of this facility to override the credentials of the calling process to make its test, without affecting the creds as seen from other processes. One of the things sys_faccessat() does is to make an adjustment to the effective capabilities mask, which cap_capable(), as it stands, then ignores. The affected capability check is in generic_permission(): if (!(mask & MAY_EXEC) \|\| execute_ok(inode)) if (capable(CAP_DAC_OVERRIDE)) return 0; This change splits capable() from has_capability() down into the commoncap and SELinux code. The capable() security op now only deals with the current process, and uses the current process's subjective creds. A new security op - task_capable() - is introduced that can check any task's objective creds. strictly the capable() security op is superfluous with the presence of the task_capable() op, however it should be faster to call the capable() op since two fewer arguments need be passed down through the various layers. This can be tested by compiling the following program from the XFS testsuite: /* * t_access_root.c - trivial test program to show permission bug. * * Written by Michael Kerrisk - copyright ownership not pursued. * Sourced from: http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-10/6030.html / #include <limits.h> #include <unistd.h> #include <stdio.h> #include <stdlib.h> #include <fcntl.h> #include <sys/stat.h> #define UID 500 #define GID 100 #define PERM 0 #define TESTPATH "/tmp/t_access" static void errExit(char msg) { perror(msg); exit(EXIT_FAILURE); } /* errExit / static void accessTest(char file, int mask, char mstr) { printf("access(%s, %s) returns %d\n", file, mstr, access(file, mask)); } / accessTest / int main(int argc, char argv[]) { int fd, perm, uid, gid; char testpath; char cmd[PATH_MAX + 20]; testpath = (argc > 1) ? argv[1] : TESTPATH; perm = (argc > 2) ? strtoul(argv[2], NULL, 8) : PERM; uid = (argc > 3) ? atoi(argv[3]) : UID; gid = (argc > 4) ? atoi(argv[4]) : GID; unlink(testpath); fd = open(testpath, O_RDWR \| O_CREAT, 0); if (fd == -1) errExit("open"); if (fchown(fd, uid, gid) == -1) errExit("fchown"); if (fchmod(fd, perm) == -1) errExit("fchmod"); close(fd); snprintf(cmd, sizeof(cmd), "ls -l %s", testpath); system(cmd); if (seteuid(uid) == -1) errExit("seteuid"); accessTest(testpath, 0, "0"); accessTest(testpath, R_OK, "R_OK"); accessTest(testpath, W_OK, "W_OK"); accessTest(testpath, X_OK, "X_OK"); accessTest(testpath, R_OK \| W_OK, "R_OK \| W_OK"); accessTest(testpath, R_OK \| X_OK, "R_OK \| X_OK"); accessTest(testpath, W_OK \| X_OK, "W_OK \| X_OK"); accessTest(testpath, R_OK \| W_OK \| X_OK, "R_OK \| W_OK \| X_OK"); exit(EXIT_SUCCESS); } / main */ This can be run against an Ext3 filesystem as well as against an XFS filesystem. If successful, it will show: [root@andromeda src]# ./t_access_root /tmp/xxx 0 4043 4043 ---------- 1 dhowells dhowells 0 2008-12-31 03:00 /tmp/xxx access(/tmp/xxx, 0) returns 0 access(/tmp/xxx, R_OK) returns 0 access(/tmp/xxx, W_OK) returns 0 access(/tmp/xxx, X_OK) returns -1 access(/tmp/xxx, R_OK \| W_OK) returns 0 access(/tmp/xxx, R_OK \| X_OK) returns -1 access(/tmp/xxx, W_OK \| X_OK) returns -1 access(/tmp/xxx, R_OK \| W_OK \| X_OK) returns -1 If unsuccessful, it will show: [root@andromeda src]# ./t_access_root /tmp/xxx 0 4043 4043 ---------- 1 dhowells dhowells 0 2008-12-31 02:56 /tmp/xxx access(/tmp/xxx, 0) returns 0 access(/tmp/xxx, R_OK) returns -1 access(/tmp/xxx, W_OK) returns -1 access(/tmp/xxx, X_OK) returns -1 access(/tmp/xxx, R_OK \| W_OK) returns -1 access(/tmp/xxx, R_OK \| X_OK) returns -1 access(/tmp/xxx, W_OK \| X_OK) returns -1 access(/tmp/xxx, R_OK \| W_OK \| X_OK) returns -1 I've also tested the fix with the SELinux and syscalls LTP testsuites. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2009-01-04	gro: Add page frag support	Herbert Xu
	This patch allows GRO to merge page frags (skb_shinfo(skb)->frags) in one skb, rather than using the less efficient frag_list. It also adds a new interface, napi_gro_frags to allow drivers to inject page frags directly into the stack without allocating an skb. This is intended to be the GRO equivalent for LRO's lro_receive_frags interface. The existing GSO interface can already handle page frags with or without an appended frag_list so nothing needs to be changed there. The merging itself is rather simple. We store any new frag entries after the last existing entry, without checking whether the first new entry can be merged with the last existing entry. Making this check would actually be easy but since no existing driver can produce contiguous frags anyway it would just be mental masturbation. If the total number of entries would exceed the capacity of a single skb, we simply resort to using frag_list as we do now. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04	gro: Use gso_size to store MSS	Herbert Xu
	In order to allow GRO packets without frag_list at all, we need to store the MSS in the packet itself. The obvious place is gso_size. The only thing to watch out for is if the packet ends up not being GRO then we need to clear gso_size before pushing the packet into the stack. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04	starfire: use request_firmware()	Jaswinder Singh Rajput
	Firmware blob is big endian Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04	firmware: convert tg3 driver to request_firmware()	Jaswinder Singh Rajput
	Firmware blob looks like this... u8 firmware_major u8 firmware_minor u8 firmware_fix u8 pad __be32 start_address __be32 length (total, including BSS sections to be zeroed) data... (in __be32 words, which is native for the firmware) Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04	firmware: convert acenic driver to request_firmware()	Jaswinder Singh
	We store the firmware in its native big-endian form now, so the loop in ace_copy() is modified to use be32_to_cpup() when writing it out. We can forget the BSS,SBSS sections of the firmware, since we were clearing all the device's RAM anyway. And the text,rodata,data sections can all be loaded as a single chunk since they're contiguous (give or take a few dozen bytes in between). Signed-off-by: Jaswinder Singh <jaswinder@infradead.org> Signed-off-by: David Woodhouse <dwmw2@infradead.org> Acked-by: Jes Sorensen <jes@sgi.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04	ipv6: Fix sporadic sendmsg -EINVAL when sending to multicast groups.	David S. Miller
	Thanks to excellent diagnosis by Eduard Guzovsky. The core problem is that on a network with lots of active multicast traffic, the neighbour cache can fill up. If we try to allocate a new route and thus neighbour cache entry, the bog-standard GC attempt the neighbour layer does in ineffective because route entries hold a reference to the existing neighbour entries and GC can only liberate entries with no references. IPV4 already has a way to handle this, by doing a route cache GC in such situations (when neigh attach returns -ENOBUFS). So simply mimick this on the ipv6 side. Tested-by: Eduard Guzovsky <eguzovsky@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04	MAINTAINERS: update sparc maintainer	Sam Ravnborg
	Reflect the current situation where David Miller is the sparc maintainer. I have tried to contact Bill on following adresses: wli@holomorphy.com wlirwin@us.ibm.com with no success and Bill has not been active on the sparclinux mailing list for a long time. As sparc and sparc64 are unified I unified the two entries in the MAINTAINERS file too. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04	sparc: unify ipcbuf.h	Sam Ravnborg
	The ony difference is the size of the mode. sparc has extra padding to compensate for this. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-04	firewire: reorder struct fw_card for better cache efficiency	Stefan Richter
	topology_map is by far the largest member in struct fw_card. Move it to the very end of the struct so that card pointer dereferences have better chances to hit the CPU cache. This requires to increase the topology_map backing store to the size specified in IEEE 1394, i.e. 256 rather than 255 quadlets. Otherwise the topology_map response handler may access invalid memory. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04	firewire: fix resetting of bus manager retry counter	Stefan Richter
	An earlier change, maybe long ago, removed the copying of self_id_count into card->self_id_count. Since then each bus reset cleared card->bm_retries even when it shouldn't. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04	firewire: improve refcounting of fw_card	Jay Fenlason
	Take a reference to the card whenever fw_card_bm_work() is scheduled on that card and release it when the work is done. This allows us to remove the cancel_delayed_work_sync() in fw_core_remove_card(). Signed-off-by: Jay Fenlason <fenlason@redhat.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (patch update)
2009-01-04	firewire: typo in comment	Jay Fenlason
	Signed-off-by: Jay Fenlason <fenlason@redhat.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04	firewire: fix small memory leak at module removal	Stefan Richter
	Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04	firewire: fw-sbp2: remove unnecessary locking	Stefan Richter
	What was I thinking when I added sbp2_set_generation()? Its locking did nothing (except for implicitly providing the necessary barrier between node IDs update and generation update). Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04	ieee1934: dv1394: interrupt enabling/disabling broken on big-endian	Harvey Harrison
	After annotating the frame structs, this was left: drivers/ieee1394/dv1394.c:2113:23: warning: invalid assignment: \|= drivers/ieee1394/dv1394.c:2113:23: left side has type restricted __le32 drivers/ieee1394/dv1394.c:2113:23: right side has type int drivers/ieee1394/dv1394.c:2121:24: warning: invalid assignment: &= drivers/ieee1394/dv1394.c:2121:24: left side has type restricted __le32 drivers/ieee1394/dv1394.c:2121:24: right side has type int drivers/ieee1394/dv1394.c:2123:24: warning: invalid assignment: \|= drivers/ieee1394/dv1394.c:2123:24: left side has type restricted __le32 drivers/ieee1394/dv1394.c:2123:24: right side has type int Which looks like a real bug on a big-endian arch as it would set/clear the wrong bit. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Bill Fink writes: I finally got a chance to test the patch on my kernel, and live DV viewing using xine still worked fine. Although I admit to being mystified how it works both before and after the patch, since the cpu_to_le32() calls that were added should result in byte swapping on PPC that wasn't being done before. I guess that either the code paths involved aren't actually being triggered by my xine DV viewing, or there's some fortuitous palindromic setting of bits. Tested-by: Bill Fink <billfink@mindspring.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04	ieee1394: dv1394: annotate frame input/output structs as little endian	Harvey Harrison
	No Functional changes. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04	ieee1394: eth1394: trivial sparse annotations	Harvey Harrison
	Mostly annotations of ether_type as a be16. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04	ieee1394: mark bus_info_data as a __be32 array	Harvey Harrison
	Two access functions get_max_rom and set_hw_config_rom are changed to take __be32 as well. Only bus_info_data was ever passed in so this is OK. All other uses of bus_info_data treated it as a be32 value already. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04	ieee1394: replace CSR_SET_BUS_INFO_GENERATION macro	Harvey Harrison
	Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04	ieee1394: pcilynx: trivial endian annotation	Harvey Harrison
	bus_info_block was treated as a be32 everywhere, annotate as such. Removes plenty of sparse warnings. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04	ieee1394: ignore nonzero Bus_Info_Block.max_rom, fetch config ROM in quadlets	Stefan Richter
	It is already known that buggy firmwares exist which report a bogus link_spd in their config ROM bus info block. We now got the first report of a bogus max_rom too (Freecom FireWire Hard Drive 1TB, http://bugzilla.kernel.org/show_bug.cgi?id=12206). I suspect other OSs only use quadlet reads to fetch the config ROM, otherwise the firmware authors would have noticed their mistake. Hence limit ieee1394's config ROM fetching routine to quadlets as the safe minimum regardless of what the bus info block says. This will potentially slow the bus reset handling by nodemgr somewhat down. But most existing devices support only quadlet reads anyway, hence there will often be no actual difference to before this change. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04	ieee1394: consolidate uses of IEEE1934_BUSID_MAGIC	Harvey Harrison
	Move the definition out of nodemgr.h and use it in csr.c/pcilynx.c Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04	ieee1394: ohci1394: flush MMIO writes before delay in initialization	Stefan Richter
	and replace busy-wait by msleep. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04	ieee1394: ohci1394: pass error codes from request_irq through	Stefan Richter
	Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04	ieee1394: ohci1394: don't leave interrupts enabled during suspend/resume	Frans Pop
	On my HP 2510p I get the following in dmesg during near the end of most resumes from suspend to RAM: irq 19: nobody cared (try booting with the "irqpoll" option) Pid: 0, comm: swapper Not tainted 2.6.28-rc7 #67 Call Trace: <IRQ> [<ffffffffa00ee9e1>] ? ohci_irq_handler+0x60/0x7e9 [ohci1394] [<ffffffff8026aa4d>] __report_bad_irq+0x38/0x87 [<ffffffff8026abaa>] note_interrupt+0x10e/0x174 [<ffffffff8026b262>] handle_fasteoi_irq+0xa7/0xd1 [<ffffffff8020eb87>] do_IRQ+0x73/0xe4 [<ffffffff8020c626>] ret_from_intr+0x0/0xa <EOI> [<ffffffffa0012606>] ? acpi_idle_enter_bm+0x26b/0x2b2 [processor] [<ffffffffa00125fc>] ? acpi_idle_enter_bm+0x261/0x2b2 [processor] [<ffffffff8024f30f>] ? notifier_call_chain+0x33/0x5b [<ffffffff803b9c64>] ? cpuidle_idle_call+0x8c/0xc4 [<ffffffff8020b312>] ? cpu_idle+0x4a/0x9a [<ffffffff8042c5c8>] ? rest_init+0x5c/0x5e handlers: [<ffffffffa00ee981>] (ohci_irq_handler+0x0/0x7e9 [ohci1394]) Disabling IRQ #19 There also seems to be an interrupt storm during suspend/resume when this happens: 19: 99968 33 IO-APIC-fasteoi ohci1394 This patch gets rid of both issues and makes the resume as a whole significantly faster. Signed-off-by: Frans Pop <elendil@planet.nl> As was pointed out in http://lkml.org/lkml/2008/12/6/127, this does not fix the cause of the interrupt storm. However, since the source of the interrupts could not be determined yet, we make the system at least more usable with this change. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04	ieee1394: mark all hpsb_address_ops instances as const	Stefan Richter
	These are never modified. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-04	ieee1394: replace a GFP_ATOMIC by GFP_KERNEL allocation	Stefan Richter
	All callers of hpsb_register_addrspace() can sleep. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-01-05	module: convert to stop_machine_create/destroy.	Heiko Carstens
	The module code relies on a non-failing stop_machine call. So we create the kstop threads in advance and with that make sure the call won't fail. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-01-05	stop_machine: introduce stop_machine_create/destroy.	Heiko Carstens
	Introduce stop_machine_create/destroy. With this interface subsystems that need a non-failing stop_machine environment can create the stop_machine machine threads before actually calling stop_machine. When the threads aren't needed anymore they can be killed with stop_machine_destroy again. When stop_machine gets called and the threads aren't present they will be created and destroyed automatically. This restores the old behaviour of stop_machine. This patch also converts cpu hotplug to the new interface since it is special: cpu_down calls __stop_machine instead of stop_machine. However the kstop threads will only be created when stop_machine gets called. Changing the code so that the threads would be created automatically on __stop_machine is currently not possible: when __stop_machine gets called we hold cpu_add_remove_lock, which is the same lock that create_rt_workqueue would take. So the workqueue needs to be created before the cpu hotplug code locks cpu_add_remove_lock. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-01-05	parisc: fix module loading failure of large kernel modules	Helge Deller
	On 32bit (and sometimes 64bit) and with big kernel modules like xfs or ipv6 the relocation types R_PARISC_PCREL17F and R_PARISC_PCREL22F may fail to reach their PLT stub if we only create one big stub array for all sections at the beginning of the core or init section. With this patch we now instead add individual PLT stub entries directly in front of the code sections where the stubs are actually called. This reduces the distance between the PCREL location and the stub entry so that the relocations can be fulfilled. While calculating the final layout of the kernel module in memory, the kernel module loader calls arch_mod_section_prepend() to request the to be reserved amount of memory in front of each individual section. Tested with 32- and 64bit kernels. Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-01-05	module: fix module loading failure of large kernel modules for parisc	Helge Deller
	When creating the final layout of a kernel module in memory, allow the module loader to reserve some additional memory in front of a given section. This is currently only needed for the parisc port which needs to put the stub entries there to fulfill the 17/22bit PCREL relocations with large kernel modules like xfs. Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (renamed fn)
2009-01-05	module: fix warning of unused function when !CONFIG_PROC_FS	Jianjun Kong
	Fix this warning: kernel/module.c:824: warning: ‘print_unload_info’ defined but not used print_unload_info() just was used when CONFIG_PROC_FS was defined. This patch mark print_unload_info() inline to solve the problem. Signed-off-by: Jianjun Kong <jianjun@zeuux.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> CC: Ingo Molnar <mingo@elte.hu> CC: Américo Wang <xiyou.wangcong@gmail.com>
2009-01-05	kernel/module.c: compare symbol values when marking symbols as exported in ↵	Tim Abbott
	/proc/kallsyms. When there are two symbols in a module with the same name, one of which is exported, both will be marked as exported in /proc/kallsyms. There aren't any instances of this in the current kernel, but it is easy to construct a simple module with two compilation units that exhibits the problem. $ objdump -j .text -t testmod.ko \| grep foo 00000000 l F .text 00000032 foo 00000080 g F .text 00000001 foo $ sudo insmod testmod.ko $ grep "T foo" /proc/kallsyms c28e8000 T foo [testmod] c28e8080 T foo [testmod] Fix this by comparing the symbol values once we've found the exported symbol table entry matching the symbol name. Tested using Ksplice: $ ksplice-create --patch=this_commit.patch --id=bar . $ sudo ksplice-apply ksplice-bar.tar.gz Done! $ grep "T foo" /proc/kallsyms c28e8080 T foo [testmod] Signed-off-by: Tim Abbott <tabbott@mit.edu> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-01-05	remove CONFIG_KMOD	Johannes Berg
	Now that nothing depends on it any more, remove CONFIG_KMOD. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-01-05	Merge branch 'master' of ↵	James Morris
	git://git.infradead.org/users/pcmoore/lblnet-2.6_next into next
2009-01-04	rtc: add alarm/update irq interfaces	Alessandro Zummo
	Add standard interfaces for alarm/update irqs enabling. Drivers are no more required to implement equivalent ioctl code as rtc-dev will provide it. UIE emulation should now be handled correctly and will work even for those RTC drivers who cannot be configured to do both UIE and AIE. Signed-off-by: Alessandro Zummo <a.zummo@towertech.it> Cc: David Brownell <david-b@pacbell.net> Cc: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-04	fs: symlink write_begin allocation context fix	Nick Piggin
	With the write_begin/write_end aops, page_symlink was broken because it could no longer pass a GFP_NOFS type mask into the point where the allocations happened. They are done in write_begin, which would always assume that the filesystem can be entered from reclaim. This bug could cause filesystem deadlocks. The funny thing with having a gfp_t mask there is that it doesn't really allow the caller to arbitrarily tinker with the context in which it can be called. It couldn't ever be GFP_ATOMIC, for example, because it needs to take the page lock. The only thing any callers care about is __GFP_FS anyway, so turn that into a single flag. Add a new flag for write_begin, AOP_FLAG_NOFS. Filesystems can now act on this flag in their write_begin function. Change __grab_cache_page to accept a nofs argument as well, to honour that flag (while we're there, change the name to grab_cache_page_write_begin which is more instructive and does away with random leading underscores). This is really a more flexible way to go in the end anyway -- if a filesystem happens to want any extra allocations aside from the pagecache ones in ints write_begin function, it may now use GFP_KERNEL (rather than GFP_NOFS) for common case allocations (eg. ocfs2_alloc_write_ctxt, for a random example). [kosaki.motohiro@jp.fujitsu.com: fix ubifs] [kosaki.motohiro@jp.fujitsu.com: fix fuse] Signed-off-by: Nick Piggin <npiggin@suse.de> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: <stable@kernel.org> [2.6.28.x] Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ Cleaned up the calling convention: just pass in the AOP flags untouched to the grab_cache_page_write_begin() function. That just simplifies everybody, and may even allow future expansion of the logic. - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-04	viafb: fix crashes due to 4k stack overflow	Bruno Prémont
	The function viafb_cursor() uses 2 stack-variables of CURSOR_SIZE bits; CURSOR_SIZE is defined as (8 * 1024). Using up twice 1k on stack is too much for 4k-stack (though it works with 8k-stacks). Make those two variables kzalloc'ed to preserve stack space. Also merge the whole lot of local struct's in viafb_ioctl into a union so the stack usage gets minimized here as well. (struct's are only accessed in their indicidual IOCTL case) This second part is only compile-tested as I know of no userspace app using the IOCTLs. Signed-off-by: Bruno Prémont <bonbons@linux-vserver.org> Cc: <JosephChan@via.com.tw> Cc: Krzysztof Helt <krzysztof.h1@poczta.fm> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-04	fs: introduce bgl_lock_ptr()	Pekka Enberg
	As suggested by Andreas Dilger, introduce a bgl_lock_ptr() helper in <linux/blockgroup_lock.h> and add separate sb_bgl_lock() helpers to filesystem specific header files to break the hidden dependency to struct ext[234]_sb_info. Also, while at it, convert the macros to static inlines to try make up for all the times I broke Andrew Morton's tree. Acked-by: Andreas Dilger <adilger@sun.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-04	spi.h uses/needs device.h	Randy Dunlap
	Include header files as used/needed: In file included from drivers/leds/leds-dac124s085.c:16: include/linux/spi/spi.h:66: error: field 'dev' has incomplete type include/linux/spi/spi.h: In function 'to_spi_device': include/linux/spi/spi.h:100: warning: type defaults to 'int' in declaration of '__mptr' ... Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-04	vmalloc.c: fix flushing in vmap_page_range()	Adam Lackorzynski
	The flush_cache_vmap in vmap_page_range() is called with the end of the range twice. The following patch fixes this for me. Signed-off-by: Adam Lackorzynski <adam@os.inf.tu-dresden.de> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-04	cgroups: fix a race between cgroup_clone and umount	Li Zefan
	The race is calling cgroup_clone() while umounting the ns cgroup subsys, and thus cgroup_clone() might access invalid cgroup_fs, or kill_sb() is called after cgroup_clone() created a new dir in it. The BUG I triggered is BUG_ON(root->number_of_cgroups != 1); ------------[ cut here ]------------ kernel BUG at kernel/cgroup.c:1093! invalid opcode: 0000 [#1] SMP ... Process umount (pid: 5177, ti=e411e000 task=e40c4670 task.ti=e411e000) ... Call Trace: [<c0493df7>] ? deactivate_super+0x3f/0x51 [<c04a3600>] ? mntput_no_expire+0xb3/0xdd [<c04a3ab2>] ? sys_umount+0x265/0x2ac [<c04a3b06>] ? sys_oldumount+0xd/0xf [<c0403911>] ? sysenter_do_call+0x12/0x31 ... EIP: [<c0456e76>] cgroup_kill_sb+0x23/0xe0 SS:ESP 0068:e411ef2c ---[ end trace c766c1be3bf944ac ]--- Cc: Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: "Serge E. Hallyn" <serue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-04	audit: validate comparison operations, store them in sane form	Al Viro
	Don't store the field->op in the messy (and very inconvenient for e.g. audit_comparator()) form; translate to dense set of values and do full validation of userland-submitted value while we are at it. ->audit_init_rule() and ->audit_match_rule() get new values now; in-tree instances updated. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-04	clean up audit_rule_{add,del} a bit	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-04	make sure that filterkey of task,always rules is reported	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-04	audit rules ordering, part 2	Al Viro
	Fix the actual rule listing; add per-type lists _not_ used for matching, with all exit,... sitting on one such list. Simplifies "do something for all rules" logics, while we are at it... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-04	fixing audit rule ordering mess, part 1	Al Viro
	Problem: ordering between the rules on exit chain is currently lost; all watch and inode rules are listed after everything else _and_ exit,never on one kind doesn't stop exit,always on another from being matched. Solution: assign priorities to rules, keep track of the current highest-priority matching rule and its result (always/never). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-04	audit_update_lsm_rules() misses the audit_inode_hash[] ones	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-04	sanitize audit_log_capset()	Al Viro
	* no allocations * return void * don't duplicate checked for dummy context Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>