summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2010-08-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (39 commits) random: Reorder struct entropy_store to remove padding on 64bits padata: update API documentation padata: Remove padata_get_cpumask crypto: pcrypt - Update pcrypt cpumask according to the padata cpumask notifier crypto: pcrypt - Rename pcrypt_instance padata: Pass the padata cpumasks to the cpumask_change_notifier chain padata: Rearrange set_cpumask functions padata: Rename padata_alloc functions crypto: pcrypt - Dont calulate a callback cpu on empty callback cpumask padata: Check for valid cpumasks padata: Allocate cpumask dependend recources in any case padata: Fix cpu index counting crypto: geode_aes - Convert pci_table entries to PCI_VDEVICE (if PCI_ANY_ID is used) pcrypt: Added sysfs interface to pcrypt padata: Added sysfs primitives to padata subsystem padata: Make two separate cpumasks padata: update documentation padata: simplify serialization mechanism padata: make padata_do_parallel to return zero on success padata: Handle empty padata cpumasks ...
2010-08-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1443 commits) phy/marvell: add 88ec048 support igb: Program MDICNFG register prior to PHY init e1000e: correct MAC-PHY interconnect register offset for 82579 hso: Add new product ID can: Add driver for esd CAN-USB/2 device l2tp: fix export of header file for userspace can-raw: Fix skb_orphan_try handling Revert "net: remove zap_completion_queue" net: cleanup inclusion phy/marvell: add 88e1121 interface mode support u32: negative offset fix net: Fix a typo from "dev" to "ndev" igb: Use irq_synchronize per vector when using MSI-X ixgbevf: fix null pointer dereference due to filter being set for VLAN 0 e1000e: Fix irq_synchronize in MSI-X case e1000e: register pm_qos request on hardware activation ip_fragment: fix subtracting PPPOE_SES_HLEN from mtu twice net: Add getsockopt support for TCP thin-streams cxgb4: update driver version cxgb4: add new PCI IDs ... Manually fix up conflicts in: - drivers/net/e1000e/netdev.c: due to pm_qos registration infrastructure changes - drivers/net/phy/marvell.c: conflict between adding 88ec048 support and cleaning up the IDs - drivers/net/wireless/ipw2x00/ipw2100.c: trivial ipw2100_pm_qos_req conflict (registration change vs marking it static)
2010-08-04CRED: Fix RCU warning due to previous patch fixing __task_cred()'s checksDavid Howells
Commit 8f92054e7ca1 ("CRED: Fix __task_cred()'s lockdep check and banner comment") fixed the lockdep checks on __task_cred(). This has shown up a place in the signalling code where a lock should be held - namely that check_kill_permission() requires its callers to hold the RCU lock. Fix group_send_sig_info() to get the RCU read lock around its call to check_kill_permission(). Without this patch, the following warning can occur: =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- kernel/signal.c:660 invoked rcu_dereference_check() without protection! ... Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-04Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: PM / Runtime: Add runtime PM statistics (v3) PM / Runtime: Make runtime_status attribute not debug-only (v. 2) PM: Do not use dynamically allocated objects in pm_wakeup_event() PM / Suspend: Fix ordering of calls in suspend error paths PM / Hibernate: Fix snapshot error code path PM / Hibernate: Fix hibernation_platform_enter() pm_qos: Get rid of the allocation in pm_qos_add_request() pm_qos: Reimplement using plists plist: Add plist_last PM: Make it possible to avoid races between wakeup and system sleep PNPACPI: Add support for remote wakeup PM: describe kernel policy regarding wakeup defaults (v. 2) PM / Hibernate: Fix typos in comments in kernel/power/swap.c
2010-08-03timer: add on-stack deferrable timer interfacesJesse Barnes
In some cases (for instance with kernel threads) it may be desireable to use on-stack deferrable timers to get their power saving benefits. Add interfaces to support this for the IPS driver. Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Matthew Garrett <mjg@redhat.com>
2010-07-31padata: Remove padata_get_cpumaskSteffen Klassert
A function that copies the padata cpumasks to a user buffer is a bit error prone. The cpumask can change any time so we can't be sure to have the right cpumask when using this function. A user who is interested in the padata cpumasks should register to the padata cpumask notifier chain instead. Users of padata_get_cpumask are already updated, so we can remove it. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-31padata: Pass the padata cpumasks to the cpumask_change_notifier chainSteffen Klassert
We pass a pointer to the new padata cpumasks to the cpumask_change_notifier chain. So users can access the cpumasks without the need of an extra padata_get_cpumask function. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-31padata: Rearrange set_cpumask functionsSteffen Klassert
padata_set_cpumask needs to be protected by a lock. We make __padata_set_cpumasks unlocked and static. So this function can be used by the exported and locked padata_set_cpumask and padata_set_cpumasks functions. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-31padata: Rename padata_alloc functionsSteffen Klassert
We rename padata_alloc to padata_alloc_possible because this function allocates a padata_instance and uses the cpu_possible mask for parallel and serial workers. Also we rename __padata_alloc to padata_alloc to avoid to export underlined functions. Underlined functions are considered to be private to padata. Users are updated accordingly. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-29CRED: Fix get_task_cred() and task_state() to not resurrect dead credentialsDavid Howells
It's possible for get_task_cred() as it currently stands to 'corrupt' a set of credentials by incrementing their usage count after their replacement by the task being accessed. What happens is that get_task_cred() can race with commit_creds(): TASK_1 TASK_2 RCU_CLEANER -->get_task_cred(TASK_2) rcu_read_lock() __cred = __task_cred(TASK_2) -->commit_creds() old_cred = TASK_2->real_cred TASK_2->real_cred = ... put_cred(old_cred) call_rcu(old_cred) [__cred->usage == 0] get_cred(__cred) [__cred->usage == 1] rcu_read_unlock() -->put_cred_rcu() [__cred->usage == 1] panic() However, since a tasks credentials are generally not changed very often, we can reasonably make use of a loop involving reading the creds pointer and using atomic_inc_not_zero() to attempt to increment it if it hasn't already hit zero. If successful, we can safely return the credentials in the knowledge that, even if the task we're accessing has released them, they haven't gone to the RCU cleanup code. We then change task_state() in procfs to use get_task_cred() rather than calling get_cred() on the result of __task_cred(), as that suffers from the same problem. Without this change, a BUG_ON in __put_cred() or in put_cred_rcu() can be tripped when it is noticed that the usage count is not zero as it ought to be, for example: kernel BUG at kernel/cred.c:168! invalid opcode: 0000 [#1] SMP last sysfs file: /sys/kernel/mm/ksm/run CPU 0 Pid: 2436, comm: master Not tainted 2.6.33.3-85.fc13.x86_64 #1 0HR330/OptiPlex 745 RIP: 0010:[<ffffffff81069881>] [<ffffffff81069881>] __put_cred+0xc/0x45 RSP: 0018:ffff88019e7e9eb8 EFLAGS: 00010202 RAX: 0000000000000001 RBX: ffff880161514480 RCX: 00000000ffffffff RDX: 00000000ffffffff RSI: ffff880140c690c0 RDI: ffff880140c690c0 RBP: ffff88019e7e9eb8 R08: 00000000000000d0 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000040 R12: ffff880140c690c0 R13: ffff88019e77aea0 R14: 00007fff336b0a5c R15: 0000000000000001 FS: 00007f12f50d97c0(0000) GS:ffff880007400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8f461bc000 CR3: 00000001b26ce000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process master (pid: 2436, threadinfo ffff88019e7e8000, task ffff88019e77aea0) Stack: ffff88019e7e9ec8 ffffffff810698cd ffff88019e7e9ef8 ffffffff81069b45 <0> ffff880161514180 ffff880161514480 ffff880161514180 0000000000000000 <0> ffff88019e7e9f28 ffffffff8106aace 0000000000000001 0000000000000246 Call Trace: [<ffffffff810698cd>] put_cred+0x13/0x15 [<ffffffff81069b45>] commit_creds+0x16b/0x175 [<ffffffff8106aace>] set_current_groups+0x47/0x4e [<ffffffff8106ac89>] sys_setgroups+0xf6/0x105 [<ffffffff81009b02>] system_call_fastpath+0x16/0x1b Code: 48 8d 71 ff e8 7e 4e 15 00 85 c0 78 0b 8b 75 ec 48 89 df e8 ef 4a 15 00 48 83 c4 18 5b c9 c3 55 8b 07 8b 07 48 89 e5 85 c0 74 04 <0f> 0b eb fe 65 48 8b 04 25 00 cc 00 00 48 3b b8 58 04 00 00 75 RIP [<ffffffff81069881>] __put_cred+0xc/0x45 RSP <ffff88019e7e9eb8> ---[ end trace df391256a100ebdd ]--- Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-07-28cgroups: Add an API to attach a task to current task's cgroupSridhar Samudrala
Add a new kernel API to attach a task to current task's cgroup in all the active hierarchies. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Paul Menage <menage@google.com> Acked-by: Li Zefan <lizf@cn.fujitsu.com>
2010-07-27dynamic debug: move ddebug_remove_module() down into free_module()Jason Baron
The command echo "file ec.c +p" >/sys/kernel/debug/dynamic_debug/control causes an oops. Move the call to ddebug_remove_module() down into free_module(). In this way it should be called from all error paths. Currently, we are missing the remove if the module init routine fails. Signed-off-by: Jason Baron <jbaron@redhat.com> Reported-by: Thomas Renninger <trenn@suse.de> Tested-by: Thomas Renninger <trenn@suse.de> Cc: <stable@kernel.org> [2.6.32+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-07-26padata: Check for valid cpumasksSteffen Klassert
Now that we allow to change the cpumasks from userspace, we have to check for valid cpumasks in padata_do_parallel. This patch adds the necessary check. This fixes a division by zero crash if the parallel cpumask contains no active cpu. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-26padata: Allocate cpumask dependend recources in any caseSteffen Klassert
The cpumask separation work assumes the cpumask dependend recources present regardless of valid or invalid cpumasks. With this patch we allocate the cpumask dependend recources in any case. This fixes two NULL pointer dereference crashes in padata_replace and in padata_get_cpumask. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-26padata: Fix cpu index countingSteffen Klassert
The counting of the cpu index got lost with a recent commit. This patch restores it. This fixes a hang of the parallel worker threads on cpu hotplug. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-21sysrq,kdb: Use __handle_sysrq() for kdb's sysrq functionJason Wessel
The kdb code should not toggle the sysrq state in case an end user wants to try and resume the normal kernel execution. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2010-07-21debug_core,kdb: fix kgdb_connected bit set in the wrong placeJason Wessel
Immediately following an exit from the kdb shell the kgdb_connected variable should be set to zero, unless there are breakpoints planted. If the kgdb_connected variable is not zeroed out with kdb, it is impossible to turn off kdb. This patch is merely a work around for now, the real fix will check for the breakpoints. Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-07-21Fix merge regression from external kdb to upstream kdbJason Wessel
In the process of merging kdb to the mainline, the kdb lsmod command stopped printing the base load address of kernel modules. This is needed for using kdb in conjunction with external tools such as gdb. Simply restore the functionality by adding a kdb_printf for the base load address of the kernel modules. Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-07-21repair gdbstub to match the gdbserial protocol specificationJason Wessel
The gdbserial protocol handler should return an empty packet instead of an error string when ever it responds to a command it does not implement. The problem cases come from a debugger client sending qTBuffer, qTStatus, qSearch, qSupported. The incorrect response from the gdbstub leads the debugger clients to not function correctly. Recent versions of gdb will not detach correctly as a result of this behavior. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
2010-07-21kdb: break out of kdb_ll() when command is terminatedMartin Hicks
Without this patch the "ll" linked-list traversal command won't terminate when you hit q/Q. Signed-off-by: Martin Hicks <mort@sgi.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-07-20drop_monitor: convert some kfree_skb call sites to consume_skbNeil Horman
Convert a few calls from kfree_skb to consume_skb Noticed while I was working on dropwatch that I was detecting lots of internal skb drops in several places. While some are legitimate, several were not, freeing skbs that were at the end of their life, rather than being discarded due to an error. This patch converts those calls sites from using kfree_skb to consume_skb, which quiets the in-kernel drop_monitor code from detecting them as drops. Tested successfully by myself Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-19kmemleak: Add support for NO_BOOTMEM configurationsCatalin Marinas
With commits 08677214 and 59be5a8e, alloc_bootmem()/free_bootmem() and friends use the early_res functions for memory management when NO_BOOTMEM is enabled. This patch adds the kmemleak calls in the corresponding code paths for bootmem allocations. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: stable@kernel.org
2010-07-19padata: Added sysfs primitives to padata subsystemDan Kruchinin
Added sysfs primitives to padata subsystem. Now API user may embedded kobject each padata instance contains into any sysfs hierarchy. For now padata sysfs interface provides only two objects: serial_cpumask [RW] - cpumask for serial workers parallel_cpumask [RW] - cpumask for parallel workers Signed-off-by: Dan Kruchinin <dkruchinin@acm.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-19padata: Make two separate cpumasksDan Kruchinin
The aim of this patch is to make two separate cpumasks for padata parallel and serial workers respectively. It allows user to make more thin and sophisticated configurations of padata framework. For example user may bind parallel and serial workers to non-intersecting CPU groups to gain better performance. Also each padata instance has notifiers chain for its cpumasks now. If either parallel or serial or both masks were changed all interested subsystems will get notification about that. It's especially useful if padata user uses algorithm for callback CPU selection according to serial cpumask. Signed-off-by: Dan Kruchinin <dkruchinin@acm.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-19PM / Suspend: Fix ordering of calls in suspend error pathsRafael J. Wysocki
The ACPI suspend code calls suspend_nvs_free() at a wrong place, which may lead to a memory leak if there's an error executing acpi_pm_prepare(), because acpi_pm_finish() will not be called in that case. However, the root cause of this problem is the apparently confusing ordering of calls in suspend error paths that needs to be fixed. In addition to that, fix a typo in a label name in suspend.c. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Len Brown <len.brown@intel.com>
2010-07-19PM / Hibernate: Fix snapshot error code pathRafael J. Wysocki
There is an inconsistency between hibernation_platform_enter() and hibernation_snapshot(), because the latter calls hibernation_ops->end() after failing hibernation_ops->begin(), while the former doesn't do that. Make hibernation_snapshot() behave in the same way as hibernation_platform_enter() in that respect. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Len Brown <len.brown@intel.com>
2010-07-19PM / Hibernate: Fix hibernation_platform_enter()Rafael J. Wysocki
The hibernation_platform_enter() function calls dpm_suspend_noirq() instead of dpm_resume_noirq() by mistake. Fix this. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Len Brown <len.brown@intel.com>
2010-07-19pm_qos: Get rid of the allocation in pm_qos_add_request()James Bottomley
All current users of pm_qos_add_request() have the ability to supply the memory required by the pm_qos routines, so make them do this and eliminate the kmalloc() with pm_qos_add_request(). This has the double benefit of making the call never fail and allowing it to be called from atomic context. Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: mark gross <markgross@thegnar.org> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-07-19pm_qos: Reimplement using plistsJames Bottomley
A lot of the pm_qos extremal value handling is really duplicating what a priority ordered list does, just in a less efficient fashion. Simply redoing the implementation in terms of a plist gets rid of a lot of this junk (although there are several other strange things that could do with tidying up, like pm_qos_request_list has to carry the pm_qos_class with every node, simply because it doesn't get passed in to pm_qos_update_request even though every caller knows full well what parameter it's updating). I think this redo is a win independent of android, so we should do something like this now. Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: mark gross <markgross@thegnar.org> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-07-19PM: Make it possible to avoid races between wakeup and system sleepRafael J. Wysocki
One of the arguments during the suspend blockers discussion was that the mainline kernel didn't contain any mechanisms making it possible to avoid races between wakeup and system suspend. Generally, there are two problems in that area. First, if a wakeup event occurs exactly when /sys/power/state is being written to, it may be delivered to user space right before the freezer kicks in, so the user space consumer of the event may not be able to process it before the system is suspended. Second, if a wakeup event occurs after user space has been frozen, it is not generally guaranteed that the ongoing transition of the system into a sleep state will be aborted. To address these issues introduce a new global sysfs attribute, /sys/power/wakeup_count, associated with a running counter of wakeup events and three helper functions, pm_stay_awake(), pm_relax(), and pm_wakeup_event(), that may be used by kernel subsystems to control the behavior of this attribute and to request the PM core to abort system transitions into a sleep state already in progress. The /sys/power/wakeup_count file may be read from or written to by user space. Reads will always succeed (unless interrupted by a signal) and return the current value of the wakeup events counter. Writes, however, will only succeed if the written number is equal to the current value of the wakeup events counter. If a write is successful, it will cause the kernel to save the current value of the wakeup events counter and to abort the subsequent system transition into a sleep state if any wakeup events are reported after the write has returned. [The assumption is that before writing to /sys/power/state user space will first read from /sys/power/wakeup_count. Next, user space consumers of wakeup events will have a chance to acknowledge or veto the upcoming system transition to a sleep state. Finally, if the transition is allowed to proceed, /sys/power/wakeup_count will be written to and if that succeeds, /sys/power/state will be written to as well. Still, if any wakeup events are reported to the PM core by kernel subsystems after that point, the transition will be aborted.] Additionally, put a wakeup events counter into struct dev_pm_info and make these per-device wakeup event counters available via sysfs, so that it's possible to check the activity of various wakeup event sources within the kernel. To illustrate how subsystems can use pm_wakeup_event(), make the low-level PCI runtime PM wakeup-handling code use it. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-by: markgross <markgross@thegnar.org> Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
2010-07-19PM / Hibernate: Fix typos in comments in kernel/power/swap.cCesar Eduardo Barros
There are a few typos in kernel/power/swap.c. Fix them. Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.net> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-07-14padata: simplify serialization mechanismSteffen Klassert
We count the number of processed objects on a percpu basis, so we need to go through all the percpu reorder queues to calculate the sequence number of the next object that needs serialization. This patch changes this to count the number of processed objects global. So we can calculate the sequence number and the percpu reorder queue of the next object that needs serialization without searching through the percpu reorder queues. This avoids some accesses to memory of foreign cpus. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-14padata: make padata_do_parallel to return zero on successSteffen Klassert
To return -EINPROGRESS on success in padata_do_parallel was considered to be odd. This patch changes this to return zero on success. Also the only user of padata, pcrypt is adapted to convert a return of zero to -EINPROGRESS within the crypto layer. This also removes the pcrypt fallback if padata_do_parallel was called on a not running padata instance as we can't handle it anymore. This fallback was unused, so it's save to remove it. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-14padata: Handle empty padata cpumasksSteffen Klassert
This patch fixes a bug when the padata cpumask does not intersect with the active cpumask. In this case we get a division by zero in padata_alloc_pd and we end up with a useless padata instance. Padata can end up with an empty cpumask for two reasons: 1. A user removed the last cpu that belongs to the padata cpumask and the active cpumask. 2. The last cpu that belongs to the padata cpumask and the active cpumask goes offline. We introduce a function padata_validate_cpumask to check if the padata cpumask does intersect with the active cpumask. If the cpumasks do not intersect we mark the instance as invalid, so it can't be used. We do not allocate the cpumask dependend recources in this case. This fixes the division by zero and keeps the padate instance in a consistent state. It's not possible to trigger this bug by now because the only padata user, pcrypt uses always the possible cpumask. Reported-by: Dan Kruchinin <dkruchinin@acm.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-14padata: Block until the instance is unused on stopSteffen Klassert
This patch makes padata_stop to block until the padata instance is unused. Also we split padata_stop to a locked and a unlocked version. This is in preparation to be able to change the cpumask after a call to patata stop. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-14padata: Check for valid padata instance on startSteffen Klassert
This patch introduces the PADATA_INVALID flag which is checked on padata start. This will be used to mark a padata instance as invalid, if the padata cpumask does not intersect with the active cpumask. we change padata_start to return an error if the PADATA_INVALID is set. Also we adapt the only padata user, pcrypt to this change. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-07-04module: initialize module dynamic debug laterYehuda Sadeh
We should initialize the module dynamic debug datastructures only after determining that the module is not loaded yet. This fixes a bug that introduced in 2.6.35-rc2, where when a trying to load a module twice, we also load it's dynamic printing data twice which causes all sorts of nasty issues. Also handle the dynamic debug cleanup later on failure. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (removed a #ifdef) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-07-02Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Cure nr_iowait_cpu() users init: Fix comment init, sched: Fix race between init and kthreadd
2010-07-01sched: Cure nr_iowait_cpu() usersPeter Zijlstra
Commit 0224cf4c5e (sched: Intoduce get_cpu_iowait_time_us()) broke things by not making sure preemption was indeed disabled by the callers of nr_iowait_cpu() which took the iowait value of the current cpu. This resulted in a heap of preempt warnings. Cure this by making nr_iowait_cpu() take a cpu number and fix up the callers to pass in the right number. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Maxim Levitsky <maximlevitsky@gmail.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Jiri Slaby <jslaby@suse.cz> Cc: linux-pm@lists.linux-foundation.org LKML-Reference: <1277968037.1868.120.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-30futex: futex_find_get_task remove credentails checkMichal Hocko
futex_find_get_task is currently used (through lookup_pi_state) from two contexts, futex_requeue and futex_lock_pi_atomic. None of the paths looks it needs the credentials check, though. Different (e)uids shouldn't matter at all because the only thing that is important for shared futex is the accessibility of the shared memory. The credentail check results in glibc assert failure or process hang (if glibc is compiled without assert support) for shared robust pthread mutex with priority inheritance if a process tries to lock already held lock owned by a process with a different euid: pthread_mutex_lock.c:312: __pthread_mutex_lock_full: Assertion `(-(e)) != 3 || !robust' failed. The problem is that futex_lock_pi_atomic which is called when we try to lock already held lock checks the current holder (tid is stored in the futex value) to get the PI state. It uses lookup_pi_state which in turn gets task struct from futex_find_get_task. ESRCH is returned either when the task is not found or if credentials check fails. futex_lock_pi_atomic simply returns if it gets ESRCH. glibc code, however, doesn't expect that robust lock returns with ESRCH because it should get either success or owner died. Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: Darren Hart <dvhltc@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Nick Piggin <npiggin@suse.de> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-06-29kexec: fix Oops in crash_shrink_memory()Pavan Naregundi
When crashkernel is not enabled, "echo 0 > /sys/kernel/kexec_crash_size" OOPSes the kernel in crash_shrink_memory. This happens when crash_shrink_memory tries to release the 'crashk_res' resource which are not reserved. Also value of "/sys/kernel/kexec_crash_size" shows as 1, which should be 0. This patch fixes the OOPS in crash_shrink_memory and shows "/sys/kernel/kexec_crash_size" as 0 when crash kernel memory is not reserved. Signed-off-by: Pavan Naregundi <pavan@linux.vnet.ibm.com> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Cc: Simon Horman <horms@verge.net.au> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-06-28Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing: Fix undeclared ENOSYS in include/linux/tracepoint.h perf record: prevent kill(0, SIGTERM); perf session: Remove threads from tree on PERF_RECORD_EXIT perf/tracing: Fix regression of perf losing kprobe events perf_events: Fix Intel Westmere event constraints perf record: Don't call newt functions when not initialized
2010-06-28Merge branch 'irq-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: genirq: Deal with desc->set_type() changing desc->chip
2010-06-28Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Prevent compiler from optimising the sched_avg_update() loop sched: Fix over-scheduling bug sched: Fix PROVE_RCU vs cpu_cgroup
2010-06-28Merge branch 'timers-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: nohz: Fix nohz ratelimit
2010-06-28Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: silence PROVE_RCU in sched_fork() idr: fix RCU lockdep splat in idr_get_next() rcu: apply RCU protection to wake_affine()
2010-06-25sched: Prevent compiler from optimising the sched_avg_update() loopWill Deacon
GCC 4.4.1 on ARM has been observed to replace the while loop in sched_avg_update with a call to uldivmod, resulting in the following build failure at link-time: kernel/built-in.o: In function `sched_avg_update': kernel/sched.c:1261: undefined reference to `__aeabi_uldivmod' kernel/sched.c:1261: undefined reference to `__aeabi_uldivmod' make: *** [.tmp_vmlinux1] Error 1 This patch introduces a fake data hazard to the loop body to prevent the compiler optimising the loop away. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-23sched: silence PROVE_RCU in sched_fork()Peter Zijlstra
Because cgroup_fork() is ran before sched_fork() [ from copy_process() ] and the child's pid is not yet visible the child is pinned to its cgroup. Therefore we can silence this warning. A nicer solution would be moving cgroup_fork() to right after dup_task_struct() and exclude PF_STARTING from task_subsys_state(). Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-06-23rcu: apply RCU protection to wake_affine()Daniel J Blueman
The task_group() function returns a pointer that must be protected by either RCU, the ->alloc_lock, or the cgroup lock (see the rcu_dereference_check() in task_subsys_state(), which is invoked by task_group()). The wake_affine() function currently does none of these, which means that a concurrent update would be within its rights to free the structure returned by task_group(). Because wake_affine() uses this structure only to compute load-balancing heuristics, there is no reason to acquire either of the two locks. Therefore, this commit introduces an RCU read-side critical section that starts before the first call to task_group() and ends after the last use of the "tg" pointer returned from task_group(). Thanks to Li Zefan for pointing out the need to extend the RCU read-side critical section from that proposed by the original patch. Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-06-18sched: Fix over-scheduling bugAlex,Shi
Commit e70971591 ("sched: Optimize unused cgroup configuration") introduced an imbalanced scheduling bug. If we do not use CGROUP, function update_h_load won't update h_load. When the system has a large number of tasks far more than logical CPU number, the incorrect cfs_rq[cpu]->h_load value will cause load_balance() to pull too many tasks to the local CPU from the busiest CPU. So the busiest CPU keeps going in a round robin. That will hurt performance. The issue was found originally by a scientific calculation workload that developed by Yanmin. With that commit, the workload performance drops about 40%. CPU before after 00 : 2 : 7 01 : 1 : 7 02 : 11 : 6 03 : 12 : 7 04 : 6 : 6 05 : 11 : 7 06 : 10 : 6 07 : 12 : 7 08 : 11 : 6 09 : 12 : 6 10 : 1 : 6 11 : 1 : 6 12 : 6 : 6 13 : 2 : 6 14 : 2 : 6 15 : 1 : 6 Reviewed-by: Yanmin zhang <yanmin.zhang@intel.com> Signed-off-by: Alex Shi <alex.shi@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1276754893.9452.5442.camel@debian> Signed-off-by: Ingo Molnar <mingo@elte.hu>