summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2011-10-25Merge branch 'pm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm * 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (63 commits) PM / Clocks: Remove redundant NULL checks before kfree() PM / Documentation: Update docs about suspend and CPU hotplug ACPI / PM: Add Sony VGN-FW21E to nonvs blacklist. ARM: mach-shmobile: sh7372 A4R support (v4) ARM: mach-shmobile: sh7372 A3SP support (v4) PM / Sleep: Mark devices involved in wakeup signaling during suspend PM / Hibernate: Improve performance of LZO/plain hibernation, checksum image PM / Hibernate: Do not initialize static and extern variables to 0 PM / Freezer: Make fake_signal_wake_up() wake TASK_KILLABLE tasks too PM / Hibernate: Add resumedelay kernel param in addition to resumewait MAINTAINERS: Update linux-pm list address PM / ACPI: Blacklist Vaio VGN-FW520F machine known to require acpi_sleep=nonvs PM / ACPI: Blacklist Sony Vaio known to require acpi_sleep=nonvs PM / Hibernate: Add resumewait param to support MMC-like devices as resume file PM / Hibernate: Fix typo in a kerneldoc comment PM / Hibernate: Freeze kernel threads after preallocating memory PM: Update the policy on default wakeup settings PM / VT: Cleanup #if defined uglyness and fix compile error PM / Suspend: Off by one in pm_suspend() PM / Hibernate: Include storage keys in hibernation image on s390 ...
2011-10-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1745 commits) dp83640: free packet queues on remove dp83640: use proper function to free transmit time stamping packets ipv6: Do not use routes from locally generated RAs |PATCH net-next] tg3: add tx_dropped counter be2net: don't create multiple RX/TX rings in multi channel mode be2net: don't create multiple TXQs in BE2 be2net: refactor VF setup/teardown code into be_vf_setup/clear() be2net: add vlan/rx-mode/flow-control config to be_setup() net_sched: cls_flow: use skb_header_pointer() ipv4: avoid useless call of the function check_peer_pmtu TCP: remove TCP_DEBUG net: Fix driver name for mdio-gpio.c ipv4: tcp: fix TOS value in ACK messages sent from TIME_WAIT rtnetlink: Add missing manual netlink notification in dev_change_net_namespaces ipv4: fix ipsec forward performance regression jme: fix irq storm after suspend/resume route: fix ICMP redirect validation net: hold sock reference while processing tx timestamps tcp: md5: add more const attributes Add ethtool -g support to virtio_net ... Fix up conflicts in: - drivers/net/Kconfig: The split-up generated a trivial conflict with removal of a stale reference to Documentation/networking/net-modules.txt. Remove it from the new location instead. - fs/sysfs/dir.c: Fairly nasty conflicts with the sysfs rb-tree usage, conflicting with Eric Biederman's changes for tagged directories.
2011-10-25Merge branch 'usb-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb * 'usb-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (260 commits) usb: renesas_usbhs: fixup inconsistent return from usbhs_pkt_push() usb/isp1760: Allow to optionally trigger low-level chip reset via GPIOLIB. USB: gadget: midi: memory leak in f_midi_bind_config() USB: gadget: midi: fix range check in f_midi_out_open() QE/FHCI: fixed the CONTROL bug usb: renesas_usbhs: tidyup for smatch warnings USB: Fix USB Kconfig dependency problem on 85xx/QoirQ platforms EHCI: workaround for MosChip controller bug usb: gadget: file_storage: fix race on unloading USB: ftdi_sio.c: Use ftdi async_icount structure for TIOCMIWAIT, as in other drivers USB: ftdi_sio.c:Fill MSR fields of the ftdi async_icount structure USB: ftdi_sio.c: Fill LSR fields of the ftdi async_icount structure USB: ftdi_sio.c:Fill TX field of the ftdi async_icount structure USB: ftdi_sio.c: Fill the RX field of the ftdi async_icount structure USB: ftdi_sio.c: Basic icount infrastructure for ftdi_sio usb/isp1760: Let OF bindings depend on general CONFIG_OF instead of PPC_OF . USB: ftdi_sio: Support TI/Luminary Micro Stellaris BD-ICDI Board USB: Fix runtime wakeup on OHCI xHCI/USB: Make xHCI driver have a BOS descriptor. usb: gadget: add new usb gadget for ACM and mass storage ...
2011-10-25Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (59 commits) MAINTAINERS: linux-m32r is moderated for non-subscribers linux@lists.openrisc.net is moderated for non-subscribers Drop default from "DM365 codec select" choice parisc: Kconfig: cleanup Kernel page size default Kconfig: remove redundant CONFIG_ prefix on two symbols cris: remove arch/cris/arch-v32/lib/nand_init.S microblaze: add missing CONFIG_ prefixes h8300: drop puzzling Kconfig dependencies MAINTAINERS: microblaze-uclinux@itee.uq.edu.au is moderated for non-subscribers tty: drop superfluous dependency in Kconfig ARM: mxc: fix Kconfig typo 'i.MX51' Fix file references in Kconfig files aic7xxx: fix Kconfig references to READMEs Fix file references in drivers/ide/ thinkpad_acpi: Fix printk typo 'bluestooth' bcmring: drop commented out line in Kconfig btmrvl_sdio: fix typo 'btmrvl_sdio_sd6888' doc: raw1394: Trivial typo fix CIFS: Don't free volume_info->UNC until we are entirely done with it. treewide: Correct spelling of successfully in comments ...
2011-10-25Merge branch 'next' of git://selinuxproject.org/~jmorris/linux-securityLinus Torvalds
* 'next' of git://selinuxproject.org/~jmorris/linux-security: (95 commits) TOMOYO: Fix incomplete read after seek. Smack: allow to access /smack/access as normal user TOMOYO: Fix unused kernel config option. Smack: fix: invalid length set for the result of /smack/access Smack: compilation fix Smack: fix for /smack/access output, use string instead of byte Smack: domain transition protections (v3) Smack: Provide information for UDS getsockopt(SO_PEERCRED) Smack: Clean up comments Smack: Repair processing of fcntl Smack: Rule list lookup performance Smack: check permissions from user space (v2) TOMOYO: Fix quota and garbage collector. TOMOYO: Remove redundant tasklist_lock. TOMOYO: Fix domain transition failure warning. TOMOYO: Remove tomoyo_policy_memory_lock spinlock. TOMOYO: Simplify garbage collector. TOMOYO: Fix make namespacecheck warnings. target: check hex2bin result encrypted-keys: check hex2bin result ...
2011-10-24Merge branch 'master' of ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2011-10-18cputimer: Cure lock inversionPeter Zijlstra
There's a lock inversion between the cputimer->lock and rq->lock; notably the two callchains involved are: update_rlimit_cpu() sighand->siglock set_process_cpu_timer() cpu_timer_sample_group() thread_group_cputimer() cputimer->lock thread_group_cputime() task_sched_runtime() ->pi_lock rq->lock scheduler_tick() rq->lock task_tick_fair() update_curr() account_group_exec() cputimer->lock Where the first one is enabling a CLOCK_PROCESS_CPUTIME_ID timer, and the second one is keeping up-to-date. This problem was introduced by e8abccb7193 ("posix-cpu-timers: Cure SMP accounting oddities"). Cure the problem by removing the cputimer->lock and rq->lock nesting, this leaves concurrent enablers doing duplicate work, but the time wasted should be on the same order otherwise wasted spinning on the lock and the greater-than assignment filter should ensure we preserve monotonicity. Reported-by: Dave Jones <davej@redhat.com> Reported-by: Simon Kirby <sim@hostway.ca> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Link: http://lkml.kernel.org/r/1318928713.21167.4.camel@twins Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-10-17Avoid using variable-length arrays in kernel/sys.cLinus Torvalds
The size is always valid, but variable-length arrays generate worse code for no good reason (unless the function happens to be inlined and the compiler sees the length for the simple constant it is). Also, there seems to be some code generation problem on POWER, where Henrik Bakken reports that register r28 can get corrupted under some subtle circumstances (interrupt happening at the wrong time?). That all indicates some seriously broken compiler issues, but since variable length arrays are bad regardless, there's little point in trying to chase it down. "Just don't do that, then". Reported-by: Henrik Grindal Bakken <henribak@cisco.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-16PM / Hibernate: Improve performance of LZO/plain hibernation, checksum imageBojan Smojver
Use threads for LZO compression/decompression on hibernate/thaw. Improve buffering on hibernate/thaw. Calculate/verify CRC32 of the image pages on hibernate/thaw. In my testing, this improved write/read speed by a factor of about two. Signed-off-by: Bojan Smojver <bojan@rexursive.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-16PM / Hibernate: Do not initialize static and extern variables to 0Barry Song
Static and extern variables in kernel/power/hibernate.c need not be initialized to 0 explicitly, so remove those initializations. [rjw: Modified subject, added changelog.] Signed-off-by: Barry Song <Baohua.Song@csr.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-16PM / Freezer: Make fake_signal_wake_up() wake TASK_KILLABLE tasks tooJeff Layton
TASK_KILLABLE is often used to put tasks to sleep for quite some time. One of the most common uses is to put tasks to sleep while waiting for replies from a server on a networked filesystem (such as CIFS or NFS). Unfortunately, fake_signal_wake_up does not currently wake up tasks that are sleeping in TASK_KILLABLE state. This means that even if the code were in place to allow them to freeze while in this sleep, it wouldn't work anyway. This patch changes this function to wake tasks in this state as well. This should be harmless -- if the code doing the sleeping doesn't have handling to deal with freezer events, it should just go back to sleep. If it does, then this will allow that code to do the right thing. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-16PM / Hibernate: Add resumedelay kernel param in addition to resumewaitBarry Song
Patch "PM / Hibernate: Add resumewait param to support MMC-like devices as resume file" added the resumewait kernel command line option. The present patch adds resumedelay so that resumewait/delay were analogous to rootwait/delay. [rjw: Modified the subject and changelog slightly.] Signed-off-by: Barry Song <baohua.song@csr.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-16PM / Hibernate: Add resumewait param to support MMC-like devices as resume fileBarry Song
Some devices like MMC are async detected very slow. For example, drivers/mmc/host/sdhci.c launches a 200ms delayed work to detect MMC partitions then add disk. We have wait_for_device_probe() and scsi_complete_async_scans() before calling swsusp_check(), but it is not enough to wait for MMC. This patch adds resumewait kernel param just like rootwait so that we have enough time to wait until MMC is ready. The difference is that we wait for resume partition whereas rootwait waits for rootfs partition (which may be on a different device). This patch will make hibernation support many embedded products without SCSI devices, but with devices like MMC. [rjw: Modified the changelog slightly.] Signed-off-by: Barry Song <Baohua.Song@csr.com> Reviewed-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-16PM / Hibernate: Fix typo in a kerneldoc commentBarry Song
Fix a typo in a function name in the kerneldoc comment next to resume_target_kernel(). [rjw: Changed the subject slightly, added the changelog.] Signed-off-by: Barry Song <Baohua.Song@csr.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-16PM / Hibernate: Freeze kernel threads after preallocating memoryRafael J. Wysocki
There is a problem with the current ordering of hibernate code which leads to deadlocks in some filesystems' memory shrinkers. Namely, some filesystems use freezable kernel threads that are inactive when the hibernate memory preallocation is carried out. Those same filesystems use memory shrinkers that may be triggered by the hibernate memory preallocation. If those memory shrinkers wait for the frozen kernel threads, the hibernate process deadlocks (this happens with XFS, for one example). Apparently, it is not technically viable to redesign the filesystems in question to avoid the situation described above, so the only possible solution of this issue is to defer the freezing of kernel threads until the hibernate memory preallocation is done, which is implemented by this change. Unfortunately, this requires the memory preallocation to be done before the "prepare" stage of device freeze, so after this change the only way drivers can allocate additional memory for their freeze routines in a clean way is to use PM notifiers. Reported-by: Christoph <cr2005@u-club.de> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-16PM / VT: Cleanup #if defined uglyness and fix compile errorH Hartley Sweeten
Introduce the config option CONFIG_VT_CONSOLE_SLEEP in order to cleanup the #if defined ugliness for the vt suspend support functions. Note that CONFIG_VT_CONSOLE is already dependant on CONFIG_VT. The function pm_set_vt_switch is actually dependant on CONFIG_VT and not CONFIG_PM_SLEEP. This fixes a compile error when CONFIG_PM_SLEEP is not set: drivers/tty/vt/vt_ioctl.c:1794: error: redefinition of 'pm_set_vt_switch' include/linux/suspend.h:17: error: previous definition of 'pm_set_vt_switch' was here Also, remove the incorrect path from the comment in console.c. [rjw: Replaced #if defined() with #ifdef in suspend.h.] Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-16PM / Suspend: Off by one in pm_suspend()Dan Carpenter
In enter_state() we use "state" as an offset for the pm_states[] array. The pm_states[] array only has PM_SUSPEND_MAX elements so this test is off by one. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: stable@kernel.org
2011-10-16PM / Hibernate: Include storage keys in hibernation image on s390Martin Schwidefsky
For s390 there is one additional byte associated with each page, the storage key. This byte contains the referenced and changed bits and needs to be included into the hibernation image. If the storage keys are not restored to their previous state all original pages would appear to be dirty. This can cause inconsistencies e.g. with read-only filesystems. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-16PM: Fix build issue in main.c for CONFIG_PM_SLEEP unsetRafael J. Wysocki
Suspend statistics should depend on CONFIG_PM_SLEEP, so make that happen. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-16PM / Suspend: Add statistics debugfs file for suspend to RAMShuoX Liu
Record S3 failure time about each reason and the latest two failed devices' names in S3 progress. We can check it through 'suspend_stats' entry in debugfs. The motivation of the patch: We are enabling power features on Medfield. Comparing with PC/notebook, a mobile enters/exits suspend-2-ram (we call it s3 on Medfield) far more frequently. If it can't enter suspend-2-ram in time, the power might be used up soon. We often find sometimes, a device suspend fails. Then, system retries s3 over and over again. As display is off, testers and developers don't know what happens. Some testers and developers complain they don't know if system tries suspend-2-ram, and what device fails to suspend. They need such info for a quick check. The patch adds suspend_stats under debugfs for users to check suspend to RAM statistics quickly. If not using this patch, we have other methods to get info about what device fails. One is to turn on CONFIG_PM_DEBUG, but users would get too much info and testers need recompile the system. In addition, dynamic debug is another good tool to dump debug info. But it still doesn't match our utilization scenario closely. 1) user need write a user space parser to process the syslog output; 2) Our testing scenario is we leave the mobile for at least hours. Then, check its status. No serial console available during the testing. One is because console would be suspended, and the other is serial console connecting with spi or HSU devices would consume power. These devices are powered off at suspend-2-ram. Signed-off-by: ShuoX Liu <shuox.liu@intel.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-07Merge branch 'pm-qos' into pm-for-linusRafael J. Wysocki
* pm-qos: PM / QoS: Update Documentation for the pm_qos and dev_pm_qos frameworks PM / QoS: Add function dev_pm_qos_read_value() (v3) PM QoS: Add global notification mechanism for device constraints PM QoS: Implement per-device PM QoS constraints PM QoS: Generalize and export constraints management code PM QoS: Reorganize data structs PM QoS: Code reorganization PM QoS: Minor clean-ups PM QoS: Move and rename the implementation files
2011-10-07Merge branch 'pm-runtime' into pm-for-linusRafael J. Wysocki
* pm-runtime: PM / Tracing: build rpm-traces.c only if CONFIG_PM_RUNTIME is set PM / Runtime: Replace dev_dbg() with trace_rpm_*() PM / Runtime: Introduce trace points for tracing rpm_* functions PM / Runtime: Don't run callbacks under lock for power.irq_safe set USB: Add wakeup info to debugging messages PM / Runtime: pm_runtime_idle() can be called in atomic context PM / Runtime: Add macro to test for runtime PM events PM / Runtime: Add might_sleep() to runtime PM functions
2011-10-07Merge branch 'master' of github.com:davem330/netDavid S. Miller
Conflicts: net/batman-adv/soft-interface.c
2011-10-03ipv4: NET_IPV4_ROUTE_GC_INTERVAL removalVasily Averin
removing obsoleted sysctl, ip_rt_gc_interval variable no longer used since 2.6.38 Signed-off-by: Vasily Averin <vvs@sw.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-01Merge branches 'irq-urgent-for-linus', 'x86-urgent-for-linus' and ↵Linus Torvalds
'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip * 'irq-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip: irq: Fix check for already initialized irq_domain in irq_domain_add irq: Add declaration of irq_domain_simple_ops to irqdomain.h * 'x86-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip: x86/rtc: Don't recursively acquire rtc_lock * 'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip: posix-cpu-timers: Cure SMP wobbles sched: Fix up wchan borkage sched/rt: Migrate equal priority tasks to available CPUs
2011-09-30posix-cpu-timers: Cure SMP wobblesPeter Zijlstra
David reported: Attached below is a watered-down version of rt/tst-cpuclock2.c from GLIBC. Just build it with "gcc -o test test.c -lpthread -lrt" or similar. Run it several times, and you will see cases where the main thread will measure a process clock difference before and after the nanosleep which is smaller than the cpu-burner thread's individual thread clock difference. This doesn't make any sense since the cpu-burner thread is part of the top-level process's thread group. I've reproduced this on both x86-64 and sparc64 (using both 32-bit and 64-bit binaries). For example: [davem@boricha build-x86_64-linux]$ ./test process: before(0.001221967) after(0.498624371) diff(497402404) thread: before(0.000081692) after(0.498316431) diff(498234739) self: before(0.001223521) after(0.001240219) diff(16698) [davem@boricha build-x86_64-linux]$ The diff of 'process' should always be >= the diff of 'thread'. I make sure to wrap the 'thread' clock measurements the most tightly around the nanosleep() call, and that the 'process' clock measurements are the outer-most ones. --- #include <unistd.h> #include <stdio.h> #include <stdlib.h> #include <time.h> #include <fcntl.h> #include <string.h> #include <errno.h> #include <pthread.h> static pthread_barrier_t barrier; static void *chew_cpu(void *arg) { pthread_barrier_wait(&barrier); while (1) __asm__ __volatile__("" : : : "memory"); return NULL; } int main(void) { clockid_t process_clock, my_thread_clock, th_clock; struct timespec process_before, process_after; struct timespec me_before, me_after; struct timespec th_before, th_after; struct timespec sleeptime; unsigned long diff; pthread_t th; int err; err = clock_getcpuclockid(0, &process_clock); if (err) return 1; err = pthread_getcpuclockid(pthread_self(), &my_thread_clock); if (err) return 1; pthread_barrier_init(&barrier, NULL, 2); err = pthread_create(&th, NULL, chew_cpu, NULL); if (err) return 1; err = pthread_getcpuclockid(th, &th_clock); if (err) return 1; pthread_barrier_wait(&barrier); err = clock_gettime(process_clock, &process_before); if (err) return 1; err = clock_gettime(my_thread_clock, &me_before); if (err) return 1; err = clock_gettime(th_clock, &th_before); if (err) return 1; sleeptime.tv_sec = 0; sleeptime.tv_nsec = 500000000; nanosleep(&sleeptime, NULL); err = clock_gettime(th_clock, &th_after); if (err) return 1; err = clock_gettime(my_thread_clock, &me_after); if (err) return 1; err = clock_gettime(process_clock, &process_after); if (err) return 1; diff = process_after.tv_nsec - process_before.tv_nsec; printf("process: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n", process_before.tv_sec, process_before.tv_nsec, process_after.tv_sec, process_after.tv_nsec, diff); diff = th_after.tv_nsec - th_before.tv_nsec; printf("thread: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n", th_before.tv_sec, th_before.tv_nsec, th_after.tv_sec, th_after.tv_nsec, diff); diff = me_after.tv_nsec - me_before.tv_nsec; printf("self: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n", me_before.tv_sec, me_before.tv_nsec, me_after.tv_sec, me_after.tv_nsec, diff); return 0; } This is due to us using p->se.sum_exec_runtime in thread_group_cputime() where we iterate the thread group and sum all data. This does not take time since the last schedule operation (tick or otherwise) into account. We can cure this by using task_sched_runtime() at the cost of having to take locks. This also means we can (and must) do away with thread_group_sched_runtime() since the modified thread_group_cputime() is now more accurate and would deadlock when called from thread_group_sched_runtime(). Aside of that it makes the function safe on 32 bit systems. The old code added t->se.sum_exec_runtime unprotected. sum_exec_runtime is a 64bit value and could be changed on another cpu at the same time. Reported-by: David Miller <davem@davemloft.net> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@kernel.org Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twins Tested-by: David Miller <davem@davemloft.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-09-29Resource: fix wrong resource window calculationRam Pai
__find_resource() incorrectly returns a resource window which overlaps an existing allocated window. This happens when the parent's resource-window spans 0x00000000 to 0xffffffff and is entirely allocated to all its children resource-windows. __find_resource() looks for gaps in resource allocation among the children resource windows. When it encounters the last child window it blindly tries the range next to one allocated to the last child. Since the last child's window ends at 0xffffffff the calculation overflows, leading the algorithm to believe that any window in the range 0x0000000 to 0xfffffff is available for allocation. This leads to a conflicting window allocation. Michal Ludvig reported this issue seen on his platform. The following patch fixes the problem and has been verified by Michal. I believe this bug has been there for ages. It got exposed by git commit 2bbc6942273b ("PCI : ability to relocate assigned pci-resources") Signed-off-by: Ram Pai <linuxram@us.ibm.com> Tested-by: Michal Ludvig <mludvig@logix.net.nz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-29user namespace: usb: make usb urbs user namespace aware (v2)Serge Hallyn
Add to the dev_state and alloc_async structures the user namespace corresponding to the uid and euid. Pass these to kill_pid_info_as_uid(), which can then implement a proper, user-namespace-aware uid check. Changelog: Sep 20: Per Oleg's suggestion: Instead of caching and passing user namespace, uid, and euid each separately, pass a struct cred. Sep 26: Address Alan Stern's comments: don't define a struct cred at usbdev_open(), and take and put a cred at async_completed() to ensure it lasts for the duration of kill_pid_info_as_cred(). Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-09-29PM / Tracing: build rpm-traces.c only if CONFIG_PM_RUNTIME is setMing Lei
Do not build kernel/trace/rpm-traces.c if CONFIG_PM_RUNTIME is not set, which avoids a build failure. [rjw: Added the changelog and modified the subject slightly.] Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-09-28connector: add comm change event report to proc connectorVladimir Zapolskiy
Add an event to monitor comm value changes of tasks. Such an event becomes vital, if someone desires to control threads of a process in different manner. A natural characteristic of threads is its comm value, and helpfully application developers have an opportunity to change it in runtime. Reporting about such events via proc connector allows to fine-grain monitoring and control potentials, for instance a process control daemon listening to proc connector and following comm value policies can place specific threads to assigned cgroup partitions. It might be possible to achieve a pale partial one-shot likeness without this update, if an application changes comm value of a thread generator task beforehand, then a new thread is cloned, and after that proc connector listener gets the fork event and reads new thread's comm value from procfs stat file, but this change visibly simplifies and extends the matter. Signed-off-by: Vladimir Zapolskiy <vzapolskiy@gmail.com> Acked-by: Evgeniy Polyakov <zbr@ioremap.net> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-27PM / Runtime: Introduce trace points for tracing rpm_* functionsMing Lei
This patch introduces 3 trace points to prepare for tracing rpm_idle/rpm_suspend/rpm_resume functions, so we can use these trace points to replace the current dev_dbg(). Signed-off-by: Ming Lei <ming.lei@canonical.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-09-27treewide: Correct spelling of successfully in commentsJoe Perches
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-09-26sched: Fix up wchan borkageSimon Kirby
Commit c259e01a1ec ("sched: Separate the scheduler entry for preemption") contained a boo-boo wrecking wchan output. It forgot to put the new schedule() function in the __sched section and thereby doesn't get properly ignored for things like wchan. Tested-by: Simon Kirby <sim@hostway.ca> Cc: stable@kernel.org # 2.6.39+ Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110923000346.GA25425@hostway.ca Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-25ptrace: PTRACE_LISTEN forgets to unlock ->siglockOleg Nesterov
If PTRACE_LISTEN fails after lock_task_sighand() it doesn't drop ->siglock. Reported-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-20irq: Fix check for already initialized irq_domain in irq_domain_addRob Herring
The sanity check in irq_domain_add() tests desc->irq_data != NULL or irq_data->domain != NULL. This prevents adding an irq_domain to a irq descriptor when irq_data exists, which true when the irq descriptor exists. This went unnoticed so far as the simple domain code did not enter this code path because domain->nr_irqs is always 0 for the simple domains. Split the check for irq_data == NULL out and have a separate warning for it. [ tglx: Made the check for irq_data == NULL separate ] Signed-off-by: Rob Herring <rob.herring@calxeda.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: marc.zyngier@arm.com Cc: thomas.abraham@linaro.org Cc: jamie@jamieiles.com Cc: b-cousson@ti.com Cc: shawn.guo@linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: devicetree-discuss@lists.ozlabs.org Link: http://lkml.kernel.org/r/1316017900-19918-3-git-send-email-robherring2@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-09-19Merge branch 'irq-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tipLinus Torvalds
* 'irq-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip: x86, iommu: Mark DMAR IRQ as non-threaded genirq: Make irq_shutdown() symmetric vs. irq_startup again
2011-09-19Make taskstats round statistics down to nearest 1k bytes/eventsLinus Torvalds
Even with just the interface limited to admin, there really is little to reason to give byte-per-byte counts for taskstats. So round it down to something less intrusive. Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-19Make TASKSTATS require root accessLinus Torvalds
Ok, this isn't optimal, since it means that 'iotop' needs admin capabilities, and we may have to work on this some more. But at the same time it is very much not acceptable to let anybody just read anybody elses IO statistics quite at this level. Use of the GENL_ADMIN_PERM suggested by Johannes Berg as an alternative to checking the capabilities by hand. Reported-by: Vasiliy Kulikov <segoon@openwall.com> Cc: Johannes Berg <johannes.berg@intel.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-18sched/rt: Migrate equal priority tasks to available CPUsShawn Bohrer
Commit 43fa5460fe60dea5c610490a1d263415419c60f6 ("sched: Try not to migrate higher priority RT tasks") also introduced a change in behavior which keeps RT tasks on the same CPU if there is an equal priority RT task currently running even if there are empty CPUs available. This can cause unnecessary wakeup latencies, and can prevent the scheduler from balancing all RT tasks across available CPUs. This change causes an RT task to search for a new CPU if an equal priority RT task is already running on wakeup. Lower priority tasks will still have to wait on higher priority tasks, but the system should still balance out because there is always the possibility that if there are both a high and low priority RT tasks on a given CPU that the high priority task could wakeup while the low priority task is running and force it to search for a better runqueue. Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@kernel.org # 37+ Link: http://lkml.kernel.org/r/1315837684-18733-1-git-send-email-sbohrer@rgmadvisors.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-15Merge branch 'master' into for-nextJiri Kosina
Fast-forward merge with Linus to be able to merge patches based on more recent version of the tree.
2011-09-15futex: Fix spelling in a source code commentBart Van Assche
Change a single occurrence of "unlcoked" into "unlocked". Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-09-15futex: uninitialized warning correctionsVitaliy Ivanov
The variables here are really not used uninitialized. kernel/futex.c: In function 'fixup_pi_state_owner.clone.17': kernel/futex.c:1582:6: warning: 'curval' may be used uninitialized in this function kernel/futex.c: In function 'handle_futex_death': kernel/futex.c:2486:6: warning: 'nval' may be used uninitialized in this function kernel/futex.c: In function 'do_futex': kernel/futex.c:863:11: warning: 'curval' may be used uninitialized in this function kernel/futex.c:828:6: note: 'curval' was declared here kernel/futex.c:898:5: warning: 'oldval' may be used uninitialized in this function kernel/futex.c:890:6: note: 'oldval' was declared here Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com> Acked-by: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-09-15async: uninitialized warning correctionsVitaliy Ivanov
The variables here are really not used uninitialized. kernel/async.c: In function 'async_synchronize_cookie_domain': kernel/async.c:270:10: warning: 'starttime.tv64' may be used uninitialized in this function kernel/async.c: In function 'async_run_entry_fn': kernel/async.c:122:10: warning: 'calltime.tv64' may be used uninitialized in this function Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Viresh Kumar <viresh.kumar@st.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-09-14workqueue: lock cwq access in drain_workqueueThomas Tuttle
Take cwq->gcwq->lock to avoid racing between drain_workqueue checking to make sure the workqueues are empty and cwq_dec_nr_in_flight decrementing and then incrementing nr_active when it activates a delayed work. We discovered this when a corner case in one of our drivers resulted in us trying to destroy a workqueue in which the remaining work would always requeue itself again in the same workqueue. We would hit this race condition and trip the BUG_ON on workqueue.c:3080. Signed-off-by: Thomas Tuttle <ttuttle@chromium.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-12genirq: Make irq_shutdown() symmetric vs. irq_startup againGeert Uytterhoeven
If an irq_chip provides .irq_shutdown(), but neither of .irq_disable() or .irq_mask(), free_irq() crashes when jumping to NULL. Fix this by only trying .irq_disable() and .irq_mask() if there's no .irq_shutdown() provided. This revives the symmetry with irq_startup(), which tries .irq_startup(), .irq_enable(), and irq_unmask(), and makes it consistent with the comment for irq_chip.irq_shutdown() in <linux/irq.h>, which says: * @irq_shutdown: shut down the interrupt (defaults to ->disable if NULL) This is also how __free_irq() behaved before the big overhaul, cfr. e.g. 3b56f0585fd4c02d047dc406668cb40159b2d340 ("genirq: Remove bogus conditional"), where the core interrupt code always overrode .irq_shutdown() to .irq_disable() if .irq_shutdown() was NULL. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: linux-m68k@lists.linux-m68k.org Link: http://lkml.kernel.org/r/1315742394-16036-2-git-send-email-geert@linux-m68k.org Cc: stable@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-09-07Merge branch 'timers-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tipLinus Torvalds
* 'timers-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip: rtc: twl: Fix registration vs. init order rtc: Initialized rtc_time->tm_isdst rtc: Fix RTC PIE frequency limit rtc: rtc-twl: Remove lockdep related local_irq_enable() rtc: rtc-twl: Switch to using threaded irq rtc: ep93xx: Fix 'rtc' may be used uninitialized warning alarmtimers: Avoid possible denial of service with high freq periodic timers alarmtimers: Memset itimerspec passed into alarm_timer_get alarmtimers: Avoid possible null pointer traversal
2011-09-07Merge branch 'sched-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tipLinus Torvalds
* 'sched-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip: sched: Fix a memory leak in __sdt_free() sched: Move blk_schedule_flush_plug() out of __schedule() sched: Separate the scheduler entry for preemption
2011-08-31perf_event: Fix broken calc_timer_values()Eric B Munson
We detected a serious issue with PERF_SAMPLE_READ and timing information when events were being multiplexing. Samples would have time_running > time_enabled. That was easy to reproduce with a libpfm4 example (ran 3 times to cause multiplexing on Core 2): $ syst_smpl -e uops_retired:freq=1 & $ syst_smpl -e uops_retired:freq=1 & $ syst_smpl -e uops_retired:freq=1 & IIP:0x0000000040062d ... PERIOD:2355332948 ENA=40144625315 RUN=60014875184 syst_smpl: WARNING: time_running > time_enabled 63277537998 uops_retired:freq=1 , scaled The bug was not present in kernel up to (and including) 3.0. It turns out the bug was introduced by the following commit: commit c4794295917ebeda8013b6cb9c8d71ab4f74a1fa events: Move lockless timer calculation into helper function The parameters of the function got reversed yet the call sites were not updated to reflect the change. That lead to time_running and time_enabled being swapped. That had no effect when there was no multiplexing because in that case time_running = time_enabled but it would show up in any other scenario. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110829124112.GA4828@quad Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-08-29perf events: Fix slow and broken cgroup context switch codeStephane Eranian
The current cgroup context switch code was incorrect leading to bogus counts. Furthermore, as soon as there was an active cgroup event on a CPU, the context switch cost on that CPU would increase by a significant amount as demonstrated by a simple ping/pong example: $ ./pong Both processes pinned to CPU1, running for 10s 10684.51 ctxsw/s Now start a cgroup perf stat: $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 100 $ ./pong Both processes pinned to CPU1, running for 10s 6674.61 ctxsw/s That's a 37% penalty. Note that pong is not even in the monitored cgroup. The results shown by perf stat are bogus: $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 100 Performance counter stats for 'sleep 100': CPU1 <not counted> cycles test CPU1 16,984,189,138 cycles # 0.000 GHz The second 'cycles' event should report a count @ CPU clock (here 2.4GHz) as it is counting across all cgroups. The patch below fixes the bogus accounting and bypasses any cgroup switches in case the outgoing and incoming tasks are in the same cgroup. With this patch the same test now yields: $ ./pong Both processes pinned to CPU1, running for 10s 10775.30 ctxsw/s Start perf stat with cgroup: $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10 Run pong outside the cgroup: $ /pong Both processes pinned to CPU1, running for 10s 10687.80 ctxsw/s The penalty is now less than 2%. And the results for perf stat are correct: $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10 Performance counter stats for 'sleep 10': CPU1 <not counted> cycles test # 0.000 GHz CPU1 23,933,981,448 cycles # 0.000 GHz Now perf stat reports the correct counts for for the non cgroup event. If we run pong inside the cgroup, then we also get the correct counts: $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10 Performance counter stats for 'sleep 10': CPU1 22,297,726,205 cycles test # 0.000 GHz CPU1 23,933,981,448 cycles # 0.000 GHz 10.001457237 seconds time elapsed Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110825135803.GA4697@quad Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-08-29sched: Fix a memory leak in __sdt_free()WANG Cong
This patch fixes the following memory leak: unreferenced object 0xffff880107266800 (size 512): comm "sched-powersave", pid 3718, jiffies 4323097853 (age 27495.450s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff81133940>] create_object+0x187/0x28b [<ffffffff814ac103>] kmemleak_alloc+0x73/0x98 [<ffffffff811232ba>] __kmalloc_node+0x104/0x159 [<ffffffff81044b98>] kzalloc_node.clone.97+0x15/0x17 [<ffffffff8104cb90>] build_sched_domains+0xb7/0x7f3 [<ffffffff8104d4df>] partition_sched_domains+0x1db/0x24a [<ffffffff8109ee4a>] do_rebuild_sched_domains+0x3b/0x47 [<ffffffff810a00c7>] rebuild_sched_domains+0x10/0x12 [<ffffffff8104d5ba>] sched_power_savings_store+0x6c/0x7b [<ffffffff8104d5df>] sched_mc_power_savings_store+0x16/0x18 [<ffffffff8131322c>] sysdev_class_store+0x20/0x22 [<ffffffff81193876>] sysfs_write_file+0x108/0x144 [<ffffffff81135b10>] vfs_write+0xaf/0x102 [<ffffffff81135d23>] sys_write+0x4d/0x74 [<ffffffff814c8a42>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff Signed-off-by: WANG Cong <amwang@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@kernel.org # 3.0 Link: http://lkml.kernel.org/r/1313671017-4112-1-git-send-email-amwang@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>