summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2013-02-20Btrfs: if we aren't committing just end the transaction if we error outJosef Bacik
I hit a deadlock where transaction commit was waiting on num_writers to be 0. This happened because somebody came into btrfs_commit_transaction and noticed we had aborted and it went to cleanup_transaction. This shouldn't happen because cleanup_transaction is really to fixup a bad commit, it doesn't do the normal trans handle cleanup things. So if we have an error just do the normal btrfs_end_transaction dance and return. Once we are in the actual commit path we can use cleanup_transaction and be good to go. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: handle errors in compression submission pathJosef Bacik
I noticed we would deadlock if we aborted a transaction while doing compressed io. This is because we don't unlock our pages if something goes horribly wrong. To fix this we need to make sure that we call extent_clear_unlock_delalloc in order to unlock all the pages. If we have to cow in the async submission thread we need to make sure to unlock our locked_page as the cow error path will not unlock the locked page as it depends on the caller to unlock that page. With this patch we no longer deadlock on the page lock when we have an aborted transaction. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: rework the overcommit logic to be based on the total sizeJosef Bacik
People have been complaining about random ENOSPC errors that will clear up after a umount or just a given amount of time. Chris was able to reproduce this with stress.sh and lots of processes and so was I. Basically the overcommit stuff would really let us get out of hand, in my tests I saw up to 30 gigs of outstanding reservations with only 2 gigs total of metadata space. This usually worked out fine but with so much outstanding reservation the flushing stuff short circuits to make sure we don't hang forever flushing when we really need ENOSPC. Plus we allocate chunks in order to alleviate the pressure, but this doesn't actually help us since we only use the non-allocated area in our over commit logic. So instead of basing overcommit on the amount of non-allocated space, instead just do it based on how much total space we have, and then limit it to the non-allocated space in case we are short on space to spill over into. This allows us to have the same performance as well as no longer giving random ENOSPC. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: account for orphan inodes properly during cleanupJosef Bacik
Dave sent me a panic where we were doing the orphan cleanup and panic'ed trying to release our reservation from the orphan block rsv. The reason for this is because our orphan block rsv had been free'd out from underneath us because the transaction commit found that there were no orphan inodes according to its count and decided to free it. This is incorrect so make sure we inc the orphan inodes count so the accounting is all done properly. This would also cause the warning in the orphan commit code normally if you had any orphans to cleanup as they would only decrement the orphan count so you'd get a negative orphan count which could cause problems during runtime. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: unreserve space if our ordered extent fails to workJosef Bacik
When a transaction aborts or there's an EIO on an ordered extent or any error really we will not free up the space we reserved for this ordered extent. This results in warnings from the block group cache cleanup in the case of a transaction abort, or leaking space in the case of EIO on an ordered extent. Fix this up by free'ing the reserved space if we have an error at all trying to complete an ordered extent. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: fix how we discard outstanding ordered extents on abortJosef Bacik
When we abort we've been just free'ing up all the ordered extents and hoping for the best. This results in lots of warnings from various places, warnings from btrfs_destroy_inode() because it's ENOSPC accounting isn't fixed. It will also screw up lots of pages who have been set private but never get cleared because the ordered extents are never allowed to be submitted. This patch fixes those warnings. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: fix freeing delayed ref head while still holding its mutexJosef Bacik
I hit this error when reproducing a bug that would end in a transaction abort. We take the delayed ref head's mutex to keep anybody from processing it while we're destroying it, but we fail to drop the mutex before we carry on and free the damned thing. Fix this by doing the remove logic for the head ourselves and unlock the mutex, that way we can avoid use after free's or hung tasks waiting on that mutex to come back so they know the delayed ref completed. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20btrfs: ensure we don't overrun devices_info[] in __btrfs_alloc_chunkEric Sandeen
WARN_ON isn't enough, we need to stop the loop if for any reason we would overrun the devices_info array. I tried to track down the connection between the length of the alloc_devices list and the rw_devices counter but it wasn't immediately obvious, so be defensive about it. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20btrfs: remove unnecessary DEFINE_WAIT() declarationsEric Sandeen
No point in DEFINE_WAIT(wait) if it's not used! Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20btrfs: remove unused "item" in btrfs_insert_delayed_item()Eric Sandeen
"item" was set but never used in this function. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20btrfs: fix varargs in __btrfs_std_errorEric Sandeen
__btrfs_std_error didn't always properly call va_end, and might call va_start even if fmt was NULL. Move all the varargs handling into the block where we have fmt. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20btrfs: add missing break in btrfs_print_leaf()Eric Sandeen
I don't think that BTRFS_DEV_EXTENT_KEY is supposed to fall through to BTRFS_DEV_STATS_KEY ... Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20btrfs: annotate intentional switch case fallthroughsEric Sandeen
This keeps static checkers happy. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20btrfs: handle null fs_info in btrfs_panic()Eric Sandeen
At least backref_tree_panic() can apparently pass in a null fs_info, so handle that in __btrfs_panic to get the message out on the console. The btrfs_panic macro also uses fs_info, but that's largely pointless; it's testing to see if BTRFS_MOUNT_PANIC_ON_FATAL_ERROR is not set. But if it *were* set, __btrfs_panic() would have, well, paniced and we wouldn't be here, testing it! So just BUG() at this point. And since we only use fs_info once now, just use it directly. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20btrfs: remove unused fs_info from btrfs_decode_error()Eric Sandeen
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20btrfs: list_entry can't return NULLEric Sandeen
No need to test the result, we can't get a null pointer from list_entry() Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20btrfs: remove unused fd in btrfs_ioctl_send()Eric Sandeen
All we do is set it to NULL and test it :) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: do not overcommit if we don't have enough space for global rsvJosef Bacik
Because of how little we allocate chunks now we can get really tight on metadata space before we will allocate a new chunk. This resulted in being unable to add device extents when allocating a new metadata chunk as we did not have enough space. This is because we were allowed to overcommit too much metadata without actually making sure we had enough space to make allocations. The idea behind overcommit is that we are allowed to say "sure you can have that reservation" when most of the free space is occupied by reservations, not actual allocations. But in this case where a majority of the total space is in use by actual allocations we can screw ourselves by not being able to make real allocations when it matters. So make sure we have enough real space for our global reserve, and if not then don't allow overcommitting. Thanks, Reported-and-tested-by: Jim Schutt <jaschut@sandia.gov> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: remove extent mapping if we fail to add chunkJosef Bacik
I got a double free error when unmounting a file system that failed to add a chunk during its operation. This is because we will kfree the mapping that we created but leave the extent_map in the em_tree for chunks. So to fix this just remove the extent_map when we error out so we don't run into this problem. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: fix chunk allocation error handlingJosef Bacik
If we error out allocating a dev extent we will have already created the block group and such which will cause problems since the allocator may have tried to allocate out of the block group that no longer exists. This will cause BUG_ON()'s in the bio submission path. This also makes a failure to allocate a dev extent a non-abort error, we will just clean up the dev extents we did allocate and exit. Now if we fail to delete the dev extents we will abort since we can't have half of the dev extents hanging around, but this will make us much less likely to abort. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: use bit operation for ->fs_stateMiao Xie
There is no lock to protect fs_info->fs_state, it will introduce some problems, such as the value may be covered by the other task when several tasks modify it. For example: Task0 - CPU0 Task1 - CPU1 mov %fs_state rax or $0x1 rax mov %fs_state rax or $0x2 rax mov rax %fs_state mov rax %fs_state The expected value is 3, but in fact, it is 2. Though this problem doesn't happen now (because there is only one flag currently), the code is error prone, if we add other flags, the above problem will happen to a certainty. Now we use bit operation for it to fix the above problem. In this way, we can make the code more robust and be easy to add new flags. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: use seqlock to protect fs_info->avail_{data, metadata, system}_alloc_bitsMiao Xie
There is no lock to protect fs_info->avail_{data, metadata, system}_alloc_bits, it may introduce some problem, such as the wrong profile information, so we add a seqlock to protect them. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: use the inode own lock to protect its delalloc_bytesMiao Xie
We need not use a global lock to protect the delalloc_bytes of the inode, just use its own lock. In this way, we can reduce the lock contention and ->delalloc_lock will just protect delalloc inode list. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: use percpu counter for fs_info->delalloc_bytesMiao Xie
fs_info->delalloc_bytes is accessed very frequently, so use percpu counter instead of the u64 variant for it to reduce the lock contention. This patch also fixed the problem that we access the variant without the lock protection.At worst, we would not flush the delalloc inodes, and just return ENOSPC error when we still have some free space in the fs. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: use percpu counter for dirty metadata countMiao Xie
->dirty_metadata_bytes is accessed very frequently, so use percpu counter instead of the u64 variant to reduce the contention of the lock. This patch also fixed the problem that we access it without lock protection in __btrfs_btree_balance_dirty(), which may cause we skip the dirty pages flush. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: protect fs_info->alloc_startMiao Xie
fs_info->alloc_start is a 64bits variant, can be accessed by multi-task, but it is not protected strictly, it can be changed while we are accessing it. On 32bit machine, we will get wrong value because we access it by two instructions.(In fact, it is also possible that the same problem happens on the 64bit machine, because the compiler may split the 64bit operation into two 32bit operation.) For example: Assuming -> alloc_start is 0x0000 0000 0001 0000 at the beginning, then we remount and set ->alloc_start to 0x0000 0100 0000 0000. Task0 Task1 load high 32 bits set high 32 bits set low 32 bits load low 32 bits Task1 will get 0. This patch fixes this problem by using two locks to protect it fs_info->chunk_mutex sb->s_umount On the read side, we just need get one of these two locks, and on the write side, we must lock all of them. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: add a comment for fs_info->max_inlineMiao Xie
Though ->max_inline is a 64bit variant, and may be accessed by multi-task, but it is just suggestive number, so we needn't add anything to protect fs_info->max_inline, just add a comment to explain wny we don't use a lock to protect it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20sparc64: Fix tsb_grow() in atomic context.David S. Miller
If our first THP installation for an MM is via the set_pmd_at() done during khugepaged's collapsing we'll end up in tsb_grow() trying to do a GFP_KERNEL allocation with several locks held. Simply using GFP_ATOMIC in this situation is not the best option because we really can't have this fail, so we'd really like to keep this an order 0 GFP_KERNEL allocation if possible. Also, doing the TSB allocation from khugepaged is a really bad idea because we'll allocate it potentially from the wrong NUMA node in that context. So what we do is defer the hugepage TSB allocation until the first TLB miss we take on a hugepage. This is slightly tricky because we have to handle two unusual cases: 1) Taking the first hugepage TLB miss in the window trap handler. We'll call the winfix_trampoline when that is detected. 2) An initial TSB allocation via TLB miss races with a hugetlb fault on another cpu running the same MM. We handle this by unconditionally loading the TSB we see into the current cpu even if it's non-NULL at hugetlb_setup time. Reported-by: Meelis Roos <mroos@ut.ee> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-20sparc64: Handle hugepage TSB being NULL.David S. Miller
Accomodate the possibility that the TSB might be NULL at the point that update_mmu_cache() is invoked. This is necessary because we will sometimes need to defer the TSB allocation to the first fault that happens in the 'mm'. Seperate out the hugepage PTE test into a seperate function so that the logic is clearer. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-20sparc64: Fix gfp_flags setting in tsb_grow().David S. Miller
We should "|= more_flags" rather than "= more_flags". Reported-by: David Rientjes <rientjes@google.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-20Merge tag 'hwmon-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon updates from Guenter Roeck: - New drivers for MAX6697 and compatibles and for INA209. - Added support for IT8771E, IT8772E, MAX34460, MAX34461, MCP98244, and ADT7420 to existing drivers. - Added support for additional attributes to various drivers. - Replaced SENSORS_LIMIT with clamp_val; retire SENSORS_LIMIT; - Clean up PMBus code to reduce its size; clean up adt7410 driver. - A couple of minor bug fixes as well as documentation cleanup. - Out-of-tree change: Replace SENSORS_LIMIT with clamp_val in platform/x86/eeepc-laptop driver. * tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: (32 commits) hwmon: (ntc_thermistor): Fix sparse warnings hwmon: (adt7410) Add device table entry for the adt7420 hwmon: (adt7410) Use I2C_ADDRS helper macro hwmon: (adt7410) Use the SIMPLE_DEV_PM_OPS helper macro hwmon: (adt7410) Let suspend/resume depend on CONFIG_PM_SLEEP hwmon: (adt7410) Clear unwanted bits in the config register hwmon: (jc42) Add support for MCP98244 hwmon: (pmbus) Clean up for code size reduction hwmon: (pmbus/max34440) Add support for MAX34460 and MAX34461 hwmon: (pmbus) Add support for word status register hwmon: (pmbus/zl6100) Add support for VMON/VDRV hwmon: (pmbus) Add function to clear sensor cache hwmon: (pmbus) Add support for additional voltage sensor hwmon: (pmbus) Use krealloc to allocate attribute memory hwmon: (pmbus) Simplify memory allocation for sensor attributes hwmon: (pmbus) Improve boolean handling hwmon: (pmbus) Simplify memory allocation for labels and booleans hwmon: (pmbus) Use dev variable to represent client->dev hwmon: (pmbus) Fix 'Macros with multiple statements' checkpatch error hwmon: (pmbus) Drop unnecessary error messages in probe error path ...
2013-02-20MIPS: Quit exporting kernel internel break codes to uapi/asm/break.hDavid Daney
The internal codes are not part of the kernel's ABI. Signed-off-by: David Daney <david.daney@cavium.com> Patchwork: http://patchwork.linux-mips.org/patch/4932/ Signed-off-by: John Crispin <blogic@openwrt.org> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2013-02-20Merge tag 'pinctrl-for-v3.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pinctrl changes from Linus Walleij: "These are the main pinctrl changes for the v3.9 merge window. The most interesting change by far is how the device core grabs pinctrl default handles avoiding the need to stick boilerplate into driver consumers. - Grabbing of default pinctrl handles from the device core. These are the hunks hitting drivers/base. All is ACKed by Greg, after a long discussion about different alternatives. - Some stuff also touches the MFD and ARM SoC trees, this has been coordinated and ACKed. - New drivers for: - The Tegra 114 sub-SoC - Allwinner sunxi - New ABx500 driver and sub-SoC drivers for AB8500, AB8505, AB9540 and AB8540. - Make it possible for hogged pins to enter a sleep mode, and make it possible for drivers to control that mode. - Various clean-up, extensions and device tree support to various pin controllers." * tag 'pinctrl-for-v3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (68 commits) pinctrl: tegra: add clfvs function to Tegra114 support pinctrl: generic: rename input schmitt disable pinctrl/pinconfig: add debug interface pinctrl: samsung: remove duplicated line ARM: ux500: use real AB8500 IRQ numbers instead of virtual ones ARM: ux500: remove irq_base property from platform_data pinctrl/abx500: use direct IRQ defines pinctrl/abx500: replace IRQ offsets with table read-in values pinctrl/abx500: move IRQ handling to ab8500-core pinctrl: exynos5440: remove erroneous __init pinctrl/abx500: adjust offset for get_mode() pinctrl/abx500: add Device Tree support pinctrl/abx500: align GPIO cluster boundaries pinctrl/abx500: prevent error path from corrupting returning error pinctrl: sunxi: add of_xlate function pinctrl/lantiq: fix pin number in ltq_pmx_gpio_request_enable pinctrl/lantiq: add functionality to falcon_pinconf_dbg_show pinctrl/lantiq: fix pinconfig parameters pinctrl/lantiq: one of the boot leds was defined incorrectly pinctrl/lantiq: only probe available pad controllers ...
2013-02-20Merge tag 'regulator-3.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator updates from Mark Brown: "A fairly quiet release for the regulator API, the bulk of the changes being lots of small cleanups and API updates contributed by Axel Lin with just a small set of larger changes: - New driver for LP8755 - DT support for S5M8767, TPS51632, TPS6507x and TPS65090 - Support for writing a "commit changes" bit in the regmap helper functions." * tag 'regulator-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (60 commits) regulator: Fix memory garbage dev_err printout. regulator: max77686: Reuse rdev_get_id() function. regulator: tps51632: Use regulator_[get|set]_voltage_sel_regmap regulator: as3711: Fix checking if no platform initialization data regulator: s5m8767: Prevent possible NULL pointer dereference regulator: s5m8767: Fix dev argument for devm_kzalloc and of_get_regulator_init_data regulator: core: Optimize _regulator_do_set_voltage if voltage does not change regulator: max8998: Let regulator core handle the case selector == old_selector regulator: s5m8767: Use of_get_child_count() regulator: anatop: improve precision of delay time regulator: show state for GPIO-controlled regulators regulator: s5m8767: Fix build in non-DT case regulator: add device tree support for s5m8767 regulator: palmas: Remove a redundant setting for warm_reset regulator: mc13xxx: Use of_get_child_count() regulator: max8997: Use of_get_child_count() regulator: tps65090: Fix using wrong dev argument for calling of_regulator_match regulators: anatop: add set_voltage_time_sel interface regulator: Add missing of_node_put() regulator: tps6507x: Fix using wrong dev argument for calling of_regulator_match ...
2013-02-20ARM: prima2: remove duplicate v7_invalidate_l1Arnd Bergmann
Patch c08e20d "arm: Add v7_invalidate_l1 to cache-v7.S" added a generic version of this function and removed all platform specific versions, while 4898de3 "ARM: PRIMA2: add new SiRFmarco SMP SoC infrastructures" added another one, leading to a link error. I verified that the two are identical, so we can just remove the one in mach-prima2. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2013-02-20Merge tag 'regmap-3.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap updates from Mark Brown: "Several nice new features and performance improvements here, especially the first: - Support for using the cache infrastructure without the physical I/O, allowing devices which don't fit the physical model regmap has to take advantage of the cache infrastructure, contributed by Andrey Smirnov. - Several small improvements to the support for wake capable IRQs. - Support for asynchronous I/O, allowing us to come much closer to saturating fast buses like SPI. - Support for simple array caches, giving higher performance for use with MMIO devices. - Restoration of the use of bulk reads for handling interrupts, giving a performance improvement. - Support for 24 bit register addresses. - More performance improvements for debugfs." * tag 'regmap-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: (24 commits) regmap: mmio: add register clock support regmap: debugfs: Factor out debugfs_tot_len calc into a function regmap: debugfs: Optimize seeking within blocks of registers regmap: debugfs: Add a `max_reg' member in struct regmap_debugfs_off_cache regmap: debugfs: Fix reading in register field units regmap: spi: Handle allocation failures gracefully regmap: Export regmap_async_complete() regmap: Export regmap_async_complete_cb regmap: include linux/sched.h to fix build regmap: spi: Support asynchronous I/O for SPI regmap: Add asynchronous I/O support regmap: Add "no-bus" option for regmap API regmap: regmap: avoid spurious warning in regmap_read_debugfs regmap: Add provisions to have user-defined write operation regmap: Add provisions to have user-defined read operation regmap: Add support for 24 bit wide register addresses mfd: wm5110: Mark wakes as inverted mfd: wm5102: Mark wakes as inverted regmap: irq: Support wake IRQ mask inversion regmap: irq: Fix sync of wake statuses to hardware ...
2013-02-20Merge branch 'for-3.9-cpuset' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cpuset changes from Tejun Heo: - Synchornization has seen a lot of changes with focus on decoupling cpuset synchronization from cgroup internal locking. After this change, there only remain a couple of mostly trivial dependencies on cgroup_lock outside cgroup core proper. cgroup_lock is scheduled to be unexported in this devel cycle. This will finally remove the fragile locking order around cgroup (cgroup locking wants to / should be one of the outermost but yet has been acquired from deep inside individual controllers). - At this point, Li is most knowlegeable with cpuset and taking over the maintainership of cpuset. * 'for-3.9-cpuset' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cpuset: drop spurious retval assignment in proc_cpuset_show() cpuset: fix RCU lockdep splat cpuset: update MAINTAINERS cpuset: remove cpuset->parent cpuset: replace cpuset->stack_list with cpuset_for_each_descendant_pre() cpuset: replace cgroup_mutex locking with cpuset internal locking cpuset: schedule hotplug propagation from cpuset_attach() if the cpuset is empty cpuset: pin down cpus and mems while a task is being attached cpuset: make CPU / memory hotplug propagation asynchronous cpuset: drop async_rebuild_sched_domains() cpuset: don't nest cgroup_mutex inside get_online_cpus() cpuset: reorganize CPU / memory hotplug handling cpuset: cleanup cpuset[_can]_attach() cpuset: introduce cpuset_for_each_child() cpuset: introduce CS_ONLINE cpuset: introduce ->css_on/offline() cpuset: remove fast exit path from remove_tasks_in_empty_cpuset() cpuset: remove unused cpuset_unlock()
2013-02-20Merge branch 'for-3.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup changes from Tejun Heo: "Nothing too drastic. - Removal of synchronize_rcu() from userland visible paths. - Various fixes and cleanups from Li. - cgroup_rightmost_descendant() added which will be used by cpuset changes (it will be a separate pull request)." * 'for-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: fail if monitored file and event_control are in different cgroup cgroup: fix cgroup_rmdir() vs close(eventfd) race cpuset: fix cpuset_print_task_mems_allowed() vs rename() race cgroup: fix exit() vs rmdir() race cgroup: remove bogus comments in cgroup_diput() cgroup: remove synchronize_rcu() from cgroup_diput() cgroup: remove duplicate RCU free on struct cgroup sched: remove redundant NULL cgroup check in task_group_path() sched: split out css_online/css_offline from tg creation/destruction cgroup: initialize cgrp->dentry before css_alloc() cgroup: remove a NULL check in cgroup_exit() cgroup: fix bogus kernel warnings when cgroup_create() failed cgroup: remove synchronize_rcu() from rebind_subsystems() cgroup: remove synchronize_rcu() from cgroup_attach_{task|proc}() cgroup: use new hashtable implementation cgroups: fix cgroup_event_listener error handling cgroups: move cgroup_event_listener.c to tools/cgroup cgroup: implement cgroup_rightmost_descendant() cgroup: remove unused dummy cgroup_fork_callbacks()
2013-02-20Btrfs: move fs/btrfs/ioctl.h to include/uapi/linux/btrfs.hFilipe Brandenburger
The header file will then be installed under /usr/include/linux so that userspace applications can refer to Btrfs ioctls by name and use the same structs used internally in the kernel. Signed-off-by: Filipe Brandenburger <filbranden@google.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: Check CAP_DAC_READ_SEARCH for BTRFS_IOC_INO_PATHSKusanagi Kouichi
CAP_DAC_READ_SEARCH overrides read and search permission check on file and directory. It seems fit for BTRFS_IOC_INO_PATHS. Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Revert "Btrfs: fix permissions of empty files not affected by umask"Josef Bacik
This reverts commit 2794ed013b3551cbae887ea1b93c52aaacb7370d. Wasn't supposed to get used in btrfs_mknod, it was supposed to be in btrfs_create, which was done in commit 9185aa587b7425f8f4520da2e66792f5f3c2b815. Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: don't traverse the ordered operation list repeatedlyMiao Xie
btrfs_run_ordered_operations() needn't traverse the ordered operation list repeatedly, it is because the transaction commiter will invoke it again when there is no other writer in this transaction, it can ensure that no one can add new objects into the ordered operation list. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: traverse and flush the delalloc inodes onceMiao Xie
btrfs_start_delalloc_inodes() needn't traverse and flush the delalloc inodes repeatedly. It is because we can regard the data that the users write after we start delalloc inodes flush as the one which is after the delalloc inodes flush is done, and we can flush it next time. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: check the return value of btrfs_run_ordered_operations()Miao Xie
We forget to check the return value of btrfs_run_ordered_operations() when flushing all the pending stuffs, fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: check the return value of btrfs_start_delalloc_inodes()Miao Xie
We forget to check the return value of btrfs_start_delalloc_inodes(), fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: make raid attr array more readableMiao Xie
The current code of raid attr arry is hard to understand and it is easy to introduce some problem if we modify the array. So I changed it and made it more readable. Cc: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: record first logical byte in memoryLiu Bo
This'd save us a rbtree search which may become expensive in large filesystem. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: save us a read_lockLiu Bo
This does not change the logic of code, but can save us a read_lock. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: use token to avoid times mapping extent bufferLiu Bo
The API in tree log code has done sort of changes, and it proves that we can benifit from using token, so do the same thing here. function_graph tracer's timer shows that it costs nearly half time of before(39.788us -> 22.391us). Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: kill unused argument of btrfs_pin_extent_for_log_replayLiu Bo
Argument 'trans' is not used any more. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>