summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2013-07-01ext4: improve free space calculation for inline_databoxi liu
In ext4 feature inline_data,it use the xattr's space to store the inline data in inode.When we calculate the inline data as the xattr,we add the pad.But in get_max_inline_xattr_value_size() function we count the free space without pad.It cause some contents are moved to a block even if it can be stored in the inode. Signed-off-by: liulei <lewis.liulei@huawei.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Tao Ma <boyu.mt@taobao.com>
2013-07-01ext4: reduce object size when !CONFIG_PRINTKJoe Perches
Reduce the object size ~10% could be useful for embedded systems. Add #ifdef CONFIG_PRINTK #else #endif blocks to hold formats and arguments, passing " " to functions when !CONFIG_PRINTK and still verifying format and arguments with no_printk. $ size fs/ext4/built-in.o* text data bss dec hex filename 239375 610 888 240873 3ace9 fs/ext4/built-in.o.new 264167 738 888 265793 40e41 fs/ext4/built-in.o.old $ grep -E "CONFIG_EXT4|CONFIG_PRINTK" .config # CONFIG_PRINTK is not set CONFIG_EXT4_FS=y CONFIG_EXT4_USE_FOR_EXT23=y CONFIG_EXT4_FS_POSIX_ACL=y # CONFIG_EXT4_FS_SECURITY is not set # CONFIG_EXT4_DEBUG is not set Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-07-01ext4: improve extent cache shrink mechanism to avoid to burn CPU timeZheng Liu
Now we maintain an proper in-order LRU list in ext4 to reclaim entries from extent status tree when we are under heavy memory pressure. For keeping this order, a spin lock is used to protect this list. But this lock burns a lot of CPU time. We can use the following steps to trigger it. % cd /dev/shm % dd if=/dev/zero of=ext4-img bs=1M count=2k % mkfs.ext4 ext4-img % mount -t ext4 -o loop ext4-img /mnt % cd /mnt % for ((i=0;i<160;i++)); do truncate -s 64g $i; done % for ((i=0;i<160;i++)); do cp $i /dev/null &; done % perf record -a -g % perf report This commit tries to fix this problem. Now a new member called i_touch_when is added into ext4_inode_info to record the last access time for an inode. Meanwhile we never need to keep a proper in-order LRU list. So this can avoid to burns some CPU time. When we try to reclaim some entries from extent status tree, we use list_sort() to get a proper in-order list. Then we traverse this list to discard some entries. In ext4_sb_info, we use s_es_last_sorted to record the last time of sorting this list. When we traverse the list, we skip the inode that is newer than this time, and move this inode to the tail of LRU list. When the head of the list is newer than s_es_last_sorted, we will sort the LRU list again. In this commit, we break the loop if s_extent_cache_cnt == 0 because that means that all extents in extent status tree have been reclaimed. Meanwhile in this commit, ext4_es_{un}register_shrinker()'s prototype is changed to save a local variable in these functions. Reported-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-07-01ext4: implement error handling of ext4_mb_new_preallocation()Alexey Khoroshilov
If memory allocation in ext4_mb_new_group_pa() is failed, it returns error code, ext4_mb_new_preallocation() propages it, but ext4_mb_new_blocks() ignores it. An observed result was: - allocation fail means ext4_mb_new_group_pa() does not update ext4_allocation_context; - ext4_mb_new_blocks() sets ext4_allocation_request->len (ar->len = ac->ac_b_ex.fe_len;) to number of blocks preallocated (512) instead of number of blocks requested (1); - that activates update cycle in ext4_splice_branch(): for (i = 1; i < blks; i++) <-- blks is 512 instead of 1 here *(where->p + i) = cpu_to_le32(current_block++); - it iterates 511 times and corrupts a chunk of memory including inode structure; - page fault happens at EXT4_SB(inode->i_sb) in ext4_mark_inode_dirty(); - system hangs with 'scheduling while atomic' BUG. The patch implements a check for ext4_mb_new_preallocation() error code and handles its failure as if ext4_mb_regular_allocator() fails. Found by Linux File System Verification project (linuxtesting.org). [ Patch restructed by tytso to make the flow of control easier to follow. ] Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-07-01ext4: fix corruption when online resizing a fs with 1K block sizeMaarten ter Huurne
Subtracting the number of the first data block places the superblock backups one block too early, corrupting the file system. When the block size is larger than 1K, the first data block is 0, so the subtraction has no effect and no corruption occurs. Signed-off-by: Maarten ter Huurne <maarten@treewalker.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> CC: stable@vger.kernel.org
2013-07-01Merge branch 'for-next/hugepages' of ↵Catalin Marinas
git://git.linaro.org/people/stevecapper/linux into upstream-hugepages * 'for-next/hugepages' of git://git.linaro.org/people/stevecapper/linux: ARM64: mm: THP support. ARM64: mm: Raise MAX_ORDER for 64KB pages and THP. ARM64: mm: HugeTLB support. ARM64: mm: Move PTE_PROT_NONE bit. ARM64: mm: Make PAGE_NONE pages read only and no-execute. ARM64: mm: Restore memblock limit when map_mem finished. mm: thp: Correct the HPAGE_PMD_ORDER check. x86: mm: Remove general hugetlb code from x86. mm: hugetlb: Copy general hugetlb code from x86 to mm. x86: mm: Remove x86 version of huge_pmd_share. mm: hugetlb: Copy huge_pmd_share from x86 to mm. Conflicts: arch/arm64/Kconfig arch/arm64/include/asm/pgtable-hwdef.h arch/arm64/include/asm/pgtable.h
2013-07-01Merge tag 'v3.10' into sched/coreIngo Molnar
Merge in a recent upstream commit: c2853c8df57f include/linux/math64.h: add div64_ul() because: 72a4cf20cb71 sched: Change cfs_rq load avg to unsigned long relies on it. [ We don't rebase sched/core for this, because the handful of followup commits after the broken commit are not behavioral changes so are unlikely to be needed during bisection. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-30Linux 3.10v3.10Linus Torvalds
2013-06-30Merge branch 'merge' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull another powerpc fix from Benjamin Herrenschmidt: "I mentioned that while we had fixed the kernel crashes, EEH error recovery didn't always recover... It appears that I had a fix for that already in powerpc-next (with a stable CC). I cherry-picked it today and did a few tests and it seems that things now work quite well. The patch is also pretty simple, so I see no reason to wait before merging it." * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: powerpc/eeh: Fix fetching bus for single-dev-PE
2013-06-30Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "This is a set of seven bug fixes. Several fcoe fixes for locking problems, initiator issues and a VLAN API change, all of which could eventually lead to data corruption, one fix for a qla2xxx locking problem which could lead to multiple completions of the same request (and subsequent data corruption) and a use after free in the ipr driver. Plus one minor MAINTAINERS file update" (only six bugfixes in this pull, since I had already pulled the fcoe API fix directly from Robert Love) * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: [SCSI] ipr: Avoid target_destroy accessing memory after it was freed [SCSI] qla2xxx: Fix for locking issue between driver ISR and mailbox routines MAINTAINERS: Fix fcoe mailing list libfc: extend ex_lock to protect all of fc_seq_send libfc: Correct check for initiator role libfcoe: Fix Conflicting FCFs issue in the fabric
2013-06-30Revert "serial: 8250_pci: add support for another kind of NetMos Technology ↵Greg Kroah-Hartman
PCI 9835 Multi-I/O Controller" This reverts commit 8d2f8cd424ca0b99001f3ff4f5db87c4e525f366. As reported by Stefan, this device already works with the parport_serial driver, so the 8250_pci driver should not also try to grab it as well. Reported-by: Stefan Seyfried <stefan.seyfried@googlemail.com> Cc: Wang YanQing <udknight@gmail.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-30powerpc/eeh: Fix fetching bus for single-dev-PEGavin Shan
While running Linux as guest on top of phyp, we possiblly have PE that includes single PCI device. However, we didn't return its PCI bus correctly and it leads to failure on recovery from EEH errors for single-dev-PE. The patch fixes the issue. Cc: <stable@vger.kernel.org> # v3.7+ Cc: Steve Best <sbest@us.ibm.com> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-29Merge branch 'merge' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc fixes from Ben Herrenschmidt: "We discovered some breakage in our "EEH" (PCI Error Handling) code while doing error injection, due to a couple of regressions. One of them is due to a patch (37f02195bee9 "powerpc/pci: fix PCI-e devices rescan issue on powerpc platform") that, in hindsight, I shouldn't have merged considering that it caused more problems than it solved. Please pull those two fixes. One for a simple EEH address cache initialization issue. The other one is a patch from Guenter that I had originally planned to put in 3.11 but which happens to also fix that other regression (a kernel oops during EEH error handling and possibly hotplug). With those two, the couple of test machines I've hammered with error injection are remaining up now. EEH appears to still fail to recover on some devices, so there is another problem that Gavin is looking into but at least it's no longer crashing the kernel." * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: powerpc/pci: Improve device hotplug initialization powerpc/eeh: Add eeh_dev to the cache during boot
2013-06-29ARM: dt: Only print warning, not WARN() on bad cpu map in device treeOlof Johansson
Due to recent changes and expecations of proper cpu bindings, there are now cases for many of the in-tree devicetrees where a WARN() will hit on boot due to badly formatted /cpus nodes. Downgrade this to a pr_warn() to be less alarmist, since it's not a new problem. Tested on Arndale, Cubox, Seaboard and Panda ES. Panda hits the WARN without this, the others do not. Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-30powerpc/pci: Improve device hotplug initializationGuenter Roeck
Commit 37f02195b (powerpc/pci: fix PCI-e devices rescan issue on powerpc platform) fixes a problem with interrupt and DMA initialization on hot plugged devices. With this commit, interrupt and DMA initialization for hot plugged devices is handled in the pci device enable function. This approach has a couple of drawbacks. First, it creates two code paths for device initialization, one for hot plugged devices and another for devices known during the initial PCI scan. Second, the initialization code for hot plugged devices is only called when the device is enabled, ie typically in the probe function. Also, the platform specific setup code is called each time pci_enable_device() is called, not only once during device discovery, meaning it is actually called multiple times, once for devices discovered during the initial scan and again each time a driver is re-loaded. The visible result is that interrupt pins are only assigned to hot plugged devices when the device driver is loaded. Effectively this changes the PCI probe API, since pci_dev->irq and the device's dma configuration will now only be valid after pci_enable() was called at least once. A more subtle change is that platform specific PCI device setup is moved from device discovery into the driver's probe function, more specifically into the pci_enable_device() call. To fix the inconsistencies, add new function pcibios_add_device. Call pcibios_setup_device from pcibios_setup_bus_devices if device setup is not complete, and from pcibios_add_device if bus setup is complete. With this change, device setup code is moved back into device initialization, and called exactly once for both static and hot plugged devices. [ This also fixes a regression introduced by the above patch which causes dev->irq to be overwritten under some cirumstances after MSIs have been enabled for the device which leads to crashes due to the MSI core "hijacking" dev->irq to store the base MSI number and not the LSI. --BenH ] Cc: Yuanquan Chen <Yuanquan.Chen@freescale.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hiroo Matsumoto <matsumoto.hiroo@jp.fujitsu.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-29cgroup: CGRP_ROOT_SUBSYS_BOUND should also be ignored when mounting an ↵Tejun Heo
existing hierarchy 0ce6cba357 ("cgroup: CGRP_ROOT_SUBSYS_BOUND should be ignored when comparing mount options") only updated the remount path but CGRP_ROOT_SUBSYS_BOUND should also be ignored when comparing options while mounting an existing hierarchy. As option mismatch triggers a warning but doesn't fail the mount without sane_behavior, this only triggers a spurious warning message. Fix it by only comparing CGRP_ROOT_OPTION_MASK bits when comparing new and existing root options. Signed-off-by: Tejun Heo <tj@kernel.org>
2013-06-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds
Pull crypto fix from Herbert Xu: "This fixes a crash in the crypto layer exposed by an SCTP test tool" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: algboss - Hold ref count on larval
2013-06-29Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm/qxl fix from Dave Airlie: "Bad me forgot an access check, possible security issue, but since this is the first kernel with it, should be fine to just put it in now" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: drm/qxl: add missing access check for execbuffer ioctl
2013-06-29Fix: kernel/ptrace.c: ptrace_peek_siginfo() missing __put_user() validationMathieu Desnoyers
This __put_user() could be used by unprivileged processes to write into kernel memory. The issue here is that even if copy_siginfo_to_user() fails, the error code is not checked before __put_user() is executed. Luckily, ptrace_peek_siginfo() has been added within the 3.10-rc cycle, so it has not hit a stable release yet. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: Roland McGrath <roland@redhat.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: David Howells <dhowells@redhat.com> Cc: Dave Jones <davej@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Pedro Alves <palves@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-29Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fix from Sage Weil: "This is a recently spotted regression in the snapshot behavior... It turns out several tests weren't being run in the nightlies so this took a while to spot" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: send snapshot context with writes
2013-06-29Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull ubifs fixes from Al Viro: "A couple of ubifs readdir/lseek race fixes. Stable fodder, really nasty..." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: UBIFS: fix a horrid bug UBIFS: prepare to fix a horrid bug
2013-06-29Merge tag 'for-linus-20130628' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-mn10300 Pull two MN10300 fixes from David Howells: "The first fixes a problem with passing arrays rather than pointers to get_user() where __typeof__ then wants to declare and initialise an array variable which gcc doesn't like. The second fixes a problem whereby putting mem=xxx into the kernel command line causes init=xxx to get an incorrect value." * tag 'for-linus-20130628' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-mn10300: mn10300: Use early_param() to parse "mem=" parameter mn10300: Allow to pass array name to get_user()
2013-06-29Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "Correct an ordering issue in the tick broadcast code. I really wish we'd get compensation for pain and suffering for each line of code we write to work around dysfunctional timer hardware." * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick: Fix tick_broadcast_pending_mask not cleared
2013-06-29Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Ingo Molnar: "One more fix for a recently discovered bug" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Disable monitoring on setuid processes for regular users
2013-06-29Merge branch 'devel-stable' into for-nextRussell King
Conflicts: arch/arm/Makefile arch/arm/include/asm/glue-proc.h
2013-06-29Merge branches 'fixes', 'mcpm', 'misc' and 'mmci' into for-nextRussell King
2013-06-29ARM: 7775/1: mm: Remove do_sect_fault from LPAE codeSteven Capper
For LPAE, do_sect_fault used to be invoked as the second level access flag handler. When transparent huge pages were introduced for LPAE, do_page_fault was used instead. Unfortunately, do_sect_fault remains defined but not used for LPAE code resulting in a compile warning. This patch surrounds do_sect_fault with #ifndef CONFIG_ARM_LPAE to fix this warning. Signed-off-by: Steve Capper <steve.capper@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-06-29ARM: 7777/1: Avoid extra calls to the C compilerDouglas Anderson
Starting up the C compiler can be a slow operation on some systems. Though these calls don't individually take a lot of time, they add up. Rearrange the ARM Makefile a bit to avoid extra calls to the compiler when they can be easily avoided. When running with the Chrome OS ARM cross compiler "armv7a-cros-linux-gnueabi-", this shaved .55 seconds (from 5.31 seconds to 4.76 seconds) off an incremental build of the kernel: time make -j32 ARCH=arm CROSS_COMPILE=armv7a-cros-linux-gnueabi- Thanks to Mike Frysinger for the clean trick to make this work. Signed-off-by: Doug Anderson <dianders@chromium.org> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-06-29ARM: 7774/1: Fix dtb dependency to use order-only prerequisitesDouglas Anderson
The %.dtb dependency is specified to depend on the PHONY "scripts". That means that it'll build every time even if the underlying dtb file hasn't been touched. Use an order-only prerequisites to fix this. Also mark "dtbs" as PHONY for correctness. This was broken in (70b0476 ARM: 7513/1: Make sure dtc is built before running it). Reported-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Doug Anderson <dianders@chromium.org> Acked-by: Olof Johansson <olof@lixom.net> Reviewed-by: David Brown <davidb@codeaurora.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-06-29lseek_execute() doesn't need an inode passed to itAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29block_dev: switch to fixed_size_llseek()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29cpqphp_sysfs: switch to fixed_size_llseek()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29tile-srom: switch to fixed_size_llseek()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29proc_powerpc: switch to fixed_size_llseek()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29ubi/cdev: switch to fixed_size_llseek()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29pci/proc: switch to fixed_size_llseek()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29isapnp: switch to fixed_size_llseek()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29lpfc: switch to fixed_size_llseek()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29locks: give the blocked_hash its own spinlockJeff Layton
There's no reason we have to protect the blocked_hash and file_lock_list with the same spinlock. With the tests I have, breaking it in two gives a barely measurable performance benefit, but it seems reasonable to make this locking as granular as possible. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29locks: add a new "lm_owner_key" lock operationJeff Layton
Currently, the hashing that the locking code uses to add these values to the blocked_hash is simply calculated using fl_owner field. That's valid in most cases except for server-side lockd, which validates the owner of a lock based on fl_owner and fl_pid. In the case where you have a small number of NFS clients doing a lot of locking between different processes, you could end up with all the blocked requests sitting in a very small number of hash buckets. Add a new lm_owner_key operation to the lock_manager_operations that will generate an unsigned long to use as the key in the hashtable. That function is only implemented for server-side lockd, and simply XORs the fl_owner and fl_pid. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29locks: turn the blocked_list into a hashtableJeff Layton
Break up the blocked_list into a hashtable, using the fl_owner as a key. This speeds up searching the hash chains, which is especially significant for deadlock detection. Note that the initial implementation assumes that hashing on fl_owner is sufficient. In most cases it should be, with the notable exception being server-side lockd, which compares ownership using a tuple of the nlm_host and the pid sent in the lock request. So, this may degrade to a single hash bucket when you only have a single NFS client. That will be addressed in a later patch. The careful observer may note that this patch leaves the file_lock_list alone. There's much less of a case for turning the file_lock_list into a hashtable. The only user of that list is the code that generates /proc/locks, and it always walks the entire list. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29locks: convert fl_link to a hlist_nodeJeff Layton
Testing has shown that iterating over the blocked_list for deadlock detection turns out to be a bottleneck. In order to alleviate that, begin the process of turning it into a hashtable. We start by turning the fl_link into a hlist_node and the global lists into hlists. A later patch will do the conversion of the blocked_list to a hashtable. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29locks: avoid taking global lock if possible when waking up blocked waitersJeff Layton
Since we always hold the i_lock when inserting a new waiter onto the fl_block list, we can avoid taking the global lock at all if we find that it's empty when we go to wake up blocked waiters. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29locks: protect most of the file_lock handling with i_lockJeff Layton
Having a global lock that protects all of this code is a clear scalability problem. Instead of doing that, move most of the code to be protected by the i_lock instead. The exceptions are the global lists that the ->fl_link sits on, and the ->fl_block list. ->fl_link is what connects these structures to the global lists, so we must ensure that we hold those locks when iterating over or updating these lists. Furthermore, sound deadlock detection requires that we hold the blocked_list state steady while checking for loops. We also must ensure that the search and update to the list are atomic. For the checking and insertion side of the blocked_list, push the acquisition of the global lock into __posix_lock_file and ensure that checking and update of the blocked_list is done without dropping the lock in between. On the removal side, when waking up blocked lock waiters, take the global lock before walking the blocked list and dequeue the waiters from the global list prior to removal from the fl_block list. With this, deadlock detection should be race free while we minimize excessive file_lock_lock thrashing. Finally, in order to avoid a lock inversion problem when handling /proc/locks output we must ensure that manipulations of the fl_block list are also protected by the file_lock_lock. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29locks: encapsulate the fl_link list handlingJeff Layton
Move the fl_link list handling routines into a separate set of helpers. Also ensure that locks and requests are always put on global lists last (after fully initializing them) and are taken off before unintializing them. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29locks: make "added" in __posix_lock_file a boolJeff Layton
Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29locks: comment cleanups and clarificationsJeff Layton
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29locks: make generic_add_lease and generic_delete_lease staticJeff Layton
Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29cifs: use posix_unblock_lock instead of locks_delete_blockJeff Layton
commit 66189be74 (CIFS: Fix VFS lock usage for oplocked files) exported the locks_delete_block symbol. There's already an exported helper function that provides this capability however, so make cifs use that instead and turn locks_delete_block back into a static function. Note that if fl->fl_next == NULL then this lock has already been through locks_delete_block(), so we should be OK to ignore an ENOENT error here and simply not retry the lock. Cc: Pavel Shilovsky <piastryyy@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29locks: drop the unused filp argument to posix_unblock_lockJeff Layton
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>