Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC cleanups from Olof Johansson:
"This branch contains code cleanups, moves and removals for 3.13.
Qualcomm msm targets had a bunch of code removal for legacy non-DT
platforms. Nomadik saw more device tree conversions and cleanup of
old code. Tegra has some code refactoring, etc.
One longish patch series from Sebastian Hasselbarth changes the
init_time hooks and tries to use a generic implementation for most
platforms, since they were all doing more or less the same things.
Finally the "shark" platform is removed in this release. It's been
abandoned for a while and nobody seems to care enough to keep it
around. If someone comes along and wants to resurrect it, the removal
can easily be reverted and code brought back.
Beyond this, mostly a bunch of removals of stale content across the
board, etc"
* tag 'cleanup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (79 commits)
ARM: gemini: convert to GENERIC_CLOCKEVENTS
ARM: EXYNOS: remove CONFIG_MACH_EXYNOS[4, 5]_DT config options
ARM: OMAP3: control: add API for setting IVA bootmode
ARM: OMAP3: CM/control: move CM scratchpad save to CM driver
ARM: OMAP3: McBSP: do not access CM register directly
ARM: OMAP3: clock: add API to enable/disable autoidle for a single clock
ARM: OMAP2: CM/PM: remove direct register accesses outside CM code
MAINTAINERS: Add patterns for DTS files for AT91
ARM: at91: remove init_machine() as default is suitable
ARM: at91/dt: split sama5d3 peripheral definitions
ARM: at91/dt: split sam9x5 peripheral definitions
ARM: Remove temporary sched_clock.h header
ARM: clps711x: Use linux/sched_clock.h
MAINTAINERS: Add DTS files to patterns for Samsung platform
ARM: EXYNOS: remove unnecessary header inclusions from exynos4/5 dt machine file
ARM: tegra: fix ARCH_TEGRA_114_SOC select sort order
clk: nomadik: fix missing __init on nomadik_src_init
ARM: drop explicit selection of HAVE_CLK and CLKDEV_LOOKUP
ARM: S3C64XX: Kill CONFIG_PLAT_S3C64XX
ASoC: samsung: Use CONFIG_ARCH_S3C64XX to check for S3C64XX support
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC low-priority fixes from Olof Johansson:
"A set of fixes for various platforms that weren't considered bad
enough to include in 3.12 (nor -stable). Mostly simple typo fixes,
etc"
* tag 'fixes-nc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: OMAP2+: irq, AM33XX add missing register check
ARM: OMAP2+: wakeupgen: AM43x adaptation
ARM: OMAP1: Fix a bunch of GPIO related section warnings after initdata got corrected
ARM: dts: fix PL330 MDMA1 address in DT for Universal C210 board
ARM: dts: Work around lack of cpufreq regulator lookup for exynos4210-origen and trats boards
ARM: dts: Fix typo earlyprintk in exynos5440-sd5v1 and ssdk5440 boards
ARM: dts: Correct typo in use of samsung,pin-drv for exynos5250
ARM: rockchip: remove obsolete rockchip,config properties
ARM: rockchip: fix wrong use of non-existent CONFIG_LOCAL_TIMERS
ARM: mach-omap1: Fix omap1510_fpga_init_irq() implicit declarations.
ARM: OMAP1: fix incorrect placement of __initdata tag
ARM: OMAP: remove deprecated IRQF_DISABLED
ARM: OMAP2+: throw the die id into the entropy pool
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64
Pull ARM64 update from Catalin Marinas:
"Main features:
- Ticket-based spinlock implementation and lockless lockref support
- Big endian support
- CPU hotplug support, currently for PSCI (Power State Coordination
Interface) capable firmware
- Virtual address space extended to 42-bit in the 64K page
configuration (maximum VA space with 2 levels of page tables)
- Compat (AArch32) kuser helpers updated to ARMv8 (make use of
load-acquire/store-release instructions)
- Code cleanup, defconfig update and minor fixes"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64: (43 commits)
ARM64: /proc/interrupts: display IPIs of online CPUs only
arm64: locks: Remove CONFIG_GENERIC_LOCKBREAK
arm64: KVM: vgic: byteswap GICv2 access on world switch if BE
arm64: KVM: initialize HYP mode following the kernel endianness
arm64: compat: Clear the IT state independent of the 32-bit ARM or Thumb-2 mode
arm64: Use 42-bit address space with 64K pages
arm64: module: ensure instruction is little-endian before manipulation
arm64: defconfig: Enable CONFIG_PREEMPT by default
arm64: fix access to preempt_count from assembly code
arm64: move enabling of GIC before CPUs are set online
arm64: use generic RW_DATA_SECTION macro in linker script
arm64: Slightly improve the warning on CPU0 enable-method
ARM64: simplify cpu_read_bootcpu_ops using OF/DT helper
ARM64: DT: define ARM64 specific arch_match_cpu_phys_id
arm64: allow ioremap_cache() to use existing RAM mappings
arm64: update 32-bit kuser helpers to ARMv8
arm64: perf: fix event number mask
arm64: kconfig: allow CPU_BIG_ENDIAN to be selected
arm64: Fix the endianness of arch_spinlock_t
arm64: big-endian: write CPU holding pen address as LE
...
|
|
Fixes a suspicious rcu derference warning.
Cc: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is to avoid very silly Kconfig dependencies for modules
using this routine.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The max_vfs parameter has a limit of 63 and silently fails (adding 0 vfs) when
it is out of range. This patch adds a warning so that the user knows something
went wrong. Also, this patch moves the warning in ixgbe_enable_sriov() to where
max_vfs is checked, so that even an out of range value will show the deprecated
warning. Previously, an out of range parameter didn't even warn the user to use
the new sysfs interface instead.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch fixes multiple problems in the link modes display in ethtool.
Newer parts have more complicated methods to determine actual link
capabilities. Older parts cannot communicate with their SFP modules.
Finally, all the available defines are not displayed by ethtool. This
updates the link modes to be as accurate as possible depending on what data
is available to the driver at any given time.
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pushing original fragments through causes several problems. For example
for matching, frags may not be matched correctly. Take following
example:
<example>
On HOSTA do:
ip6tables -I INPUT -p icmpv6 -j DROP
ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
and on HOSTB you do:
ping6 HOSTA -s2000 (MTU is 1500)
Incoming echo requests will be filtered out on HOSTA. This issue does
not occur with smaller packets than MTU (where fragmentation does not happen)
</example>
As was discussed previously, the only correct solution seems to be to use
reassembled skb instead of separete frags. Doing this has positive side
effects in reducing sk_buff by one pointer (nfct_reasm) and also the reams
dances in ipvs and conntrack can be removed.
Future plan is to remove net/ipv6/netfilter/nf_conntrack_reasm.c
entirely and use code in net/ipv6/reassembly.c instead.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If reassembled packet would fit into outdev MTU, it is not fragmented
according the original frag size and it is send as single big packet.
The second case is if skb is gso. In that case fragmentation does not happen
according to the original frag size.
This patch fixes these.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
o Changes from v1
Use find_next(_zero)_bit suggested by jg.kim
When f2fs issues discard command, if segment is contiguous,
let's issue more large segment to gather adjacent segments.
** blktrace **
179,1 0 5859 42.619023770 971 C D 131072 + 2097152 [0]
179,1 0 33665 108.840475468 971 C D 2228224 + 2494464 [0]
179,1 0 33671 109.131616427 971 C D 14909440 + 344064 [0]
179,1 0 33677 109.137100677 971 C D 15261696 + 4096 [0]
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw
Pull gfs2 updates from Steven Whitehouse:
"The main feature of interest this time is quota updates. There are
some clean ups and some patches to use the new generic lru list code.
There is still plenty of scope for some further changes in due course -
faster lookups of quota structures is very much on the todo list.
Also, a start has been made towards the more tricky issue of using the
generic lru code with glocks, but that will have to be completed in a
subsequent merge window.
The other, more minor feature, is that there have been a number of
performance patches which relate to block allocation. In particular
they will improve performance when the disk is nearly full"
* tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
GFS2: Use generic list_lru for quota
GFS2: Rename quota qd_lru_lock qd_lock
GFS2: Use reflink for quota data cache
GFS2: Use lockref for glocks
GFS2: Protect quota sync generation
GFS2: Inline qd_trylock into gfs2_quota_unlock
GFS2: Make two similar quota code fragments into a function
GFS2: Remove obsolete quota tunable
GFS2: Move gfs2_icbit_munge into quota.c
GFS2: Speed up starting point selection for block allocation
GFS2: Add allocation parameters structure
GFS2: Clean up reservation removal
GFS2: fix dentry leaks
GFS2: new function gfs2_rbm_incr
GFS2: Introduce rbm field bii
GFS2: Do not reset flags on active reservations
GFS2: introduce bi_blocks for optimization
GFS2: optimize rbm_from_block wrt bi_start
GFS2: d_splice_alias() can't return error
|
|
Pull Request for 3.13 for FCOE tree.
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
If a write block triggers promotion and covers a whole block we can
avoid a copy.
Introduce dm_{hook,unhook}_bio to simplify saving and restoring bio
fields (bi_private is now used by overwrite). Switch writethrough
support over to using these helpers too.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Previously these promotions only got priority if there were unused cache
blocks. Now we give them priority if there are any clean blocks in the
cache.
The fio_soak_test in the device-mapper-test-suite now gives uniform
performance across subvolumes (~16 seconds).
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
There are now two multiqueues for in cache blocks. A clean one and a
dirty one.
writeback_work comes from the dirty one. Demotions come from the clean
one.
There are two benefits:
- Performance improvement, since demoting a clean block is a noop.
- The cache cleans itself when io load is light.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Check commit_requested flag _before_ calling
dm_cache_changed_this_transaction() superfluously.
Also, be sure to set last_commit_jiffies _after_ dm_cache_commit()
completes.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Don't waste time spotting blocks that have been allocated and then freed
in the same transaction.
The extra lookup is expensive, and I don't think it really gives us much.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
The option DM_LOG_USERSPACE is sub-option of DM_MIRROR, so place it
right after DM_MIRROR. Doing so fixes various other Device mapper
targets/features to be properly nested under "Device mapper support".
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
This patch allows the removal of an open device to be deferred until
it is closed. (Previously such a removal attempt would fail.)
The deferred remove functionality is enabled by setting the flag
DM_DEFERRED_REMOVE in the ioctl structure on DM_DEV_REMOVE or
DM_REMOVE_ALL ioctl.
On return from DM_DEV_REMOVE, the flag DM_DEFERRED_REMOVE indicates if
the device was removed immediately or flagged to be removed on close -
if the flag is clear, the device was removed.
On return from DM_DEV_STATUS and other ioctls, the flag
DM_DEFERRED_REMOVE is set if the device is scheduled to be removed on
closure.
A device that is scheduled to be deleted can be revived using the
message "@cancel_deferred_remove". This message clears the
DMF_DEFERRED_REMOVE flag so that the device won't be deleted on close.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
If preresume fails it is worth logging an error given that a device is
left suspended due to the failure.
This change was motivated by local preresume error logging that was
added to the cache target ("preresume failed"). Elevating this
target-agnostic context for the where the target-specific error occurred
relative to the DM core's callouts makes sense.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
|
|
dm-crypt can already activate TCRYPT (TrueCrypt compatible) containers
in LRW or XTS block encryption mode.
TCRYPT containers prior to version 4.1 use CBC mode with some additional
tweaks, this patch adds support for these containers.
This new mode is implemented using special IV generator named TCW
(TrueCrypt IV with whitening). TCW IV only supports containers that are
encrypted with one cipher (Tested with AES, Twofish, Serpent, CAST5 and
TripleDES).
While this mode is legacy and is known to be vulnerable to some
watermarking attacks (e.g. revealing of hidden disk existence) it can
still be useful to activate old containers without using 3rd party
software or for independent forensic analysis of such containers.
(Both the userspace and kernel code is an independent implementation
based on the format documentation and it completely avoids use of
original source code.)
The TCW IV generator uses two additional keys: Kw (whitening seed, size
is always 16 bytes - TCW_WHITENING_SIZE) and Kiv (IV seed, size is
always the IV size of the selected cipher). These keys are concatenated
at the end of the main encryption key provided in mapping table.
While whitening is completely independent from IV, it is implemented
inside IV generator for simplification.
The whitening value is always 16 bytes long and is calculated per sector
from provided Kw as initial seed, xored with sector number and mixed
with CRC32 algorithm. Resulting value is xored with ciphertext sector
content.
IV is calculated from the provided Kiv as initial IV seed and xored with
sector number.
Detailed calculation can be found in the Truecrypt documentation for
version < 4.1 and will also be described on dm-crypt site, see:
http://code.google.com/p/cryptsetup/wiki/DMCrypt
The experimental support for activation of these containers is already
present in git devel brach of cryptsetup.
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Some encryption modes use extra keys (e.g. loopAES has IV seed) which
are not used in block cipher initialization but are part of key string
in table constructor.
This patch adds an additional field which describes the length of the
extra key(s) and substracts it before real key encryption setting.
The key_size always includes the size, in bytes, of the key provided
in mapping table.
The key_parts describes how many parts (usually keys) are contained in
the whole key buffer. And key_extra_size contains size in bytes of
additional keys part (this number of bytes must be subtracted because it
is processed by the IV generator).
| K1 | K2 | .... | K64 | Kiv |
|----------- key_size ----------------- |
| |-key_extra_size-|
| [64 keys] | [1 key] | => key_parts = 65
Example where key string contains main key K, whitening key
Kw and IV seed Kiv:
| K | Kiv | Kw |
|--------------- key_size --------------|
| |-----key_extra_size------|
| [1 key] | [1 key] | [1 key] | => key_parts = 3
Because key_extra_size is calculated during IV mode setting, key
initialization is moved after this step.
For now, this change has no effect to supported modes (thanks to ilog2
rounding) but it is required by the following patch.
Also, fix a sparse warning in crypt_iv_lmk_one().
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
A migration failure should be logged (albeit limited).
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Fix a few cell_defer() calls that weren't passing a bool.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Return -EINVAL when the specified cache policy is unknown rather than
returning -ENOMEM.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Rename takeout_queue to concat_queue.
Fix a harmless bug in mq policies pop() function. Currently pop()
always succeeds, with up coming changes this wont be the case.
Fix typo in comment above pre_cache_to_cache prototype.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
No need to return from a void function.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Make the quiescing flag an atomic_t and stop protecting it with a spin
lock.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
for a shutdown
The code that was trying to do this was inadequate. The postsuspend
method (in ioctl context), needs to wait for the worker thread to
acknowledge the request to quiesce. Otherwise the migration count may
drop to zero temporarily before the worker thread realises we're
quiescing. In this case the target will be taken down, but the worker
thread may have issued a new migration, which will cause an oops when
it completes.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.9+
|
|
Previously only origin bios could trigger ticks, which meant if all
the io was destined for the cache no ticks were generated. If no ticks
are generated then multiple hits, and movements in general, are
attributed to the same tick.
Only a stop gap fix, we need a better solution.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
It is safe to use a mutex in mq_residency() at this point since it is
only called from ioctl context. But future-proof mq_residency() by
using might_sleep() to catch new contexts that cannot sleep.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Lennart says: "I haven't been able to spend time on mv643xx_eth for a
while now, so if you want to take over maintainership, I'd be fine with
that."
Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Acked-by: Lennert Buytenhek <buytenh@wantstofly.org>
Acked-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With psched_ratecfg_precompute(), tbf can deal with 64bit rates.
Add two new attributes so that tc can use them to break the 32bit
limit.
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Suggested-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
1. copy_insn() doesn't look very nice, all calculations are
confusing and it is not immediately clear why do we read
the 2nd page first.
2. The usage of inode->i_size is wrong on 32-bit machines.
3. "Instruction at end of binary" logic is simply wrong, it
doesn't handle the case when uprobe->offset > inode->i_size.
In this case "bytes" overflows, and __copy_insn() writes to
the memory outside of uprobe->arch.insn.
Yes, uprobe_register() checks i_size_read(), but this file
can be truncated after that. All i_size checks are racy, we
do this only to catch the obvious mistakes.
Change copy_insn() to call __copy_insn() in a loop, simplify
and fix the bytes/nbytes calculations.
Note: we do not care if we read extra bytes after inode->i_size
if we got the valid page. This is fine because the task gets the
same page after page-fault, and arch_uprobe_analyze_insn() can't
know how many bytes were actually read anyway.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
|
|
Commit aa59c53fd459 "uprobes: Change uprobe_copy_process() to dup
xol_area" has a stupid typo, we need to setup t->utask->vaddr but
the code wrongly uses current->utask.
Even with this bug dup_xol_work() works "in practice", but only
because get_unmapped_area(NULL, TASK_SIZE - PAGE_SIZE) likely
returns the same address every time.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
|
|
NFSv4 uses leases to guarantee that clients can cache metadata as well
as data.
Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Cc: David Howells <dhowells@redhat.com>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Cc: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
We'll need the same logic for rename and link.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
We need to break delegations on any operation that changes the set of
links pointing to an inode. Start with unlink.
Such operations also hold the i_mutex on a parent directory. Breaking a
delegation may require waiting for a timeout (by default 90 seconds) in
the case of a unresponsive NFS client. To avoid blocking all directory
operations, we therefore drop locks before waiting for the delegation.
The logic then looks like:
acquire locks
...
test for delegation; if found:
take reference on inode
release locks
wait for delegation break
drop reference on inode
retry
It is possible this could never terminate. (Even if we take precautions
to prevent another delegation being acquired on the same inode, we could
get a different inode on each retry.) But this seems very unlikely.
The initial test for a delegation happens after the lock on the target
inode is acquired, but the directory inode may have been acquired
further up the call stack. We therefore add a "struct inode **"
argument to any intervening functions, which we use to pass the inode
back up to the caller in the case it needs a delegation synchronously
broken.
Cc: David Howells <dhowells@redhat.com>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
We'll be using dentry->d_inode in one more place.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Implement NFSv4 delegations at the vfs level using the new FL_DELEG lock
type.
Note nfsd is the only delegation user and is only using read
delegations. Warn on any attempt to set a write delegation for now.
We'll come back to that case later.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
For now FL_DELEG is just a synonym for FL_LEASE. So this patch doesn't
change behavior.
Next we'll modify break_lease to treat FL_DELEG leases differently, to
account for the fact that NFSv4 delegations should be broken in more
situations than Windows oplocks.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
A read delegation is used by NFSv4 as a guarantee that a client can
perform local read opens without informing the server.
The open operation takes the last component of the pathname as an
argument, thus is also a lookup operation, and giving the client the
above guarantee means informing the client before we allow anything that
would change the set of names pointing to the inode.
Therefore, we need to break delegations on rename, link, and unlink.
We also need to prevent new delegations from being acquired while one of
these operations is in progress.
We could add some completely new locking for that purpose, but it's
simpler to use the i_mutex, since that's already taken by all the
operations we care about.
The single exception is rename. So, modify rename to take the i_mutex
on the file that is being renamed.
Also fix up lockdep and Documentation/filesystems/directory-locking to
reflect the change.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
I_MUTEX_QUOTA is now just being used whenever we want to lock two
non-directories. So the name isn't right. I_MUTEX_NONDIR2 isn't
especially elegant but it's the best I could think of.
Also fix some outdated documentation.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Reserve I_MUTEX_PARENT and I_MUTEX_CHILD for locking of actual
directories.
(Also I_MUTEX_QUOTA isn't really a meaningful name for this locking
class any more; fixed in a later patch.)
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
We want to do this elsewhere as well.
Also catch any attempts to use it for directories (where this ordering
would conflict with ancestor-first directory ordering in lock_rename).
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Dave Chinner <david@fromorbit.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|