diff options
Diffstat (limited to 'Documentation/filesystems/propagate_umount.txt')
-rw-r--r-- | Documentation/filesystems/propagate_umount.txt | 484 |
1 files changed, 484 insertions, 0 deletions
diff --git a/Documentation/filesystems/propagate_umount.txt b/Documentation/filesystems/propagate_umount.txt new file mode 100644 index 000000000000..c90349e5b889 --- /dev/null +++ b/Documentation/filesystems/propagate_umount.txt @@ -0,0 +1,484 @@ + Notes on propagate_umount() + +Umount propagation starts with a set of mounts we are already going to +take out. Ideally, we would like to add all downstream cognates to +that set - anything with the same mountpoint as one of the removed +mounts and with parent that would receive events from the parent of that +mount. However, there are some constraints the resulting set must +satisfy. + +It is convenient to define several properties of sets of mounts: + +1) A set S of mounts is non-shifting if for any mount X belonging +to S all subtrees mounted strictly inside of X (i.e. not overmounting +the root of X) contain only elements of S. + +2) A set S is non-revealing if all locked mounts that belong to S have +parents that also belong to S. + +3) A set S is closed if it contains all children of its elements. + +The set of mounts taken out by umount(2) must be non-shifting and +non-revealing; the first constraint is what allows to reparent +any remaining mounts and the second is what prevents the exposure +of any concealed mountpoints. + +propagate_umount() takes the original set as an argument and tries to +extend that set. The original set is a full subtree and its root is +unlocked; what matters is that it's closed and non-revealing. +Resulting set may not be closed; there might still be mounts outside +of that set, but only on top of stacks of root-overmounting elements +of set. They can be reparented to the place where the bottom of +stack is attached to a mount that will survive. NOTE: doing that +will violate a constraint on having no more than one mount with +the same parent/mountpoint pair; however, the caller (umount_tree()) +will immediately remedy that - it may keep unmounted element attached +to parent, but only if the parent itself is unmounted. Since all +conflicts created by reparenting have common parent *not* in the +set and one side of the conflict (bottom of the stack of overmounts) +is in the set, it will be resolved. However, we rely upon umount_tree() +doing that pretty much immediately after the call of propagate_umount(). + +Algorithm is based on two statements: + 1) for any set S, there is a maximal non-shifting subset of S +and it can be calculated in O(#S) time. + 2) for any non-shifting set S, there is a maximal non-revealing +subset of S. That subset is also non-shifting and it can be calculated +in O(#S) time. + + Finding candidates. + +We are given a closed set U and we want to find all mounts that have +the same mountpoint as some mount m in U *and* whose parent receives +propagation from the parent of the same mount m. Naive implementation +would be + S = {} + for each m in U + add m to S + p = parent(m) + for each q in Propagation(p) - {p} + child = look_up(q, mountpoint(m)) + if child + add child to S +but that can lead to excessive work - there might be propagation among the +subtrees of U, in which case we'd end up examining the same candidates +many times. Since propagation is transitive, the same will happen to +everything downstream of that candidate and it's not hard to construct +cases where the approach above leads to the time quadratic by the actual +number of candidates. + +Note that if we run into a candidate we'd already seen, it must've been +added on an earlier iteration of the outer loop - all additions made +during one iteration of the outer loop have different parents. So +if we find a child already added to the set, we know that everything +in Propagation(parent(child)) with the same mountpoint has been already +added. + S = {} + for each m in U + if m in S + continue + add m to S + p = parent(m) + q = propagation_next(p, p) + while q + child = look_up(q, mountpoint(m)) + if child + if child in S + q = skip_them(q, p) + continue; + add child to S + q = propagation_next(q, p) +where +skip_them(q, p) + keep walking Propagation(p) from q until we find something + not in Propagation(q) + +would get rid of that problem, but we need a sane implementation of +skip_them(). That's not hard to do - split propagation_next() into +"down into mnt_slave_list" and "forward-and-up" parts, with the +skip_them() being "repeat the forward-and-up part until we get NULL +or something that isn't a peer of the one we are skipping". + +Note that there can be no absolute roots among the extra candidates - +they all come from mount lookups. Absolute root among the original +set is _currently_ impossible, but it might be worth protecting +against. + + Maximal non-shifting subsets. + +Let's call a mount m in a set S forbidden in that set if there is a +subtree mounted strictly inside m and containing mounts that do not +belong to S. + +The set is non-shifting when none of its elements are forbidden in it. + +If mount m is forbidden in a set S, it is forbidden in any subset S' it +belongs to. In other words, it can't belong to any of the non-shifting +subsets of S. If we had a way to find a forbidden mount or show that +there's none, we could use it to find the maximal non-shifting subset +simply by finding and removing them until none remain. + +Suppose mount m is forbidden in S; then any mounts forbidden in S - {m} +must have been forbidden in S itself. Indeed, since m has descendents +that do not belong to S, any subtree that fits into S will fit into +S - {m} as well. + +So in principle we could go through elements of S, checking if they +are forbidden in S and removing the ones that are. Removals will +not invalidate the checks done for earlier mounts - if they were not +forbidden at the time we checked, they won't become forbidden later. +It's too costly to be practical, but there is a similar approach that +is linear by size of S. + +Let's say that mount x in a set S is forbidden by mount y, if + * both x and y belong to S. + * there is a chain of mounts starting at x and leaving S + immediately after passing through y, with the first + mountpoint strictly inside x. +Note 1: x may be equal to y - that's the case when something not +belonging to S is mounted strictly inside x. +Note 2: if y does not belong to S, it can't forbid anything in S. +Note 3: if y has no children outside of S, it can't forbid anything in S. + +It's easy to show that mount x is forbidden in S if and only if x is +forbidden in S by some mount y. And it's easy to find all mounts in S +forbidden by a given mount. + +Consider the following operation: + Trim(S, m) = S - {x : x is forbidden by m in S} + +Note that if m does not belong to S or has no children outside of S we +are guaranteed that Trim(S, m) is equal to S. + +The following is true: if x is forbidden by y in Trim(S, m), it was +already forbidden by y in S. + +Proof: Suppose x is forbidden by y in Trim(S, m). Then there is a +chain of mounts (x_0 = x, ..., x_k = y, x_{k+1} = r), such that x_{k+1} +is the first element that doesn't belong to Trim(S, m) and the +mountpoint of x_1 is strictly inside x. If mount r belongs to S, it must +have been removed by Trim(S, m), i.e. it was forbidden in S by m. +Then there was a mount chain from r to some child of m that stayed in +S all the way until m, but that's impossible since x belongs to Trim(S, m) +and prepending (x_0, ..., x_k) to that chain demonstrates that x is also +forbidden in S by m, and thus can't belong to Trim(S, m). +Therefore r can not belong to S and our chain demonstrates that +x is forbidden by y in S. QED. + +Corollary: no mount is forbidden by m in Trim(S, m). Indeed, any +such mount would have been forbidden by m in S and thus would have been +in the part of S removed in Trim(S, m). + +Corollary: no mount is forbidden by m in Trim(Trim(S, m), n). Indeed, +any such would have to have been forbidden by m in Trim(S, m), which +is impossible. + +Corollary: after + S = Trim(S, x_1) + S = Trim(S, x_2) + ... + S = Trim(S, x_k) +no mount remaining in S will be forbidden by either of x_1,...,x_k. + +The following will reduce S to its maximal non-shifting subset: + visited = {} + while S contains elements not belonging to visited + let m be an arbitrary such element of S + S = Trim(S, m) + add m to visited + +S never grows, so the number of elements of S not belonging to visited +decreases at least by one on each iteration. When the loop terminates, +all mounts remaining in S belong to visited. It's easy to see that at +the beginning of each iteration no mount remaining in S will be forbidden +by any element of visited. In other words, no mount remaining in S will +be forbidden, i.e. final value of S will be non-shifting. It will be +the maximal non-shifting subset, since we were removing only forbidden +elements. + + There are two difficulties in implementing the above in linear +time, both due to the fact that Trim() might need to remove more than one +element. Naive implementation of Trim() is vulnerable to running into a +long chain of mounts, each mounted on top of parent's root. Nothing in +that chain is forbidden, so nothing gets removed from it. We need to +recognize such chains and avoid walking them again on subsequent calls of +Trim(), otherwise we will end up with worst-case time being quadratic by +the number of elements in S. Another difficulty is in implementing the +outer loop - we need to iterate through all elements of a shrinking set. +That would be trivial if we never removed more than one element at a time +(linked list, with list_for_each_entry_safe for iterator), but we may +need to remove more than one entry, possibly including the ones we have +already visited. + + Let's start with naive algorithm for Trim(): + +Trim_one(m) + found = false + for each n in children(m) + if n not in S + found = true + if (mountpoint(n) != root(m)) + remove m from S + break + if found + Trim_ancestors(m) + +Trim_ancestors(m) + for (; parent(m) in S; m = parent(m)) { + if (mountpoint(m) != root(parent(m))) + remove parent(m) from S + } + +If m belongs to S, Trim_one(m) will replace S with Trim(S, m). +Proof: + Consider the chains excluding elements from Trim(S, m). The last +two elements in such chain are m and some child of m that does not belong +to S. If m has no such children, Trim(S, m) is equal to S. + m itself is removed if and only if the chain has exactly two +elements, i.e. when the last element does not overmount the root of m. +In other words, that happens when m has a child not in S that does not +overmount the root of m. + All other elements to remove will be ancestors of m, such that +the entire descent chain from them to m is contained in S. Let +(x_0, x_1, ..., x_k = m) be the longest such chain. x_i needs to be +removed if and only if x_{i+1} does not overmount its root. It's easy +to see that Trim_ancestors(m) will iterate through that chain from +x_k to x_1 and that it will remove exactly the elements that need to be +removed. + + Note that if the loop in Trim_ancestors() walks into an already +visited element, we are guaranteed that remaining iterations will see +only elements that had already been visited and remove none of them. +That's the weakness that makes it vulnerable to long chains of full +overmounts. + + It's easy to deal with, if we can afford setting marks on +elements of S; we would mark all elements already visited by +Trim_ancestors() and have it bail out as soon as it sees an already +marked element. + + The problems with iterating through the set can be dealt with in +several ways, depending upon the representation we choose for our set. +One useful observation is that we are given a closed subset in S - the +original set passed to propagate_umount(). Its elements can neither +forbid anything nor be forbidden by anything - all their descendents +belong to S, so they can not occur anywhere in any excluding chain. +In other words, the elements of that subset will remain in S until +the end and Trim_one(S, m) is a no-op for all m from that subset. + + That suggests keeping S as a disjoint union of a closed set U +('will be unmounted, no matter what') and the set of all elements of +S that do not belong to U. That set ('candidates') is all we need +to iterate through. Let's represent it as a subset in a cyclic list, +consisting of all list elements that are marked as candidates (initially - +all of them). Then we could have Trim_ancestors() only remove the mark, +leaving the elements on the list. Then Trim_one() would never remove +anything other than its argument from the containing list, allowing to +use list_for_each_entry_safe() as iterator. + + Assuming that representation we get the following: + + list_for_each_entry_safe(m, ..., Candidates, ...) + Trim_one(m) +where +Trim_one(m) + if (m is not marked as a candidate) + strip the "seen by Trim_ancestors" mark from m + remove m from the Candidates list + return + + remove_this = false + found = false + for each n in children(m) + if n not in S + found = true + if (mountpoint(n) != root(m)) + remove_this = true + break + if found + Trim_ancestors(m) + if remove_this + strip the "seen by Trim_ancestors" mark from m + strip the "candidate" mark from m + remove m from the Candidate list + +Trim_ancestors(m) + for (p = parent(m); p is marked as candidate ; m = p, p = parent(p)) { + if m is marked as seen by Trim_ancestors + return + mark m as seen by Trim_ancestors + if (mountpoint(m) != root(p)) + strip the "candidate" mark from p + } + + Terminating condition in the loop in Trim_ancestors() is correct, +since that that loop will never run into p belonging to U - p is always +an ancestor of argument of Trim_one() and since U is closed, the argument +of Trim_one() would also have to belong to U. But Trim_one() is never +called for elements of U. In other words, p belongs to S if and only +if it belongs to candidates. + + Time complexity: +* we get no more than O(#S) calls of Trim_one() +* the loop over children in Trim_one() never looks at the same child +twice through all the calls. +* iterations of that loop for children in S are no more than O(#S) +in the worst case +* at most two children that are not elements of S are considered per +call of Trim_one(). +* the loop in Trim_ancestors() sets its mark once per iteration and +no element of S has is set more than once. + + In the end we may have some elements excluded from S by +Trim_ancestors() still stuck on the list. We could do a separate +loop removing them from the list (also no worse than O(#S) time), +but it's easier to leave that until the next phase - there we will +iterate through the candidates anyway. + + The caller has already removed all elements of U from their parents' +lists of children, which means that checking if child belongs to S is +equivalent to checking if it's marked as a candidate; we'll never see +the elements of U in the loop over children in Trim_one(). + + What's more, if we see that children(m) is empty and m is not +locked, we can immediately move m into the committed subset (remove +from the parent's list of children, etc.). That's one fewer mount we'll +have to look into when we check the list of children of its parent *and* +when we get to building the non-revealing subset. + + Maximal non-revealing subsets + +If S is not a non-revealing subset, there is a locked element x in S +such that parent of x is not in S. + +Obviously, no non-revealing subset of S may contain x. Removing such +elements one by one will obviously end with the maximal non-revealing +subset (possibly empty one). Note that removal of an element will +require removal of all its locked children, etc. + +If the set had been non-shifting, it will remain non-shifting after +such removals. +Proof: suppose S was non-shifting, x is a locked element of S, parent of x +is not in S and S - {x} is not non-shifting. Then there is an element m +in S - {x} and a subtree mounted strictly inside m, such that m contains +an element not in in S - {x}. Since S is non-shifting, everything in +that subtree must belong to S. But that means that this subtree must +contain x somewhere *and* that parent of x either belongs that subtree +or is equal to m. Either way it must belong to S. Contradiction. + +// same representation as for finding maximal non-shifting subsets: +// S is a disjoint union of a non-revealing set U (the ones we are committed +// to unmount) and a set of candidates, represented as a subset of list +// elements that have "is a candidate" mark on them. +// Elements of U are removed from their parents' lists of children. +// In the end candidates becomes empty and maximal non-revealing non-shifting +// subset of S is now in U + while (Candidates list is non-empty) + handle_locked(first(Candidates)) + +handle_locked(m) + if m is not marked as a candidate + strip the "seen by Trim_ancestors" mark from m + remove m from the list + return + cutoff = m + for (p = m; p in candidates; p = parent(p)) { + strip the "seen by Trim_ancestors" mark from p + strip the "candidate" mark from p + remove p from the Candidates list + if (!locked(p)) + cutoff = parent(p) + } + if p in U + cutoff = p + while m != cutoff + remove m from children(parent(m)) + add m to U + m = parent(m) + +Let (x_0, ..., x_n = m) be the maximal chain of descent of m within S. +* If it contains some elements of U, let x_k be the last one of those. +Then union of U with {x_{k+1}, ..., x_n} is obviously non-revealing. +* otherwise if all its elements are locked, then none of {x_0, ..., x_n} +may be elements of a non-revealing subset of S. +* otherwise let x_k be the first unlocked element of the chain. Then none +of {x_0, ..., x_{k-1}} may be an element of a non-revealing subset of +S and union of U and {x_k, ..., x_n} is non-revealing. + +handle_locked(m) finds which of these cases applies and adjusts Candidates +and U accordingly. U remains non-revealing, union of Candidates and +U still contains any non-revealing subset of S and after the call of +handle_locked(m) m is guaranteed to be not in Candidates list. So having +it called for each element of S would suffice to empty Candidates, +leaving U the maximal non-revealing subset of S. + +However, handle_locked(m) is a no-op when m belongs to U, so it's enough +to have it called for elements of Candidates list until none remain. + +Time complexity: number of calls of handle_locked() is limited by +#Candidates, each iteration of the first loop in handle_locked() removes +an element from the list, so their total number of executions is also +limited by #Candidates; number of iterations in the second loop is no +greater than the number of iterations of the first loop. + + + Reparenting + +After we'd calculated the final set, we still need to deal with +reparenting - if an element of the final set has a child not in it, +we need to reparent such child. + +Such children can only be root-overmounting (otherwise the set wouldn't +be non-shifting) and their parents can not belong to the original set, +since the original is guaranteed to be closed. + + + Putting all of that together + +The plan is to + * find all candidates + * trim down to maximal non-shifting subset + * trim down to maximal non-revealing subset + * reparent anything that needs to be reparented + * return the resulting set to the caller + +For the 2nd and 3rd steps we want to separate the set into growing +non-revealing subset, initially containing the original set ("U" in +terms of the pseudocode above) and everything we are still not sure about +("candidates"). It means that for the output of the 1st step we'd like +the extra candidates separated from the stuff already in the original set. +For the 4th step we would like the additions to U separate from the +original set. + +So let's go for + * original set ("set"). Linkage via mnt_list + * undecided candidates ("candidates"). Subset of a list, +consisting of all its elements marked with a new flag (T_UMOUNT_CANDIDATE). +Initially all elements of the list will be marked that way; in the +end the list will become empty and no mounts will remain marked with +that flag. + * Reuse T_MARKED for "has been already seen by trim_ancestors()". + * anything in U that hadn't been in the original set - elements of +candidates will gradually be either discarded or moved there. In other +words, it's the candidates we have already decided to unmount. Its role +is reasonably close to the old "to_umount", so let's use that name. +Linkage via mnt_list. + +For gather_candidates() we'll need to maintain both candidates (S - +set) and intersection of S with set. Use T_UMOUNT_CANDIDATE for +all elements we encounter, putting the ones not already in the original +set into the list of candidates. When we are done, strip that flag from +all elements of the original set. That gives a cheap way to check +if element belongs to S (in gather_candidates) and to candidates +itself (at later stages). Call that predicate is_candidate(); it would +be m->mnt_t_flags & T_UMOUNT_CANDIDATE. + +All elements of the original set are marked with MNT_UMOUNT and we'll +need the same for elements added when joining the contents of to_umount +to set in the end. Let's set MNT_UMOUNT at the time we add an element +to to_umount; that's close to what the old 'umount_one' is doing, so +let's keep that name. It also gives us another predicate we need - +"belongs to union of set and to_umount"; will_be_unmounted() for now. + +Removals from the candidates list should strip both T_MARKED and +T_UMOUNT_CANDIDATE; call it remove_from_candidates_list(). |