summaryrefslogtreecommitdiff
path: root/fs/afs/server_list.c
AgeCommit message (Collapse)Author
2020-06-04afs: Reorganise volume and server trees to be rooted on the cellDavid Howells
Reorganise afs_volume objects such that they're in a tree keyed on volume ID, rooted at on an afs_cell object rather than being in multiple trees, each of which is rooted on an afs_server object. afs_server structs become per-cell and acquire a pointer to the cell. The process of breaking a callback then starts with finding the server by its network address, following that to the cell and then looking up each volume ID in the volume tree. This is simpler than the afs_vol_interest/afs_cb_interest N:M mapping web and allows those structs and the code for maintaining them to be simplified or removed. It does make a couple of things a bit more tricky, though: (1) Operations now start with a volume, not a server, so there can be more than one answer as to whether or not the server we'll end up using supports the FS.InlineBulkStatus RPC. (2) CB RPC operations that specify the server UUID. There's still a tree of servers by UUID on the afs_net struct, but the UUIDs in it aren't guaranteed unique. Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04afs: Retain more of the VLDB record for alias detectionDavid Howells
Save more bits from the volume location database record obtained for a server so that we can use this information in cell alias detection. Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-31afs: Split the usage count on struct afs_serverDavid Howells
Split the usage count on the afs_server struct to have an active count that registers who's actually using it separately from the reference count on the object. This allows a future patch to dispatch polling probes without advancing the "unuse" time into the future each time we emit a probe, which would otherwise prevent unused server records from expiring. Included in this: (1) The latter part of afs_destroy_server() in which the RCU destruction of afs_server objects is invoked and the outstanding server count is decremented is split out into __afs_put_server(). (2) afs_put_server() now calls __afs_put_server() rather then setting the management timer. (3) The calls begun by afs_fs_give_up_all_callbacks() and afs_fs_get_capabilities() can now take a ref on the server record, so afs_destroy_server() can just drop its ref and needn't wait for the completion of these calls. They'll put the ref when they're done. (4) Because of (3), afs_fs_probe_done() no longer needs to wake up afs_destroy_server() with server->probe_outstanding. (5) afs_gc_servers can be simplified. It only needs to check if server->active is 0 rather than playing games with the refcount. (6) afs_manage_servers() can propose a server for gc if usage == 0 rather than if ref == 1. The gc is effected by (5). Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-31afs: Use the serverUnique field in the UVLDB record to reduce rpc opsDavid Howells
The U-version VLDB volume record retrieved by the VL.GetEntryByNameU rpc op carries a change counter (the serverUnique field) for each fileserver listed in the record as backing that volume. This is incremented whenever the registration details for a fileserver change (such as its address list). Note that the same value will be seen in all UVLDB records that refer to that fileserver. This should be checked before calling the VL server to re-query the address list for a fileserver. If it's the same, there's no point doing the query. Reported-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2019-07-10Merge tag 'afs-next-20190628' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull afs updates from David Howells: "A set of minor changes for AFS: - Remove an unnecessary check in afs_unlink() - Add a tracepoint for tracking callback management - Add a tracepoint for afs_server object usage - Use struct_size() - Add mappings for AFS UAE abort codes to Linux error codes, using symbolic names rather than hex numbers in the .c file" * tag 'afs-next-20190628' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Add support for the UAE error table fs/afs: use struct_size() in kzalloc() afs: Trace afs_server usage afs: Add some callback management tracepoints afs: afs_unlink() doesn't need to check dentry->d_inode
2019-06-20afs: Trace afs_server usageDavid Howells
Add a tracepoint (afs_server) to track the afs_server object usage count. Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-10afs: Use struct_size() in kzalloc()Gustavo A. R. Silva
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; void *entry[]; }; instance = kzalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL); This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24afs: Probe multiple fileservers simultaneouslyDavid Howells
Send probes to all the unprobed fileservers in a fileserver list on all addresses simultaneously in an attempt to find out the fastest route whilst not getting stuck for 20s on any server or address that we don't get a reply from. This alleviates the problem whereby attempting to access a new server can take a long time because the rotation algorithm ends up rotating through all servers and addresses until it finds one that responds. Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14afs: Fix refcounting in callback registrationDavid Howells
The refcounting on afs_cb_interest struct objects in afs_register_server_cb_interest() is wrong as it uses the server list entry's call back interest pointer without regard for the fact that it might be replaced at any time and the object thrown away. Fix this by: (1) Put a lock on the afs_server_list struct that can be used to mediate access to the callback interest pointers in the servers array. (2) Keep a ref on the callback interest that we get from the entry. (3) Dropping the old reference held by vnode->cb_interest if we replace the pointer. Fixes: c435ee34551e ("afs: Overhaul the callback handling") Signed-off-by: David Howells <dhowells@redhat.com>
2018-02-06afs: Fix server list handlingDavid Howells
Fix server list handling in the following ways: (1) In afs_alloc_volume(), remove duplicate server list build code. This was already done by afs_alloc_server_list() which afs_alloc_volume() previously called. This just results in twice as many VL RPCs. (2) In afs_deliver_vl_get_entry_by_name_u(), use the number of server records indicated by ->nServers in the UVLDB record returned by the VL.GetEntryByNameU RPC call rather than scanning all NMAXNSERVERS slots. Unused slots may contain garbage. (3) In afs_alloc_server_list(), don't stop converting a UVLDB record into a server list just because we can't look up one of the servers. Just skip that server and go on to the next. If we can't look up any of the servers then we'll fail at the end. Without this patch, an attempt to view the umich.edu root cell using something like "ls /afs/umich.edu" on a dynamic root (future patch) mount or an autocell mount will result in ENOMEDIUM. The failure is due to kafs not stopping after nServers'worth of records have been read, but then trying to access a server with a garbage UUID and getting an error, which aborts the server list build. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Reported-by: Jonathan Billings <jsbillings@jsbillings.org> Signed-off-by: David Howells <dhowells@redhat.com> cc: stable@vger.kernel.org
2017-11-17afs: Fix file lockingDavid Howells
Fix the AFS file locking whereby the use of the big kernel lock (which could be slept with) was replaced by a spinlock (which couldn't). The problem is that the AFS code was doing stuff inside the critical section that might call schedule(), so this is a broken transformation. Fix this by the following means: (1) Use a state machine with a proper state that can only be changed under the spinlock rather than using a collection of bit flags. (2) Cache the key used for the lock and the lock type in the afs_vnode struct so that the manager work function doesn't have to refer to a file_lock struct that's been dequeued. This makes signal handling safer. (4) Move the unlock from afs_do_unlk() to afs_fl_release_private() which means that unlock is achieved in other circumstances too. (5) Unlock the file on the server before taking the next conflicting lock. Also change: (1) Check the permits on a file before actually trying the lock. (2) fsync the file before effecting an explicit unlock operation. We don't fsync if the lock is erased otherwise as we might not be in a context where we can actually do that. Further fixes: (1) Fixed-fileserver address rotation is made to work. It's only used by the locking functions, so couldn't be tested before. Fixes: 72f98e72551f ("locks: turn lock_flocks into a spinlock") Signed-off-by: David Howells <dhowells@redhat.com> cc: jlayton@redhat.com
2017-11-13afs: Overhaul volume and server record caching and fileserver rotationDavid Howells
The current code assumes that volumes and servers are per-cell and are never shared, but this is not enforced, and, indeed, public cells do exist that are aliases of each other. Further, an organisation can, say, set up a public cell and a private cell with overlapping, but not identical, sets of servers. The difference is purely in the database attached to the VL servers. The current code will malfunction if it sees a server in two cells as it assumes global address -> server record mappings and that each server is in just one cell. Further, each server may have multiple addresses - and may have addresses of different families (IPv4 and IPv6, say). To this end, the following structural changes are made: (1) Server record management is overhauled: (a) Server records are made independent of cell. The namespace keeps track of them, volume records have lists of them and each vnode has a server on which its callback interest currently resides. (b) The cell record no longer keeps a list of servers known to be in that cell. (c) The server records are now kept in a flat list because there's no single address to sort on. (d) Server records are now keyed by their UUID within the namespace. (e) The addresses for a server are obtained with the VL.GetAddrsU rather than with VL.GetEntryByName, using the server's UUID as a parameter. (f) Cached server records are garbage collected after a period of non-use and are counted out of existence before purging is allowed to complete. This protects the work functions against rmmod. (g) The servers list is now in /proc/fs/afs/servers. (2) Volume record management is overhauled: (a) An RCU-replaceable server list is introduced. This tracks both servers and their coresponding callback interests. (b) The superblock is now keyed on cell record and numeric volume ID. (c) The volume record is now tied to the superblock which mounts it, and is activated when mounted and deactivated when unmounted. This makes it easier to handle the cache cookie without causing a double-use in fscache. (d) The volume record is loaded from the VLDB using VL.GetEntryByNameU to get the server UUID list. (e) The volume name is updated if it is seen to have changed when the volume is updated (the update is keyed on the volume ID). (3) The vlocation record is got rid of and VLDB records are no longer cached. Sufficient information is stored in the volume record, though an update to a volume record is now no longer shared between related volumes (volumes come in bundles of three: R/W, R/O and backup). and the following procedural changes are made: (1) The fileserver cursor introduced previously is now fleshed out and used to iterate over fileservers and their addresses. (2) Volume status is checked during iteration, and the server list is replaced if a change is detected. (3) Server status is checked during iteration, and the address list is replaced if a change is detected. (4) The abort code is saved into the address list cursor and -ECONNABORTED returned in afs_make_call() if a remote abort happened rather than translating the abort into an error message. This allows actions to be taken depending on the abort code more easily. (a) If a VMOVED abort is seen then this is handled by rechecking the volume and restarting the iteration. (b) If a VBUSY, VRESTARTING or VSALVAGING abort is seen then this is handled by sleeping for a short period and retrying and/or trying other servers that might serve that volume. A message is also displayed once until the condition has cleared. (c) If a VOFFLINE abort is seen, then this is handled as VBUSY for the moment. (d) If a VNOVOL abort is seen, the volume is rechecked in the VLDB to see if it has been deleted; if not, the fileserver is probably indicating that the volume couldn't be attached and needs salvaging. (e) If statfs() sees one of these aborts, it does not sleep, but rather returns an error, so as not to block the umount program. (5) The fileserver iteration functions in vnode.c are now merged into their callers and more heavily macroised around the cursor. vnode.c is removed. (6) Operations on a particular vnode are serialised on that vnode because the server will lock that vnode whilst it operates on it, so a second op sent will just have to wait. (7) Fileservers are probed with FS.GetCapabilities before being used. This is where service upgrade will be done. (8) A callback interest on a fileserver is set up before an FS operation is performed and passed through to afs_make_call() so that it can be set on the vnode if the operation returns a callback. The callback interest is passed through to afs_iget() also so that it can be set there too. In general, record updating is done on an as-needed basis when we try to access servers, volumes or vnodes rather than offloading it to work items and special threads. Notes: (1) Pre AFS-3.4 servers are no longer supported, though this can be added back if necessary (AFS-3.4 was released in 1998). (2) VBUSY is retried forever for the moment at intervals of 1s. (3) /proc/fs/afs/<cell>/servers no longer exists. Signed-off-by: David Howells <dhowells@redhat.com>