From 054aa8d439b9185d4f5eb9a90282d1ce74772969 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Wed, 1 Dec 2021 10:06:14 -0800 Subject: fget: check that the fd still exists after getting a ref to it Jann Horn points out that there is another possible race wrt Unix domain socket garbage collection, somewhat reminiscent of the one fixed in commit cbcf01128d0a ("af_unix: fix garbage collect vs MSG_PEEK"). See the extended comment about the garbage collection requirements added to unix_peek_fds() by that commit for details. The race comes from how we can locklessly look up a file descriptor just as it is in the process of being closed, and with the right artificial timing (Jann added a few strategic 'mdelay(500)' calls to do that), the Unix domain socket garbage collector could see the reference count decrement of the close() happen before fget() took its reference to the file and the file was attached onto a new file descriptor. This is all (intentionally) correct on the 'struct file *' side, with RCU lookups and lockless reference counting very much part of the design. Getting that reference count out of order isn't a problem per se. But the garbage collector can get confused by seeing this situation of having seen a file not having any remaining external references and then seeing it being attached to an fd. In commit cbcf01128d0a ("af_unix: fix garbage collect vs MSG_PEEK") the fix was to serialize the file descriptor install with the garbage collector by taking and releasing the unix_gc_lock. That's not really an option here, but since this all happens when we are in the process of looking up a file descriptor, we can instead simply just re-check that the file hasn't been closed in the meantime, and just re-do the lookup if we raced with a concurrent close() of the same file descriptor. Reported-and-tested-by: Jann Horn Acked-by: Miklos Szeredi Signed-off-by: Linus Torvalds --- fs/file.c | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'fs/file.c') diff --git a/fs/file.c b/fs/file.c index 8627dacfc424..ad4a8bf3cf10 100644 --- a/fs/file.c +++ b/fs/file.c @@ -858,6 +858,10 @@ loop: file = NULL; else if (!get_file_rcu_many(file, refs)) goto loop; + else if (files_lookup_fd_raw(files, fd) != file) { + fput_many(file, refs); + goto loop; + } } rcu_read_unlock(); -- cgit From e386dfc56f837da66d00a078e5314bc8382fab83 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Fri, 10 Dec 2021 14:00:15 -0800 Subject: fget: clarify and improve __fget_files() implementation Commit 054aa8d439b9 ("fget: check that the fd still exists after getting a ref to it") fixed a race with getting a reference to a file just as it was being closed. It was a fairly minimal patch, and I didn't think re-checking the file pointer lookup would be a measurable overhead, since it was all right there and cached. But I was wrong, as pointed out by the kernel test robot. The 'poll2' case of the will-it-scale.per_thread_ops benchmark regressed quite noticeably. Admittedly it seems to be a very artificial test: doing "poll()" system calls on regular files in a very tight loop in multiple threads. That means that basically all the time is spent just looking up file descriptors without ever doing anything useful with them (not that doing 'poll()' on a regular file is useful to begin with). And as a result it shows the extra "re-check fd" cost as a sore thumb. Happily, the regression is fixable by just writing the code to loook up the fd to be better and clearer. There's still a cost to verify the file pointer, but now it's basically in the noise even for that benchmark that does nothing else - and the code is more understandable and has better comments too. [ Side note: this patch is also a classic case of one that looks very messy with the default greedy Myers diff - it's much more legible with either the patience of histogram diff algorithm ] Link: https://lore.kernel.org/lkml/20211210053743.GA36420@xsang-OptiPlex-9020/ Link: https://lore.kernel.org/lkml/20211213083154.GA20853@linux.intel.com/ Reported-by: kernel test robot Tested-by: Carel Si Cc: Jann Horn Cc: Miklos Szeredi Signed-off-by: Linus Torvalds --- fs/file.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 56 insertions(+), 16 deletions(-) (limited to 'fs/file.c') diff --git a/fs/file.c b/fs/file.c index ad4a8bf3cf10..97d212a9b814 100644 --- a/fs/file.c +++ b/fs/file.c @@ -841,28 +841,68 @@ void do_close_on_exec(struct files_struct *files) spin_unlock(&files->file_lock); } -static struct file *__fget_files(struct files_struct *files, unsigned int fd, - fmode_t mask, unsigned int refs) +static inline struct file *__fget_files_rcu(struct files_struct *files, + unsigned int fd, fmode_t mask, unsigned int refs) { - struct file *file; + for (;;) { + struct file *file; + struct fdtable *fdt = rcu_dereference_raw(files->fdt); + struct file __rcu **fdentry; - rcu_read_lock(); -loop: - file = files_lookup_fd_rcu(files, fd); - if (file) { - /* File object ref couldn't be taken. - * dup2() atomicity guarantee is the reason - * we loop to catch the new file (or NULL pointer) + if (unlikely(fd >= fdt->max_fds)) + return NULL; + + fdentry = fdt->fd + array_index_nospec(fd, fdt->max_fds); + file = rcu_dereference_raw(*fdentry); + if (unlikely(!file)) + return NULL; + + if (unlikely(file->f_mode & mask)) + return NULL; + + /* + * Ok, we have a file pointer. However, because we do + * this all locklessly under RCU, we may be racing with + * that file being closed. + * + * Such a race can take two forms: + * + * (a) the file ref already went down to zero, + * and get_file_rcu_many() fails. Just try + * again: */ - if (file->f_mode & mask) - file = NULL; - else if (!get_file_rcu_many(file, refs)) - goto loop; - else if (files_lookup_fd_raw(files, fd) != file) { + if (unlikely(!get_file_rcu_many(file, refs))) + continue; + + /* + * (b) the file table entry has changed under us. + * Note that we don't need to re-check the 'fdt->fd' + * pointer having changed, because it always goes + * hand-in-hand with 'fdt'. + * + * If so, we need to put our refs and try again. + */ + if (unlikely(rcu_dereference_raw(files->fdt) != fdt) || + unlikely(rcu_dereference_raw(*fdentry) != file)) { fput_many(file, refs); - goto loop; + continue; } + + /* + * Ok, we have a ref to the file, and checked that it + * still exists. + */ + return file; } +} + +static struct file *__fget_files(struct files_struct *files, unsigned int fd, + fmode_t mask, unsigned int refs) +{ + struct file *file; + + rcu_read_lock(); + file = __fget_files_rcu(files, fd, mask, refs); rcu_read_unlock(); return file; -- cgit