Age | Commit message (Collapse) | Author |
|
Use an xattr on each backing file in the cache to store some metadata, such
as the content type and the coherency data.
Five content types are defined:
(0) No content stored.
(1) The file contains a single monolithic blob and must be all or nothing.
This would be used for something like an AFS directory or a symlink.
(2) The file is populated with content completely up to a point with
nothing beyond that.
(3) The file has a map attached and is sparsely populated. This would be
stored in one or more additional xattrs.
(4) The file is dirty, being in the process of local modification and the
contents are not necessarily represented correctly by the metadata.
The file should be deleted if this is seen on binding.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819641320.215744.16346770087799536862.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906942248.143852.5423738045012094252.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967151734.1823006.9301249989443622576.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021550471.640689.553853918307994335.stgit@warthog.procyon.org.uk/ # v4
|
|
Implement allocate, get, see and put functions for the cachefiles_object
struct. The members of the struct we're going to need are also added.
Additionally, implement a lifecycle tracepoint.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819639457.215744.4600093239395728232.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906939569.143852.3594314410666551982.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967148857.1823006.6332962598220464364.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021547762.640689.8422781599594931000.stgit@warthog.procyon.org.uk/ # v4
|
|
Add tracepoints in cachefiles to monitor when it does various VFS
operations, such as mkdir.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819638517.215744.12773133137536579766.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906938316.143852.17227990869551737803.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967147139.1823006.4909879317496543392.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021546287.640689.3501604495002415631.stgit@warthog.procyon.org.uk/ # v4
|
|
Use an inode flag, S_KERNEL_FILE, to mark that a backing file is in use by
the kernel to prevent cachefiles or other kernel services from interfering
with that file.
Alter rmdir to reject attempts to remove a directory marked with this flag.
This is used by cachefiles to prevent cachefilesd from removing them.
Using S_SWAPFILE instead isn't really viable as that has other effects in
the I/O paths.
Changes
=======
ver #3:
- Check for the object pointer being NULL in the tracepoints rather than
the caller.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819630256.215744.4815885535039369574.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906931596.143852.8642051223094013028.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967141000.1823006.12920680657559677789.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021541207.640689.564689725898537127.stgit@warthog.procyon.org.uk/ # v4
|
|
Add two trace points to log errors, one for vfs operations like mkdir or
create, and one for I/O operations, like read, write or truncate.
Also add the beginnings of a struct that is going to represent a data file
and place a debugging ID in it for the tracepoints to record.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819625632.215744.17907340966178411033.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906926297.143852.18267924605548658911.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967135390.1823006.2512120406360156424.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021534029.640689.1875723624947577095.stgit@warthog.procyon.org.uk/ # v4
|
|
Introduce basic skeleton of the rewritten cachefiles driver including
config options so that it can be enabled for compilation.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819622766.215744.9108359326983195047.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906923341.143852.3856498104256721447.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967130320.1823006.15791456613198441566.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021528993.640689.9069695476048171884.stgit@warthog.procyon.org.uk/ # v4
|
|
Provide a function to change the size of the storage attached to a cookie,
to match the size of the file being cached when it's changed by truncate or
fallocate:
void fscache_resize_cookie(struct fscache_cookie *cookie,
loff_t new_size);
This acts synchronously and is expected to run under the inode lock of the
caller.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819621839.215744.7895597119803515402.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906922387.143852.16394459879816147793.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967128998.1823006.10740669081985775576.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021527861.640689.3466382085497236267.stgit@warthog.procyon.org.uk/ # v4
|
|
Provide a function to be called from a network filesystem's releasepage
method to indicate that a page has been released that might have been a
reflection of data upon the server - and now that data must be reloaded
from the server or the cache.
This is used to end an optimisation for empty files, in particular files
that have just been created locally, whereby we know there cannot yet be
any data that we would need to read from the server or the cache.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819617128.215744.4725572296135656508.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906920354.143852.7511819614661372008.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967128061.1823006.611781655060034988.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021525963.640689.9264556596205140044.stgit@warthog.procyon.org.uk/ # v4
|
|
Cachefiles has a problem in that it needs to keep the backing file for a
cookie open whilst there are local modifications pending that need to be
written to it. However, we don't want to keep the file open indefinitely,
as that causes EMFILE/ENFILE/ENOMEM problems.
Reopening the cache file, however, is a problem if this is being done due
to writeback triggered by exit(). Some filesystems will oops if we try to
open a file in that context because they want to access current->fs or
other resources that have already been dismantled.
To get around this, I added the following:
(1) An inode flag, I_PINNING_FSCACHE_WB, to be set on a network filesystem
inode to indicate that we have a usage count on the cookie caching
that inode.
(2) A flag in struct writeback_control, unpinned_fscache_wb, that is set
when __writeback_single_inode() clears the last dirty page from
i_pages - at which point it clears I_PINNING_FSCACHE_WB and sets this
flag.
This has to be done here so that clearing I_PINNING_FSCACHE_WB can be
done atomically with the check of PAGECACHE_TAG_DIRTY that clears
I_DIRTY_PAGES.
(3) A function, fscache_set_page_dirty(), which if it is not set, sets
I_PINNING_FSCACHE_WB and calls fscache_use_cookie() to pin the cache
resources.
(4) A function, fscache_unpin_writeback(), to be called by ->write_inode()
to unuse the cookie.
(5) A function, fscache_clear_inode_writeback(), to be called when the
inode is evicted, before clear_inode() is called. This cleans up any
lingering I_PINNING_FSCACHE_WB.
The network filesystem can then use these tools to make sure that
fscache_write_to_cache() can write locally modified data to the cache as
well as to the server.
For the future, I'm working on write helpers for netfs lib that should
allow this facility to be removed by keeping track of the dirty regions
separately - but that's incomplete at the moment and is also going to be
affected by folios, one way or another, since it deals with pages
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819615157.215744.17623791756928043114.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906917856.143852.8224898306177154573.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967124567.1823006.14188359004568060298.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021524705.640689.17824932021727663017.stgit@warthog.procyon.org.uk/ # v4
|
|
Provide a higher-level function than fscache_write() to perform a write
from an inode's pagecache to the cache, whilst fending off concurrent
writes by means of the PG_fscache mark on a page:
void fscache_write_to_cache(struct fscache_cookie *cookie,
struct address_space *mapping,
loff_t start,
size_t len,
loff_t i_size,
netfs_io_terminated_t term_func,
void *term_func_priv,
bool caching);
If caching is false, this function does nothing except call (*term_func)()
if given. It assumes that, in such a case, PG_fscache will not have been
set on the pages.
Otherwise, if caching is true, this function requires the source pages to
have had PG_fscache set on them before calling. start and len define the
region of the file to be modified and i_size indicates the new file size.
The source pages are extracted from the mapping.
term_func and term_func_priv work as for fscache_write(). The PG_fscache
marks will be cleared at the end of the operation, before term_func is
called or the function otherwise returns.
There is an additonal helper function to clear the PG_fscache bits from a
range of pages:
void fscache_clear_page_bits(struct fscache_cookie *cookie,
struct address_space *mapping,
loff_t start, size_t len,
bool caching);
If caching is true, the pages to be managed are expected to be located on
mapping in the range defined by start and len. If caching is false, it
does nothing.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819614155.215744.5528123235123721230.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906916346.143852.15632773570362489926.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967123599.1823006.12946816026724657428.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021522672.640689.4381958316198807813.stgit@warthog.procyon.org.uk/ # v4
|
|
Provide a pair of functions to perform raw I/O on the cache. The first
function allows an arbitrary asynchronous direct-IO read to be made against
a cache object, though the read should be aligned and sized appropriately
for the backing device:
int fscache_read(struct netfs_cache_resources *cres,
loff_t start_pos,
struct iov_iter *iter,
enum netfs_read_from_hole read_hole,
netfs_io_terminated_t term_func,
void *term_func_priv);
The cache resources must have been previously initialised by
fscache_begin_read_operation(). A read operation is sent to the backing
filesystem, starting at start_pos within the file. The size of the read is
specified by the iterator, as is the location of the output buffer.
If there is a hole in the data it can be ignored and left to the backing
filesystem to deal with (NETFS_READ_HOLE_IGNORE), a hole at the beginning
can be skipped over and the buffer padded with zeros
(NETFS_READ_HOLE_CLEAR) or -ENODATA can be given (NETFS_READ_HOLE_FAIL).
If term_func is not NULL, the operation may be performed asynchronously.
Upon completion, successful or otherwise, (*term_func)() will be called and
passed term_func_priv, along with an error or the amount of data
transferred. If the op is run asynchronously, fscache_read() will return
-EIOCBQUEUED.
The second function allows an arbitrary asynchronous direct-IO write to be
made against a cache object, though the write should be aligned and sized
appropriately for the backing device:
int fscache_write(struct netfs_cache_resources *cres,
loff_t start_pos,
struct iov_iter *iter,
netfs_io_terminated_t term_func,
void *term_func_priv);
This works in very similar way to fscache_read(), except that there's no
need to deal with holes (they're just overwritten).
The caller is responsible for preventing concurrent overlapping writes.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819613224.215744.7877577215582621254.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906915386.143852.16936177636106480724.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967122632.1823006.7487049517698562172.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021521420.640689.12747258780542678309.stgit@warthog.procyon.org.uk/ # v4
|
|
Pass more information to the cache on how to deal with a hole if it
encounters one when trying to read from the cache. Three options are
provided:
(1) NETFS_READ_HOLE_IGNORE. Read the hole along with the data, assuming
it to be a punched-out extent by the backing filesystem.
(2) NETFS_READ_HOLE_CLEAR. If there's a hole, erase the requested region
of the cache and clear the read buffer.
(3) NETFS_READ_HOLE_FAIL. Fail the read if a hole is detected.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819612321.215744.9738308885948264476.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906914460.143852.6284247083607910189.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967119923.1823006.15637375885194297582.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021519762.640689.16994364383313159319.stgit@warthog.procyon.org.uk/ # v4
|
|
Provide a function to let the netfs update its coherency data:
void fscache_update_cookie(struct fscache_cookie *cookie,
const void *aux_data,
const loff_t *object_size);
This will update the auxiliary data and/or the size of the object attached
to a cookie if either pointer is not-NULL and flag that the disk needs to
be updated.
Note that fscache_unuse_cookie() also allows this to be done.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819610438.215744.4223265964131424954.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906913530.143852.18150303220217653820.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967117795.1823006.7493373142653442595.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021518440.640689.6369952464473039268.stgit@warthog.procyon.org.uk/ # v4
|
|
Provide read/write stat counters for the cache backend to use.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819609532.215744.10821082637727410554.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906912598.143852.12960327989649429069.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967113830.1823006.3222957649202368162.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021517502.640689.6077928311710357342.stgit@warthog.procyon.org.uk/ # v4
|
|
Count the data storage objects that are currently allocated in a cache.
This is used to pin certain cache structures until cache withdrawal is
complete.
Three helpers are provided to manage and make use of the count:
(1) void fscache_count_object(struct fscache_cache *cache);
This should be called by the cache backend to note that an object has
been allocated and attached to the cache.
(2) void fscache_uncount_object(struct fscache_cache *cache);
This should be called by the backend to note that an object has been
destroyed. This sends a wakeup event that allows cache withdrawal to
proceed if it was waiting for that object.
(3) void fscache_wait_for_objects(struct fscache_cache *cache);
This can be used by the backend to wait for all outstanding cache
object to be destroyed.
Each cache's counter is displayed as part of /proc/fs/fscache/caches.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819608594.215744.1812706538117388252.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906911646.143852.168184059935530127.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967111846.1823006.9868154941573671255.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021516219.640689.4934796654308958158.stgit@warthog.procyon.org.uk/ # v4
|
|
Provide a function to begin a read operation:
int fscache_begin_read_operation(
struct netfs_cache_resources *cres,
struct fscache_cookie *cookie)
This is primarily intended to be called by network filesystems on behalf of
netfslib, but may also be called to use the I/O access functions directly.
It attaches the resources required by the cache to cres struct from the
supplied cookie.
This holds access to the cache behind the cookie for the duration of the
operation and forces cache withdrawal and cookie invalidation to perform
synchronisation on the operation. cres->inval_counter is set from the
cookie at this point so that it can be compared at the end of the
operation.
Note that this does not guarantee that the cache state is fully set up and
able to perform I/O immediately; looking up and creation may be left in
progress in the background. The operations intended to be called by the
network filesystem, such as reading and writing, are expected to wait for
the cookie to move to the correct state.
This will, however, potentially sleep, waiting for a certain minimum state
to be set or for operations such as invalidate to advance far enough that
I/O can resume.
Also provide a function for the cache to call to wait for the cache object
to get to a state where it can be used for certain things:
bool fscache_wait_for_operation(struct netfs_cache_resources *cres,
enum fscache_want_stage stage);
This looks at the cache resources provided by the begin function and waits
for them to get to an appropriate stage. There's a choice of wanting just
some parameters (FSCACHE_WANT_PARAM) or the ability to do I/O
(FSCACHE_WANT_READ or FSCACHE_WANT_WRITE).
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819603692.215744.146724961588817028.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906910672.143852.13856103384424986357.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967110245.1823006.2239170567540431836.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021513617.640689.16627329360866150606.stgit@warthog.procyon.org.uk/ # v4
|
|
Add a function to invalidate the cache behind a cookie:
void fscache_invalidate(struct fscache_cookie *cookie,
const void *aux_data,
loff_t size,
unsigned int flags)
This causes any cached data for the specified cookie to be discarded. If
the cookie is marked as being in use, a new cache object will be created if
possible and future I/O will use that instead. In-flight I/O should be
abandoned (writes) or reconsidered (reads). Each time it is called
cookie->inval_counter is incremented and this can be used to detect
invalidation at the end of an I/O operation.
The coherency data attached to the cookie can be updated and the cookie
size should be reset. One flag is available, FSCACHE_INVAL_DIO_WRITE,
which should be used to indicate invalidation due to a DIO write on a
file. This will temporarily disable caching for this cookie.
Changes
=======
ver #2:
- Should only change to inval state if can get access to cache.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819602231.215744.11206598147269491575.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906909707.143852.18056070560477964891.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967107447.1823006.5945029409592119962.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021512640.640689.11418616313147754172.stgit@warthog.procyon.org.uk/ # v4
|
|
Provide a pair of functions to count the number of users of a cookie (open
files, writeback, invalidation, resizing, reads, writes), to obtain and pin
resources for the cookie and to prevent culling for the whilst there are
users.
The first function marks a cookie as being in use:
void fscache_use_cookie(struct fscache_cookie *cookie,
bool will_modify);
The caller should indicate the cookie to use and whether or not the caller
is in a context that may modify the cookie (e.g. a file open O_RDWR).
If the cookie is not already resourced, fscache will ask the cache backend
in the background to do whatever it needs to look up, create or otherwise
obtain the resources necessary to access data. This is pinned to the
cookie and may not be culled, though it may be withdrawn if the cache as a
whole is withdrawn.
The second function removes the in-use mark from a cookie and, optionally,
updates the coherency data:
void fscache_unuse_cookie(struct fscache_cookie *cookie,
const void *aux_data,
const loff_t *object_size);
If non-NULL, the aux_data buffer and/or the object_size will be saved into
the cookie and will be set on the backing store when the object is
committed.
If this removes the last usage on a cookie, the cookie is placed onto an
LRU list from which it will be removed and closed after a couple of seconds
if it doesn't get reused. This prevents resource overload in the cache -
in particular it prevents it from holding too many files open.
Changes
=======
ver #2:
- Fix fscache_unuse_cookie() to use atomic_dec_and_lock() to avoid a
potential race if the cookie gets reused before it completes the
unusement.
- Added missing transition to LRU_DISCARDING state.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819600612.215744.13678350304176542741.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906907567.143852.16979631199380722019.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967106467.1823006.6790864931048582667.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021511674.640689.10084988363699111860.stgit@warthog.procyon.org.uk/ # v4
|
|
Implement a very simple cookie state machine to handle lookup,
invalidation, withdrawal, relinquishment and, to be added later, commit on
LRU discard.
Three cache methods are provided: ->lookup_cookie() to look up and, if
necessary, create a data storage object; ->withdraw_cookie() to free the
resources associated with that object and potentially delete it; and
->prepare_to_write(), to do prepare for changes to the cached data to be
modified locally.
Changes
=======
ver #3:
- Fix a race between LRU discard and relinquishment whereby the former
would override the latter and thus the latter would never happen[1].
ver #2:
- Don't hold n_accesses elevated whilst cache is bound to a cookie, but
rather add a flag that prevents the state machine from being queued when
n_accesses reaches 0.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/599331.1639410068@warthog.procyon.org.uk/ [1]
Link: https://lore.kernel.org/r/163819599657.215744.15799615296912341745.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906903925.143852.1805855338154353867.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967105456.1823006.14730395299835841776.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021510706.640689.7961423370243272583.stgit@warthog.procyon.org.uk/ # v4
|
|
Add a function to the backend API to note an I/O error in a cache.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819598741.215744.891281275151382095.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906901316.143852.15225412215771586528.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967100721.1823006.16435671567428949398.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021508840.640689.11902836226570620424.stgit@warthog.procyon.org.uk/ # v4
|
|
Add cache methods to lookup, create and remove a volume.
Looking up or creating the volume requires the cache pinning for access;
freeing the volume requires the volume pinning for access. The
->acquire_volume() method is used to ask the cache backend to lookup and,
if necessary, create a volume; the ->free_volume() method is used to free
the resources for a volume.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819597821.215744.5225318658134989949.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906898645.143852.8537799955945956818.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967099771.1823006.1455197910571061835.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021507345.640689.4073511598838843040.stgit@warthog.procyon.org.uk/ # v4
|
|
Implement functions to allow the cache backend to add or remove a cache:
(1) Declare a cache to be live:
int fscache_add_cache(struct fscache_cache *cache,
const struct fscache_cache_ops *ops,
void *cache_priv);
Take a previously acquired cache cookie, set the operations table and
private data and mark the cache open for access.
(2) Withdraw a cache from service:
void fscache_withdraw_cache(struct fscache_cache *cache);
This marks the cache as withdrawn and thus prevents further
cache-level and volume-level accesses.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819596022.215744.8799712491432238827.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906896599.143852.17049208999019262884.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967097870.1823006.3470041000971522030.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021505541.640689.1819714759326331054.stgit@warthog.procyon.org.uk/ # v4
|
|
Add a number of helper functions to manage access to a cookie, pinning the
cache object in place for the duration to prevent cache withdrawal from
removing it:
(1) void fscache_init_access_gate(struct fscache_cookie *cookie);
This function initialises the access count when a cache binds to a
cookie. An extra ref is taken on the access count to prevent wakeups
while the cache is active. We're only interested in the wakeup when a
cookie is being withdrawn and we're waiting for it to quiesce - at
which point the counter will be decremented before the wait.
The FSCACHE_COOKIE_NACC_ELEVATED flag is set on the cookie to keep
track of the extra ref in order to handle a race between
relinquishment and withdrawal both trying to drop the extra ref.
(2) bool fscache_begin_cookie_access(struct fscache_cookie *cookie,
enum fscache_access_trace why);
This function attempts to begin access upon a cookie, pinning it in
place if it's cached. If successful, it returns true and leaves a the
access count incremented.
(3) void fscache_end_cookie_access(struct fscache_cookie *cookie,
enum fscache_access_trace why);
This function drops the access count obtained by (2), permitting
object withdrawal to take place when it reaches zero.
A tracepoint is provided to track changes to the access counter on a
cookie.
Changes
=======
ver #2:
- Don't hold n_accesses elevated whilst cache is bound to a cookie, but
rather add a flag that prevents the state machine from being queued when
n_accesses reaches 0.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819595085.215744.1706073049250505427.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906895313.143852.10141619544149102193.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967095980.1823006.1133648159424418877.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021503063.640689.8870918985269528670.stgit@warthog.procyon.org.uk/ # v4
|
|
Add a pair of helper functions to manage access to a volume, pinning the
volume in place for the duration to prevent cache withdrawal from removing
it:
bool fscache_begin_volume_access(struct fscache_volume *volume,
enum fscache_access_trace why);
void fscache_end_volume_access(struct fscache_volume *volume,
enum fscache_access_trace why);
The way the access gate on the volume works/will work is:
(1) If the cache tests as not live (state is not FSCACHE_CACHE_IS_ACTIVE),
then we return false to indicate access was not permitted.
(2) If the cache tests as live, then we increment the volume's n_accesses
count and then recheck the cache liveness, ending the access if it
ceased to be live.
(3) When we end the access, we decrement the volume's n_accesses and wake
up the any waiters if it reaches 0.
(4) Whilst the cache is caching, the volume's n_accesses is kept
artificially incremented to prevent wakeups from happening.
(5) When the cache is taken offline, the state is changed to prevent new
accesses, the volume's n_accesses is decremented and we wait for it to
become 0.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819594158.215744.8285859817391683254.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906894315.143852.5454793807544710479.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967095028.1823006.9173132503876627466.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021501546.640689.9631510472149608443.stgit@warthog.procyon.org.uk/ # v4
|
|
Add a pair of functions to pin/unpin a cache that we're wanting to do a
high-level access to (such as creating or removing a volume):
bool fscache_begin_cache_access(struct fscache_cache *cache,
enum fscache_access_trace why);
void fscache_end_cache_access(struct fscache_cache *cache,
enum fscache_access_trace why);
The way the access gate works/will work is:
(1) If the cache tests as not live (state is not FSCACHE_CACHE_IS_ACTIVE),
then we return false to indicate access was not permitted.
(2) If the cache tests as live, then we increment the n_accesses count and
then recheck the liveness, ending the access if it ceased to be live.
(3) When we end the access, we decrement n_accesses and wake up the any
waiters if it reaches 0.
(4) Whilst the cache is caching, n_accesses is kept artificially
incremented to prevent wakeups from happening.
(5) When the cache is taken offline, the state is changed to prevent new
accesses, n_accesses is decremented and we wait for n_accesses to
become 0.
Note that some of this is implemented in a later patch.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819593239.215744.7537428720603638088.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906893368.143852.14164004598465617981.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967093977.1823006.6967886507023056409.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021499995.640689.18286203753480287850.stgit@warthog.procyon.org.uk/ # v4
|
|
Add functions to the fscache API to allow data file cookies to be acquired
and relinquished by the network filesystem. It is intended that the
filesystem will create such cookies per-inode under a volume.
To request a cookie, the filesystem should call:
struct fscache_cookie *
fscache_acquire_cookie(struct fscache_volume *volume,
u8 advice,
const void *index_key,
size_t index_key_len,
const void *aux_data,
size_t aux_data_len,
loff_t object_size)
The filesystem must first have created a volume cookie, which is passed in
here. If it passes in NULL then the function will just return a NULL
cookie.
A binary key should be passed in index_key and is of size index_key_len.
This is saved in the cookie and is used to locate the associated data in
the cache.
A coherency data buffer of size aux_data_len will be allocated and
initialised from the buffer pointed to by aux_data. This is used to
validate cache objects when they're opened and is stored on disk with them
when they're committed. The data is stored in the cookie and will be
updateable by various functions in later patches.
The object_size must also be given. This is also used to perform a
coherency check and to size the backing storage appropriately.
This function disallows a cookie from being acquired twice in parallel,
though it will cause the second user to wait if the first is busy
relinquishing its cookie.
When a network filesystem has finished with a cookie, it should call:
void
fscache_relinquish_cookie(struct fscache_volume *volume,
bool retire)
If retire is true, any backing data will be discarded immediately.
Changes
=======
ver #3:
- fscache_hash()'s size parameter is now in bytes. Use __le32 as the unit
to round up to.
- When comparing cookies, simply see if the attributes are the same rather
than subtracting them to produce a strcmp-style return[1].
- Add a check to see if the cookie is still hashed at the point of
freeing.
ver #2:
- Don't hold n_accesses elevated whilst cache is bound to a cookie, but
rather add a flag that prevents the state machine from being queued when
n_accesses reaches 0.
- Remove the unused cookie pointer field from the fscache_acquire
tracepoint.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/CAHk-=whtkzB446+hX0zdLsdcUJsJ=8_-0S1mE_R+YurThfUbLA@mail.gmail.com/ [1]
Link: https://lore.kernel.org/r/163819590658.215744.14934902514281054323.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906891983.143852.6219772337558577395.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967088507.1823006.12659006350221417165.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021498432.640689.12743483856927722772.stgit@warthog.procyon.org.uk/ # v4
|
|
Add functions to the fscache API to allow volumes to be acquired and
relinquished by the network filesystem. A volume is an index of data
storage cache objects. A volume is represented by a volume cookie in the
API. A filesystem would typically create a volume for a superblock and
then create per-inode cookies within it.
To request a volume, the filesystem calls:
struct fscache_volume *
fscache_acquire_volume(const char *volume_key,
const char *cache_name,
const void *coherency_data,
size_t coherency_len)
The volume_key is a printable string used to match the volume in the cache.
It should not contain any '/' characters. For AFS, for example, this would
be "afs,<cellname>,<volume_id>", e.g. "afs,example.com,523001".
The cache_name can be NULL, but if not it should be a string indicating the
name of the cache to use if there's more than one available.
The coherency data, if given, is an arbitrarily-sized blob that's attached
to the volume and is compared when the volume is looked up. If it doesn't
match, the old volume is judged to be out of date and it and everything
within it is discarded.
Acquiring a volume twice concurrently is disallowed, though the function
will wait if an old volume cookie is being relinquishing.
When a network filesystem has finished with a volume, it should return the
volume cookie by calling:
void
fscache_relinquish_volume(struct fscache_volume *volume,
const void *coherency_data,
bool invalidate)
If invalidate is true, the entire volume will be discarded; if false, the
volume will be synced and the coherency data will be updated.
Changes
=======
ver #4:
- Removed an extraneous param from kdoc on fscache_relinquish_volume()[3].
ver #3:
- fscache_hash()'s size parameter is now in bytes. Use __le32 as the unit
to round up to.
- When comparing cookies, simply see if the attributes are the same rather
than subtracting them to produce a strcmp-style return[2].
- Make the coherency data an arbitrary blob rather than a u64, but don't
store it for the moment.
ver #2:
- Fix error check[1].
- Make a fscache_acquire_volume() return errors, including EBUSY if a
conflicting volume cookie already exists. No error is printed now -
that's left to the netfs.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/20211203095608.GC2480@kili/ [1]
Link: https://lore.kernel.org/r/CAHk-=whtkzB446+hX0zdLsdcUJsJ=8_-0S1mE_R+YurThfUbLA@mail.gmail.com/ [2]
Link: https://lore.kernel.org/r/20211220224646.30e8205c@canb.auug.org.au/ [3]
Link: https://lore.kernel.org/r/163819588944.215744.1629085755564865996.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906890630.143852.13972180614535611154.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967086836.1823006.8191672796841981763.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021495816.640689.4403156093668590217.stgit@warthog.procyon.org.uk/ # v4
|
|
Implement a register of caches and provide functions to manage it.
Two functions are provided for the cache backend to use:
(1) Acquire a cache cookie:
struct fscache_cache *fscache_acquire_cache(const char *name)
This gets the cache cookie for a cache of the specified name and moves
it to the preparation state. If a nameless cache cookie exists, that
will be given this name and used.
(2) Relinquish a cache cookie:
void fscache_relinquish_cache(struct fscache_cache *cache);
This relinquishes a cache cookie, cleans it and makes it available if
it's still referenced by a network filesystem.
Note that network filesystems don't deal with cache cookies directly, but
rather go straight to the volume registration.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819587157.215744.13523139317322503286.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906889665.143852.10378009165231294456.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967085081.1823006.2218944206363626210.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021494847.640689.10109692261640524343.stgit@warthog.procyon.org.uk/ # v4
|
|
Introduce basic skeleton of the new, rewritten fscache driver.
Changes
=======
ver #3:
- Use remove_proc_subtree(), not remove_proc_entry() to remove a populated
dir.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819584034.215744.4290533472390439030.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906887770.143852.3577888294989185666.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967080039.1823006.5702921801104057922.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021491014.640689.4292699878317589512.stgit@warthog.procyon.org.uk/ # v4
|
|
Pass a flag to ->prepare_write() to indicate if there's definitely no
space allocated in the cache yet (for instance if we've already checked as
we were asked to do a read).
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819583123.215744.12783808230464471417.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906886835.143852.6689886781122679769.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967079100.1823006.12889542712309574359.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021489334.640689.3131206613015409076.stgit@warthog.procyon.org.uk/ # v4
|
|
Display the netfs inode number in the netfs_read tracepoint so that this
can be used to correlate with the cachefiles_prep_read tracepoint.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819581097.215744.17476611915583897051.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906885903.143852.12229407815154182247.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967078164.1823006.15286989199782861123.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021487412.640689.7544388469390936443.stgit@warthog.procyon.org.uk/ # v4
|
|
Remove the code that comprises the fscache driver as it's going to be
substantially rewritten, with the majority of the code being erased in the
rewrite.
A small piece of linux/fscache.h is left as that is #included by a bunch of
network filesystems.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819578724.215744.18210619052245724238.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906884814.143852.6727245089843862889.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967077097.1823006.1377665951499979089.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021485548.640689.13876080567388696162.stgit@warthog.procyon.org.uk/ # v4
|
|
Delete the code from the cachefiles driver to make it easier to rewrite and
resubmit in a logical manner.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819577641.215744.12718114397770666596.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906883770.143852.4149714614981373410.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967076066.1823006.7175712134577687753.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021483619.640689.7586546280515844702.stgit@warthog.procyon.org.uk/ # v4
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Borislav Petkov:
- Properly init uclamp_flags of a runqueue, on first enqueuing
- Fix preempt= callback return values
- Correct utime/stime resource usage reporting on nohz_full to return
the proper times instead of shorter ones
* tag 'sched_urgent_for_v5.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/uclamp: Fix rq->uclamp_max not set on first enqueue
preempt/dynamic: Fix setup_preempt_mode() return value
sched/cputime: Fix getrusage(RUSAGE_THREAD) with nohz_full
|
|
Pull drm fixes from Dave Airlie:
"Bit of an uptick in patch count this week, though it's all relatively
small overall.
I suspect msm has been queuing up a few fixes to skew it here.
Otherwise amdgpu has a scattered bunch of small fixes, and then some
vc4, i915.
virtio-gpu changes an rc1 introduced uAPI mistake, and makes it
operate more like other drivers. This should be fine as no userspace
relies on the behaviour yet.
Summary:
dma-buf:
- memory leak fix
msm:
- kasan found memory overwrite
- mmap flags
- fencing error bug
- ioctl NULL ptr
- uninit var
- devfreqless devices fix
- dsi lanes fix
- dp: avoid unpowered aux xfers
amdgpu:
- IP discovery based enumeration fixes
- vkms fixes
- DSC fixes for DP MST
- Audio fix for hotplug with tiled displays
- Misc display fixes
- DP tunneling fix
- DP fix
- Aldebaran fix
amdkfd:
- Locking fix
- Static checker fix
- Fix double free
i915:
- backlight regression
- Intel HDR backlight detection fix
- revert TGL workaround that caused hangs
virtio-gpu:
- switch back to drm_poll
vc4:
- memory leak
- error check fix
- HVS modesetting fixes"
* tag 'drm-fixes-2021-12-03-1' of git://anongit.freedesktop.org/drm/drm: (41 commits)
Revert "drm/i915: Implement Wa_1508744258"
drm/amdkfd: process_info lock not needed for svm
drm/amdgpu: adjust the kfd reset sequence in reset sriov function
drm/amd/display: add connector type check for CRC source set
drm/amdkfd: fix double free mem structure
drm/amdkfd: set "r = 0" explicitly before goto
drm/amd/display: Add work around for tunneled MST.
drm/amd/display: Fix for the no Audio bug with Tiled Displays
drm/amd/display: Clear DPCD lane settings after repeater training
drm/amd/display: Allow DSC on supported MST branch devices
drm/amdgpu: Don't halt RLC on GFX suspend
drm/amdgpu: fix the missed handling for SDMA2 and SDMA3
drm/amdgpu: check atomic flag to differeniate with legacy path
drm/amdgpu: cancel the correct hrtimer on exit
drm/amdgpu/sriov/vcn: add new vcn ip revision check case for SIENNA_CICHLID
drm/i915/dp: Perform 30ms delay after source OUI write
dma-buf: system_heap: Use 'for_each_sgtable_sg' in pages free flow
drm/i915: Add support for panels with VESA backlights with PWM enable/disable
drm/vc4: kms: Fix previous HVS commit wait
drm/vc4: kms: Don't duplicate pending commit
...
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Switch back to drm_poll for virtio, multiple fixes (memory leak,
improper error check, some functional fixes too) for vc4, memory leak
fix in dma-buf,
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20211202084440.u3b7lbeulj7k3ltg@houat
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless, and wireguard.
Mostly scattered driver changes this week, with one big clump in
mv88e6xxx. Nothing of note, really.
Current release - regressions:
- smc: keep smc_close_final()'s error code during active close
Current release - new code bugs:
- iwlwifi: various static checker fixes (int overflow, leaks, missing
error codes)
- rtw89: fix size of firmware header before transfer, avoid crash
- mt76: fix timestamp check in tx_status; fix pktid leak;
- mscc: ocelot: fix missing unlock on error in ocelot_hwstamp_set()
Previous releases - regressions:
- smc: fix list corruption in smc_lgr_cleanup_early
- ipv4: convert fib_num_tclassid_users to atomic_t
Previous releases - always broken:
- tls: fix authentication failure in CCM mode
- vrf: reset IPCB/IP6CB when processing outbound pkts, prevent
incorrect processing
- dsa: mv88e6xxx: fixes for various device errata
- rds: correct socket tunable error in rds_tcp_tune()
- ipv6: fix memory leak in fib6_rule_suppress
- wireguard: reset peer src endpoint when netns exits
- wireguard: improve resilience to DoS around incoming handshakes
- tcp: fix page frag corruption on page fault which involves TCP
- mpls: fix missing attributes in delete notifications
- mt7915: fix NULL pointer dereference with ad-hoc mode
Misc:
- rt2x00: be more lenient about EPROTO errors during start
- mlx4_en: update reported link modes for 1/10G"
* tag 'net-5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits)
net: dsa: b53: Add SPI ID table
gro: Fix inconsistent indenting
selftests: net: Correct case name
net/rds: correct socket tunable error in rds_tcp_tune()
mctp: Don't let RTM_DELROUTE delete local routes
net/smc: Keep smc_close_final rc during active close
ibmvnic: drop bad optimization in reuse_tx_pools()
ibmvnic: drop bad optimization in reuse_rx_pools()
net/smc: fix wrong list_del in smc_lgr_cleanup_early
Fix Comment of ETH_P_802_3_MIN
ethernet: aquantia: Try MAC address from device tree
ipv4: convert fib_num_tclassid_users to atomic_t
net: avoid uninit-value from tcp_conn_request
net: annotate data-races on txq->xmit_lock_owner
octeontx2-af: Fix a memleak bug in rvu_mbox_init()
net/mlx4_en: Fix an use-after-free bug in mlx4_en_try_alloc_resources()
vrf: Reset IPCB/IP6CB when processing outbound pkts in vrf dev xmit
net: qlogic: qlcnic: Fix a NULL pointer dereference in qlcnic_83xx_add_rings()
net: dsa: mv88e6xxx: Link in pcs_get_state() if AN is bypassed
net: dsa: mv88e6xxx: Fix inband AN for 2500base-x on 88E6393X family
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Three tracing fixes:
- Allow compares of strings when using signed and unsigned characters
- Fix kmemleak false positive for histogram entries
- Handle negative numbers for user defined kretprobe data sizes"
* tag 'trace-v5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
kprobes: Limit max data_size of the kretprobe instances
tracing: Fix a kmemleak false positive in tracing_map
tracing/histograms: String compares should not care about signed values
|
|
getrusage(RUSAGE_THREAD) with nohz_full may return shorter utime/stime
than the actual time.
task_cputime_adjusted() snapshots utime and stime and then adjust their
sum to match the scheduler maintained cputime.sum_exec_runtime.
Unfortunately in nohz_full, sum_exec_runtime is only updated once per
second in the worst case, causing a discrepancy against utime and stime
that can be updated anytime by the reader using vtime.
To fix this situation, perform an update of cputime.sum_exec_runtime
when the cputime snapshot reports the task as actually running while
the tick is disabled. The related overhead is then contained within the
relevant situations.
Reported-by: Hasegawa Hitomi <hasegawa-hitomi@fujitsu.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Hasegawa Hitomi <hasegawa-hitomi@fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Acked-by: Phil Auld <pauld@redhat.com>
Link: https://lore.kernel.org/r/20211026141055.57358-3-frederic@kernel.org
|
|
The description of ETH_P_802_3_MIN is misleading.
The value of EthernetType in Ethernet II frame is more than 0x0600,
the value of Length in 802.3 frame is less than 0x0600.
Signed-off-by: Xiayu Zhang <Xiayu.Zhang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Before commit faa041a40b9f ("ipv4: Create cleanup helper for fib_nh")
changes to net->ipv4.fib_num_tclassid_users were protected by RTNL.
After the change, this is no longer the case, as free_fib_info_rcu()
runs after rcu grace period, without rtnl being held.
Fixes: faa041a40b9f ("ipv4: Create cleanup helper for fib_nh")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Ahern <dsahern@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A recent change triggers a KMSAN warning, because request
sockets do not initialize @sk_rx_queue_mapping field.
Add sk_rx_queue_update() helper to make our intent clear.
BUG: KMSAN: uninit-value in sk_rx_queue_set include/net/sock.h:1922 [inline]
BUG: KMSAN: uninit-value in tcp_conn_request+0x3bcc/0x4dc0 net/ipv4/tcp_input.c:6922
sk_rx_queue_set include/net/sock.h:1922 [inline]
tcp_conn_request+0x3bcc/0x4dc0 net/ipv4/tcp_input.c:6922
tcp_v4_conn_request+0x218/0x2a0 net/ipv4/tcp_ipv4.c:1528
tcp_rcv_state_process+0x2c5/0x3290 net/ipv4/tcp_input.c:6406
tcp_v4_do_rcv+0xb4e/0x1330 net/ipv4/tcp_ipv4.c:1738
tcp_v4_rcv+0x468d/0x4ed0 net/ipv4/tcp_ipv4.c:2100
ip_protocol_deliver_rcu+0x760/0x10b0 net/ipv4/ip_input.c:204
ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
NF_HOOK include/linux/netfilter.h:307 [inline]
ip_local_deliver+0x584/0x8c0 net/ipv4/ip_input.c:252
dst_input include/net/dst.h:460 [inline]
ip_sublist_rcv_finish net/ipv4/ip_input.c:551 [inline]
ip_list_rcv_finish net/ipv4/ip_input.c:601 [inline]
ip_sublist_rcv+0x11fd/0x1520 net/ipv4/ip_input.c:609
ip_list_rcv+0x95f/0x9a0 net/ipv4/ip_input.c:644
__netif_receive_skb_list_ptype net/core/dev.c:5505 [inline]
__netif_receive_skb_list_core+0xe34/0x1240 net/core/dev.c:5553
__netif_receive_skb_list+0x7fc/0x960 net/core/dev.c:5605
netif_receive_skb_list_internal+0x868/0xde0 net/core/dev.c:5696
gro_normal_list net/core/dev.c:5850 [inline]
napi_complete_done+0x579/0xdd0 net/core/dev.c:6587
virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
virtnet_poll+0x17b6/0x2350 drivers/net/virtio_net.c:1557
__napi_poll+0x14e/0xbc0 net/core/dev.c:7020
napi_poll net/core/dev.c:7087 [inline]
net_rx_action+0x824/0x1880 net/core/dev.c:7174
__do_softirq+0x1fe/0x7eb kernel/softirq.c:558
invoke_softirq+0xa4/0x130 kernel/softirq.c:432
__irq_exit_rcu kernel/softirq.c:636 [inline]
irq_exit_rcu+0x76/0x130 kernel/softirq.c:648
common_interrupt+0xb6/0xd0 arch/x86/kernel/irq.c:240
asm_common_interrupt+0x1e/0x40
smap_restore arch/x86/include/asm/smap.h:67 [inline]
get_shadow_origin_ptr mm/kmsan/instrumentation.c:31 [inline]
__msan_metadata_ptr_for_load_1+0x28/0x30 mm/kmsan/instrumentation.c:63
tomoyo_check_acl+0x1b0/0x630 security/tomoyo/domain.c:173
tomoyo_path_permission security/tomoyo/file.c:586 [inline]
tomoyo_check_open_permission+0x61f/0xe10 security/tomoyo/file.c:777
tomoyo_file_open+0x24f/0x2d0 security/tomoyo/tomoyo.c:311
security_file_open+0xb1/0x1f0 security/security.c:1635
do_dentry_open+0x4e4/0x1bf0 fs/open.c:809
vfs_open+0xaf/0xe0 fs/open.c:957
do_open fs/namei.c:3426 [inline]
path_openat+0x52f1/0x5dd0 fs/namei.c:3559
do_filp_open+0x306/0x760 fs/namei.c:3586
do_sys_openat2+0x263/0x8f0 fs/open.c:1212
do_sys_open fs/open.c:1228 [inline]
__do_sys_open fs/open.c:1236 [inline]
__se_sys_open fs/open.c:1232 [inline]
__x64_sys_open+0x314/0x380 fs/open.c:1232
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x44/0xae
Uninit was created at:
__alloc_pages+0xbc7/0x10a0 mm/page_alloc.c:5409
alloc_pages+0x8a5/0xb80
alloc_slab_page mm/slub.c:1810 [inline]
allocate_slab+0x287/0x1c20 mm/slub.c:1947
new_slab mm/slub.c:2010 [inline]
___slab_alloc+0xbdf/0x1e90 mm/slub.c:3039
__slab_alloc mm/slub.c:3126 [inline]
slab_alloc_node mm/slub.c:3217 [inline]
slab_alloc mm/slub.c:3259 [inline]
kmem_cache_alloc+0xbb3/0x11c0 mm/slub.c:3264
reqsk_alloc include/net/request_sock.h:91 [inline]
inet_reqsk_alloc+0xaf/0x8b0 net/ipv4/tcp_input.c:6712
tcp_conn_request+0x910/0x4dc0 net/ipv4/tcp_input.c:6852
tcp_v4_conn_request+0x218/0x2a0 net/ipv4/tcp_ipv4.c:1528
tcp_rcv_state_process+0x2c5/0x3290 net/ipv4/tcp_input.c:6406
tcp_v4_do_rcv+0xb4e/0x1330 net/ipv4/tcp_ipv4.c:1738
tcp_v4_rcv+0x468d/0x4ed0 net/ipv4/tcp_ipv4.c:2100
ip_protocol_deliver_rcu+0x760/0x10b0 net/ipv4/ip_input.c:204
ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
NF_HOOK include/linux/netfilter.h:307 [inline]
ip_local_deliver+0x584/0x8c0 net/ipv4/ip_input.c:252
dst_input include/net/dst.h:460 [inline]
ip_sublist_rcv_finish net/ipv4/ip_input.c:551 [inline]
ip_list_rcv_finish net/ipv4/ip_input.c:601 [inline]
ip_sublist_rcv+0x11fd/0x1520 net/ipv4/ip_input.c:609
ip_list_rcv+0x95f/0x9a0 net/ipv4/ip_input.c:644
__netif_receive_skb_list_ptype net/core/dev.c:5505 [inline]
__netif_receive_skb_list_core+0xe34/0x1240 net/core/dev.c:5553
__netif_receive_skb_list+0x7fc/0x960 net/core/dev.c:5605
netif_receive_skb_list_internal+0x868/0xde0 net/core/dev.c:5696
gro_normal_list net/core/dev.c:5850 [inline]
napi_complete_done+0x579/0xdd0 net/core/dev.c:6587
virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
virtnet_poll+0x17b6/0x2350 drivers/net/virtio_net.c:1557
__napi_poll+0x14e/0xbc0 net/core/dev.c:7020
napi_poll net/core/dev.c:7087 [inline]
net_rx_action+0x824/0x1880 net/core/dev.c:7174
__do_softirq+0x1fe/0x7eb kernel/softirq.c:558
Fixes: 342159ee394d ("net: avoid dirtying sk->sk_rx_queue_mapping")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20211130182939.2584764-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot found that __dev_queue_xmit() is reading txq->xmit_lock_owner
without annotations.
No serious issue there, let's document what is happening there.
BUG: KCSAN: data-race in __dev_queue_xmit / __dev_queue_xmit
write to 0xffff888139d09484 of 4 bytes by interrupt on cpu 0:
__netif_tx_unlock include/linux/netdevice.h:4437 [inline]
__dev_queue_xmit+0x948/0xf70 net/core/dev.c:4229
dev_queue_xmit_accel+0x19/0x20 net/core/dev.c:4265
macvlan_queue_xmit drivers/net/macvlan.c:543 [inline]
macvlan_start_xmit+0x2b3/0x3d0 drivers/net/macvlan.c:567
__netdev_start_xmit include/linux/netdevice.h:4987 [inline]
netdev_start_xmit include/linux/netdevice.h:5001 [inline]
xmit_one+0x105/0x2f0 net/core/dev.c:3590
dev_hard_start_xmit+0x72/0x120 net/core/dev.c:3606
sch_direct_xmit+0x1b2/0x7c0 net/sched/sch_generic.c:342
__dev_xmit_skb+0x83d/0x1370 net/core/dev.c:3817
__dev_queue_xmit+0x590/0xf70 net/core/dev.c:4194
dev_queue_xmit+0x13/0x20 net/core/dev.c:4259
neigh_hh_output include/net/neighbour.h:511 [inline]
neigh_output include/net/neighbour.h:525 [inline]
ip6_finish_output2+0x995/0xbb0 net/ipv6/ip6_output.c:126
__ip6_finish_output net/ipv6/ip6_output.c:191 [inline]
ip6_finish_output+0x444/0x4c0 net/ipv6/ip6_output.c:201
NF_HOOK_COND include/linux/netfilter.h:296 [inline]
ip6_output+0x10e/0x210 net/ipv6/ip6_output.c:224
dst_output include/net/dst.h:450 [inline]
NF_HOOK include/linux/netfilter.h:307 [inline]
ndisc_send_skb+0x486/0x610 net/ipv6/ndisc.c:508
ndisc_send_rs+0x3b0/0x3e0 net/ipv6/ndisc.c:702
addrconf_rs_timer+0x370/0x540 net/ipv6/addrconf.c:3898
call_timer_fn+0x2e/0x240 kernel/time/timer.c:1421
expire_timers+0x116/0x240 kernel/time/timer.c:1466
__run_timers+0x368/0x410 kernel/time/timer.c:1734
run_timer_softirq+0x2e/0x60 kernel/time/timer.c:1747
__do_softirq+0x158/0x2de kernel/softirq.c:558
__irq_exit_rcu kernel/softirq.c:636 [inline]
irq_exit_rcu+0x37/0x70 kernel/softirq.c:648
sysvec_apic_timer_interrupt+0x3e/0xb0 arch/x86/kernel/apic/apic.c:1097
asm_sysvec_apic_timer_interrupt+0x12/0x20
read to 0xffff888139d09484 of 4 bytes by interrupt on cpu 1:
__dev_queue_xmit+0x5e3/0xf70 net/core/dev.c:4213
dev_queue_xmit_accel+0x19/0x20 net/core/dev.c:4265
macvlan_queue_xmit drivers/net/macvlan.c:543 [inline]
macvlan_start_xmit+0x2b3/0x3d0 drivers/net/macvlan.c:567
__netdev_start_xmit include/linux/netdevice.h:4987 [inline]
netdev_start_xmit include/linux/netdevice.h:5001 [inline]
xmit_one+0x105/0x2f0 net/core/dev.c:3590
dev_hard_start_xmit+0x72/0x120 net/core/dev.c:3606
sch_direct_xmit+0x1b2/0x7c0 net/sched/sch_generic.c:342
__dev_xmit_skb+0x83d/0x1370 net/core/dev.c:3817
__dev_queue_xmit+0x590/0xf70 net/core/dev.c:4194
dev_queue_xmit+0x13/0x20 net/core/dev.c:4259
neigh_resolve_output+0x3db/0x410 net/core/neighbour.c:1523
neigh_output include/net/neighbour.h:527 [inline]
ip6_finish_output2+0x9be/0xbb0 net/ipv6/ip6_output.c:126
__ip6_finish_output net/ipv6/ip6_output.c:191 [inline]
ip6_finish_output+0x444/0x4c0 net/ipv6/ip6_output.c:201
NF_HOOK_COND include/linux/netfilter.h:296 [inline]
ip6_output+0x10e/0x210 net/ipv6/ip6_output.c:224
dst_output include/net/dst.h:450 [inline]
NF_HOOK include/linux/netfilter.h:307 [inline]
ndisc_send_skb+0x486/0x610 net/ipv6/ndisc.c:508
ndisc_send_rs+0x3b0/0x3e0 net/ipv6/ndisc.c:702
addrconf_rs_timer+0x370/0x540 net/ipv6/addrconf.c:3898
call_timer_fn+0x2e/0x240 kernel/time/timer.c:1421
expire_timers+0x116/0x240 kernel/time/timer.c:1466
__run_timers+0x368/0x410 kernel/time/timer.c:1734
run_timer_softirq+0x2e/0x60 kernel/time/timer.c:1747
__do_softirq+0x158/0x2de kernel/softirq.c:558
__irq_exit_rcu kernel/softirq.c:636 [inline]
irq_exit_rcu+0x37/0x70 kernel/softirq.c:648
sysvec_apic_timer_interrupt+0x8d/0xb0 arch/x86/kernel/apic/apic.c:1097
asm_sysvec_apic_timer_interrupt+0x12/0x20
kcsan_setup_watchpoint+0x94/0x420 kernel/kcsan/core.c:443
folio_test_anon include/linux/page-flags.h:581 [inline]
PageAnon include/linux/page-flags.h:586 [inline]
zap_pte_range+0x5ac/0x10e0 mm/memory.c:1347
zap_pmd_range mm/memory.c:1467 [inline]
zap_pud_range mm/memory.c:1496 [inline]
zap_p4d_range mm/memory.c:1517 [inline]
unmap_page_range+0x2dc/0x3d0 mm/memory.c:1538
unmap_single_vma+0x157/0x210 mm/memory.c:1583
unmap_vmas+0xd0/0x180 mm/memory.c:1615
exit_mmap+0x23d/0x470 mm/mmap.c:3170
__mmput+0x27/0x1b0 kernel/fork.c:1113
mmput+0x3d/0x50 kernel/fork.c:1134
exit_mm+0xdb/0x170 kernel/exit.c:507
do_exit+0x608/0x17a0 kernel/exit.c:819
do_group_exit+0xce/0x180 kernel/exit.c:929
get_signal+0xfc3/0x1550 kernel/signal.c:2852
arch_do_signal_or_restart+0x8c/0x2e0 arch/x86/kernel/signal.c:868
handle_signal_work kernel/entry/common.c:148 [inline]
exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
exit_to_user_mode_prepare+0x113/0x190 kernel/entry/common.c:207
__syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:300
do_syscall_64+0x50/0xd0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x00000000 -> 0xffffffff
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 28712 Comm: syz-executor.0 Tainted: G W 5.16.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20211130170155.2331929-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The 'kprobe::data_size' is unsigned, thus it can not be negative. But if
user sets it enough big number (e.g. (size_t)-8), the result of 'data_size
+ sizeof(struct kretprobe_instance)' becomes smaller than sizeof(struct
kretprobe_instance) or zero. In result, the kretprobe_instance are
allocated without enough memory, and kretprobe accesses outside of
allocated memory.
To avoid this issue, introduce a max limitation of the
kretprobe::data_size. 4KB per instance should be OK.
Link: https://lkml.kernel.org/r/163836995040.432120.10322772773821182925.stgit@devnote2
Cc: stable@vger.kernel.org
Fixes: f47cd9b553aa ("kprobes: kretprobe user entry-handler")
Reported-by: zhangyue <zhangyue1@kylinos.cn>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes. A large series is found for ASoC tegra
drivers to correct the control element handlings, while others are
mostly for device-specific quirks and fix-ups"
* tag 'sound-5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (25 commits)
ALSA: hda/hdmi: fix HDA codec entry table order for ADL-P
ALSA: hda: Add Intel DG2 PCI ID and HDMI codec vid
ALSA: hda/cs8409: Set PMSG_ON earlier inside cs8409 driver
ASoC: SOF: hda: reset DAI widget before reconfiguring it
ASoC: cs35l41: Set the max SPI speed for the whole device
ALSA: intel-dsp-config: add quirk for CML devices based on ES8336 codec
ASoC: Intel: soc-acpi: add entry for ESSX8336 on CML
ASoC: rk817: Add module alias for rk817-codec
ASoC: soc-acpi: Set mach->id field on comp_ids matches
ASoC: tegra: Fix kcontrol put callback in Mixer
ASoC: tegra: Fix kcontrol put callback in ADX
ASoC: tegra: Fix kcontrol put callback in AMX
ASoC: tegra: Fix kcontrol put callback in SFC
ASoC: tegra: Fix kcontrol put callback in MVC
ASoC: tegra: Fix kcontrol put callback in AHUB
ASoC: tegra: Fix kcontrol put callback in DSPK
ASoC: tegra: Fix kcontrol put callback in DMIC
ASoC: tegra: Fix kcontrol put callback in I2S
ASoC: tegra: Fix kcontrol put callback in ADMAIF
ASoC: tegra: Fix wrong value type in MVC
...
|
|
Validate MRTC register is supported before triggering a delayed work
which accesses it.
Fixes: 5a1023deeed0 ("net/mlx5: Add periodic update of host time to firmware")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
because the ordinary load/store instructions (ldr, ldrh, ldrb) can
tolerate any misalignment of the memory address. However, load/store
double and load/store multiple instructions (ldrd, ldm) may still only
be used on memory addresses that are 32-bit aligned, and so we have to
use the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS macro with care, or we
may end up with a severe performance hit due to alignment traps that
require fixups by the kernel. Testing shows that this currently happens
with clang-13 but not gcc-11. In theory, any compiler version can
produce this bug or other problems, as we are dealing with undefined
behavior in C99 even on architectures that support this in hardware,
see also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363.
Fortunately, the get_unaligned() accessors do the right thing: when
building for ARMv6 or later, the compiler will emit unaligned accesses
using the ordinary load/store instructions (but avoid the ones that
require 32-bit alignment). When building for older ARM, those accessors
will emit the appropriate sequence of ldrb/mov/orr instructions. And on
architectures that can truly tolerate any kind of misalignment, the
get_unaligned() accessors resolve to the leXX_to_cpup accessors that
operate on aligned addresses.
Since the compiler will in fact emit ldrd or ldm instructions when
building this code for ARM v6 or later, the solution is to use the
unaligned accessors unconditionally on architectures where this is
known to be fast. The _aligned version of the hash function is
however still needed to get the best performance on architectures
that cannot do any unaligned access in hardware.
This new version avoids the undefined behavior and should produce
the fastest hash on all architectures we support.
Link: https://lore.kernel.org/linux-arm-kernel/20181008211554.5355-4-ard.biesheuvel@linaro.org/
Link: https://lore.kernel.org/linux-crypto/CAK8P3a2KfmmGDbVHULWevB0hv71P2oi2ZCHEAqT=8dQfa0=cqQ@mail.gmail.com/
Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Fixes: 2c956a60778c ("siphash: add cryptographically secure PRF")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Each peer's endpoint contains a dst_cache entry that takes a reference
to another netdev. When the containing namespace exits, we take down the
socket and prevent future sockets from being created (by setting
creating_net to NULL), which removes that potential reference on the
netns. However, it doesn't release references to the netns that a netdev
cached in dst_cache might be taking, so the netns still might fail to
exit. Since the socket is gimped anyway, we can simply clear all the
dst_caches (by way of clearing the endpoint src), which will release all
references.
However, the current dst_cache_reset function only releases those
references lazily. But it turns out that all of our usages of
wg_socket_clear_peer_endpoint_src are called from contexts that are not
exactly high-speed or bottle-necked. For example, when there's
connection difficulty, or when userspace is reconfiguring the interface.
And in particular for this patch, when the netns is exiting. So for
those cases, it makes more sense to call dst_release immediately. For
that, we add a small helper function to dst_cache.
This patch also adds a test to netns.sh from Hangbin Liu to ensure this
doesn't regress.
Tested-by: Hangbin Liu <liuhangbin@gmail.com>
Reported-by: Xiumei Mu <xmu@redhat.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Fixes: 900575aa33a3 ("wireguard: device: avoid circular netns references")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The kernel leaks memory when a `fib` rule is present in IPv6 nftables
firewall rules and a suppress_prefix rule is present in the IPv6 routing
rules (used by certain tools such as wg-quick). In such scenarios, every
incoming packet will leak an allocation in `ip6_dst_cache` slab cache.
After some hours of `bpftrace`-ing and source code reading, I tracked
down the issue to ca7a03c41753 ("ipv6: do not free rt if
FIB_LOOKUP_NOREF is set on suppress rule").
The problem with that change is that the generic `args->flags` always have
`FIB_LOOKUP_NOREF` set[1][2] but the IPv6-specific flag
`RT6_LOOKUP_F_DST_NOREF` might not be, leading to `fib6_rule_suppress` not
decreasing the refcount when needed.
How to reproduce:
- Add the following nftables rule to a prerouting chain:
meta nfproto ipv6 fib saddr . mark . iif oif missing drop
This can be done with:
sudo nft create table inet test
sudo nft create chain inet test test_chain '{ type filter hook prerouting priority filter + 10; policy accept; }'
sudo nft add rule inet test test_chain meta nfproto ipv6 fib saddr . mark . iif oif missing drop
- Run:
sudo ip -6 rule add table main suppress_prefixlength 0
- Watch `sudo slabtop -o | grep ip6_dst_cache` to see memory usage increase
with every incoming ipv6 packet.
This patch exposes the protocol-specific flags to the protocol
specific `suppress` function, and check the protocol-specific `flags`
argument for RT6_LOOKUP_F_DST_NOREF instead of the generic
FIB_LOOKUP_NOREF when decreasing the refcount, like this.
[1]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266e26/net/ipv6/fib6_rules.c#L71
[2]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266e26/net/ipv6/fib6_rules.c#L99
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215105
Fixes: ca7a03c41753 ("ipv6: do not free rt if FIB_LOOKUP_NOREF is set on suppress rule")
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Steffen reported a TCP stream corruption for HTTP requests
served by the apache web-server using a cifs mount-point
and memory mapping the relevant file.
The root cause is quite similar to the one addressed by
commit 20eb4f29b602 ("net: fix sk_page_frag() recursion from
memory reclaim"). Here the nested access to the task page frag
is caused by a page fault on the (mmapped) user-space memory
buffer coming from the cifs file.
The page fault handler performs an smb transaction on a different
socket, inside the same process context. Since sk->sk_allaction
for such socket does not prevent the usage for the task_frag,
the nested allocation modify "under the hood" the page frag
in use by the outer sendmsg call, corrupting the stream.
The overall relevant stack trace looks like the following:
httpd 78268 [001] 3461630.850950: probe:tcp_sendmsg_locked:
ffffffff91461d91 tcp_sendmsg_locked+0x1
ffffffff91462b57 tcp_sendmsg+0x27
ffffffff9139814e sock_sendmsg+0x3e
ffffffffc06dfe1d smb_send_kvec+0x28
[...]
ffffffffc06cfaf8 cifs_readpages+0x213
ffffffff90e83c4b read_pages+0x6b
ffffffff90e83f31 __do_page_cache_readahead+0x1c1
ffffffff90e79e98 filemap_fault+0x788
ffffffff90eb0458 __do_fault+0x38
ffffffff90eb5280 do_fault+0x1a0
ffffffff90eb7c84 __handle_mm_fault+0x4d4
ffffffff90eb8093 handle_mm_fault+0xc3
ffffffff90c74f6d __do_page_fault+0x1ed
ffffffff90c75277 do_page_fault+0x37
ffffffff9160111e page_fault+0x1e
ffffffff9109e7b5 copyin+0x25
ffffffff9109eb40 _copy_from_iter_full+0xe0
ffffffff91462370 tcp_sendmsg_locked+0x5e0
ffffffff91462370 tcp_sendmsg_locked+0x5e0
ffffffff91462b57 tcp_sendmsg+0x27
ffffffff9139815c sock_sendmsg+0x4c
ffffffff913981f7 sock_write_iter+0x97
ffffffff90f2cc56 do_iter_readv_writev+0x156
ffffffff90f2dff0 do_iter_write+0x80
ffffffff90f2e1c3 vfs_writev+0xa3
ffffffff90f2e27c do_writev+0x5c
ffffffff90c042bb do_syscall_64+0x5b
ffffffff916000ad entry_SYSCALL_64_after_hwframe+0x65
The cifs filesystem rightfully sets sk_allocations to GFP_NOFS,
we can avoid the nesting using the sk page frag for allocation
lacking the __GFP_FS flag. Do not define an additional mm-helper
for that, as this is strictly tied to the sk page frag usage.
v1 -> v2:
- use a stricted sk_page_frag() check instead of reordering the
code (Eric)
Reported-by: Steffen Froemer <sfroemer@redhat.com>
Fixes: 5640f7685831 ("net: use a per task frag allocator")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|