From 14ebc28e07e68ff412aa42f7d8b67969e2f63d00 Mon Sep 17 00:00:00 2001
From: Matthew Wilcox
Date: Fri, 22 Dec 2017 06:32:16 -0800
Subject: errseq: Add to documentation tree

 - Move errseq.rst into core-api
 - Add errseq to the core-api index
 - Promote the header to a more prominent header type, otherwise we get three
   entries in the table of contents.
 - Reformat the table to look nicer and be a little more proportional
   in terms of horizontal width per bit (the SF bit is still disproportionately
   large, but there's no way to fix that).
 - Include errseq kernel-doc in the errseq.rst
 - Neaten some kernel-doc markup

Signed-off-by: Matthew Wilcox
Reviewed-by: Jeff Layton
Reviewed-by: Randy Dunlap
Signed-off-by: Jonathan Corbet
---
 Documentation/core-api/errseq.rst | 159 ++++++++++++++++++++++++++++++++++++++
 Documentation/core-api/index.rst  |   1 +
 Documentation/errseq.rst          | 149 -----------------------------------
 include/linux/errseq.h            |   2 +-
 lib/errseq.c                      |  37 +++++----
 5 files changed, 182 insertions(+), 166 deletions(-)
 create mode 100644 Documentation/core-api/errseq.rst
 delete mode 100644 Documentation/errseq.rst

diff --git a/Documentation/core-api/errseq.rst b/Documentation/core-api/errseq.rst
new file mode 100644
index 000000000000..ff332e272405
--- /dev/null
+++ b/Documentation/core-api/errseq.rst
@@ -0,0 +1,159 @@
+=====================
+The errseq_t datatype
+=====================
+
+An errseq_t is a way of recording errors in one place, and allowing any
+number of "subscribers" to tell whether it has changed since a previous
+point where it was sampled.
+
+The initial use case for this is tracking errors for file
+synchronization syscalls (fsync, fdatasync, msync and sync_file_range),
+but it may be usable in other situations.
+
+It's implemented as an unsigned 32-bit value. The low order bits are
+designated to hold an error code (between 1 and MAX_ERRNO). The upper bits
+are used as a counter. This is done with atomics instead of locking so that
+these functions can be called from any context.
+
+Note that there is a risk of collisions if new errors are being recorded
+frequently, since we have so few bits to use as a counter.
+
+To mitigate this, the bit between the error value and counter is used as
+a flag to tell whether the value has been sampled since a new value was
+recorded. That allows us to avoid bumping the counter if no one has
+sampled it since the last time an error was recorded.
+
+Thus we end up with a value that looks something like this:
+
++--------------------------------------+----+------------------------+
+| 31..13                               | 12 | 11..0                  |
++--------------------------------------+----+------------------------+
+| counter                              | SF | errno                  |
++--------------------------------------+----+------------------------+
+
+The general idea is for "watchers" to sample an errseq_t value and keep
+it as a running cursor. That value can later be used to tell whether
+any new errors have occurred since that sampling was done, and atomically
+record the state at the time that it was checked. This allows us to
+record errors in one place, and then have a number of "watchers" that
+can tell whether the value has changed since they last checked it.
+
+A new errseq_t should always be zeroed out. An errseq_t value of all zeroes
+is the special (but common) case where there has never been an error. An all
+zero value thus serves as the "epoch" if one wishes to know whether there
+has ever been an error set since it was first initialized.
+
+API usage
+=========
+
+Let me tell you a story about a worker drone. Now, he's a good worker
+overall, but the company is a little...management heavy. He has to
+report to 77 supervisors today, and tomorrow the "big boss" is coming in
+from out of town and he's sure to test the poor fellow too.
+
+They're all handing him work to do -- so much he can't keep track of who
+handed him what, but that's not really a big problem. The supervisors
+just want to know when he's finished all of the work they've handed him so
+far and whether he made any mistakes since they last asked.
+
+He might have made the mistake on work they didn't actually hand him,
+but he can't keep track of things at that level of detail, all he can
+remember is the most recent mistake that he made.
+
+Here's our worker_drone representation::
+
+        struct worker_drone {
+                errseq_t        wd_err; /* for recording errors */
+        };
+
+Every day, the worker_drone starts out with a blank slate::
+
+        struct worker_drone wd;
+
+        wd.wd_err = (errseq_t)0;
+
+The supervisors come in and get an initial read for the day. They
+don't care about anything that happened before their watch begins::
+
+        struct supervisor {
+                errseq_t        s_wd_err; /* private "cursor" for wd_err */
+                spinlock_t      s_wd_err_lock; /* protects s_wd_err */
+        }
+
+        struct supervisor       su;
+
+        su.s_wd_err = errseq_sample(&wd.wd_err);
+        spin_lock_init(&su.s_wd_err_lock);
+
+Now they start handing him tasks to do. Every few minutes they ask him to
+finish up all of the work they've handed him so far. Then they ask him
+whether he made any mistakes on any of it::
+
+        spin_lock(&su.su_wd_err_lock);
+        err = errseq_check_and_advance(&wd.wd_err, &su.s_wd_err);
+        spin_unlock(&su.su_wd_err_lock);
+
+Up to this point, that just keeps returning 0.
+
+Now, the owners of this company are quite miserly and have given him
+substandard equipment with which to do his job. Occasionally it
+glitches and he makes a mistake. He sighs a heavy sigh, and marks it
+down::
+
+        errseq_set(&wd.wd_err, -EIO);
+
+...and then gets back to work. The supervisors eventually poll again
+and they each get the error when they next check. Subsequent calls will
+return 0, until another error is recorded, at which point it's reported
+to each of them once.
+
+Note that the supervisors can't tell how many mistakes he made, only
+whether one was made since they last checked, and the latest value
+recorded.
+
+Occasionally the big boss comes in for a spot check and asks the worker
+to do a one-off job for him. He's not really watching the worker
+full-time like the supervisors, but he does need to know whether a
+mistake occurred while his job was processing.
+
+He can just sample the current errseq_t in the worker, and then use that
+to tell whether an error has occurred later::
+
+        errseq_t since = errseq_sample(&wd.wd_err);
+        /* submit some work and wait for it to complete */
+        err = errseq_check(&wd.wd_err, since);
+
+Since he's just going to discard "since" after that point, he doesn't
+need to advance it here. He also doesn't need any locking since it's
+not usable by anyone else.
+
+Serializing errseq_t cursor updates
+===================================
+
+Note that the errseq_t API does not protect the errseq_t cursor during a
+check_and_advance_operation. Only the canonical error code is handled
+atomically. In a situation where more than one task might be using the
+same errseq_t cursor at the same time, it's important to serialize
+updates to that cursor.
+
+If that's not done, then it's possible for the cursor to go backward
+in which case the same error could be reported more than once.
+
+Because of this, it's often advantageous to first do an errseq_check to
+see if anything has changed, and only later do an
+errseq_check_and_advance after taking the lock. e.g.::
+
+        if (errseq_check(&wd.wd_err, READ_ONCE(su.s_wd_err)) {
+                /* su.s_wd_err is protected by s_wd_err_lock */
+                spin_lock(&su.s_wd_err_lock);
+                err = errseq_check_and_advance(&wd.wd_err, &su.s_wd_err);
+                spin_unlock(&su.s_wd_err_lock);
+        }
+
+That avoids the spinlock in the common case where nothing has changed
+since the last time it was checked.
+
+Functions
+=========
+
+.. kernel-doc:: lib/errseq.c
diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index d55ee6b006ed..1b1fd01990b5 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -22,6 +22,7 @@ Core utilities
    flexible-arrays
    librs
    genalloc
+   errseq
    printk-formats
 
 Interfaces for kernel debugging
diff --git a/Documentation/errseq.rst b/Documentation/errseq.rst
deleted file mode 100644
index 4c29bd5afbc5..000000000000
--- a/Documentation/errseq.rst
+++ /dev/null
@@ -1,149 +0,0 @@
-The errseq_t datatype
-=====================
-An errseq_t is a way of recording errors in one place, and allowing any
-number of "subscribers" to tell whether it has changed since a previous
-point where it was sampled.
-
-The initial use case for this is tracking errors for file
-synchronization syscalls (fsync, fdatasync, msync and sync_file_range),
-but it may be usable in other situations.
-
-It's implemented as an unsigned 32-bit value. The low order bits are
-designated to hold an error code (between 1 and MAX_ERRNO). The upper bits
-are used as a counter. This is done with atomics instead of locking so that
-these functions can be called from any context.
-
-Note that there is a risk of collisions if new errors are being recorded
-frequently, since we have so few bits to use as a counter.
-
-To mitigate this, the bit between the error value and counter is used as
-a flag to tell whether the value has been sampled since a new value was
-recorded. That allows us to avoid bumping the counter if no one has
-sampled it since the last time an error was recorded.
-
-Thus we end up with a value that looks something like this::
-
-    bit:  31..13        12        11..0
-         +-----------------+----+----------------+
-         |     counter     | SF |      errno     |
-         +-----------------+----+----------------+
-
-The general idea is for "watchers" to sample an errseq_t value and keep
-it as a running cursor. That value can later be used to tell whether
-any new errors have occurred since that sampling was done, and atomically
-record the state at the time that it was checked. This allows us to
-record errors in one place, and then have a number of "watchers" that
-can tell whether the value has changed since they last checked it.
-
-A new errseq_t should always be zeroed out. An errseq_t value of all zeroes
-is the special (but common) case where there has never been an error. An all
-zero value thus serves as the "epoch" if one wishes to know whether there
-has ever been an error set since it was first initialized.
-
-API usage
-=========
-Let me tell you a story about a worker drone. Now, he's a good worker
-overall, but the company is a little...management heavy. He has to
-report to 77 supervisors today, and tomorrow the "big boss" is coming in
-from out of town and he's sure to test the poor fellow too.
-
-They're all handing him work to do -- so much he can't keep track of who
-handed him what, but that's not really a big problem. The supervisors
-just want to know when he's finished all of the work they've handed him so
-far and whether he made any mistakes since they last asked.
-
-He might have made the mistake on work they didn't actually hand him,
-but he can't keep track of things at that level of detail, all he can
-remember is the most recent mistake that he made.
-
-Here's our worker_drone representation::
-
-        struct worker_drone {
-                errseq_t        wd_err; /* for recording errors */
-        };
-
-Every day, the worker_drone starts out with a blank slate::
-
-        struct worker_drone wd;
-
-        wd.wd_err = (errseq_t)0;
-
-The supervisors come in and get an initial read for the day. They
-don't care about anything that happened before their watch begins::
-
-        struct supervisor {
-                errseq_t        s_wd_err; /* private "cursor" for wd_err */
-                spinlock_t      s_wd_err_lock; /* protects s_wd_err */
-        }
-
-        struct supervisor       su;
-
-        su.s_wd_err = errseq_sample(&wd.wd_err);
-        spin_lock_init(&su.s_wd_err_lock);
-
-Now they start handing him tasks to do. Every few minutes they ask him to
-finish up all of the work they've handed him so far. Then they ask him
-whether he made any mistakes on any of it::
-
-        spin_lock(&su.su_wd_err_lock);
-        err = errseq_check_and_advance(&wd.wd_err, &su.s_wd_err);
-        spin_unlock(&su.su_wd_err_lock);
-
-Up to this point, that just keeps returning 0.
-
-Now, the owners of this company are quite miserly and have given him
-substandard equipment with which to do his job. Occasionally it
-glitches and he makes a mistake. He sighs a heavy sigh, and marks it
-down::
-
-        errseq_set(&wd.wd_err, -EIO);
-
-...and then gets back to work. The supervisors eventually poll again
-and they each get the error when they next check. Subsequent calls will
-return 0, until another error is recorded, at which point it's reported
-to each of them once.
-
-Note that the supervisors can't tell how many mistakes he made, only
-whether one was made since they last checked, and the latest value
-recorded.
-
-Occasionally the big boss comes in for a spot check and asks the worker
-to do a one-off job for him. He's not really watching the worker
-full-time like the supervisors, but he does need to know whether a
-mistake occurred while his job was processing.
-
-He can just sample the current errseq_t in the worker, and then use that
-to tell whether an error has occurred later::
-
-        errseq_t since = errseq_sample(&wd.wd_err);
-        /* submit some work and wait for it to complete */
-        err = errseq_check(&wd.wd_err, since);
-
-Since he's just going to discard "since" after that point, he doesn't
-need to advance it here. He also doesn't need any locking since it's
-not usable by anyone else.
-
-Serializing errseq_t cursor updates
-===================================
-Note that the errseq_t API does not protect the errseq_t cursor during a
-check_and_advance_operation. Only the canonical error code is handled
-atomically. In a situation where more than one task might be using the
-same errseq_t cursor at the same time, it's important to serialize
-updates to that cursor.
-
-If that's not done, then it's possible for the cursor to go backward
-in which case the same error could be reported more than once.
-
-Because of this, it's often advantageous to first do an errseq_check to
-see if anything has changed, and only later do an
-errseq_check_and_advance after taking the lock. e.g.::
-
-        if (errseq_check(&wd.wd_err, READ_ONCE(su.s_wd_err)) {
-                /* su.s_wd_err is protected by s_wd_err_lock */
-                spin_lock(&su.s_wd_err_lock);
-                err = errseq_check_and_advance(&wd.wd_err, &su.s_wd_err);
-                spin_unlock(&su.s_wd_err_lock);
-        }
-
-That avoids the spinlock in the common case where nothing has changed
-since the last time it was checked.
diff --git a/include/linux/errseq.h b/include/linux/errseq.h
index 6ffae9c5052d..fc2777770768 100644
--- a/include/linux/errseq.h
+++ b/include/linux/errseq.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 /*
- * See Documentation/errseq.rst and lib/errseq.c
+ * See Documentation/core-api/errseq.rst and lib/errseq.c
  */
 #ifndef _LINUX_ERRSEQ_H
 #define _LINUX_ERRSEQ_H
diff --git a/lib/errseq.c b/lib/errseq.c
index 79cc66897db4..df782418b333 100644
--- a/lib/errseq.c
+++ b/lib/errseq.c
@@ -46,14 +46,14 @@
  * @eseq: errseq_t field that should be set
  * @err: error to set (must be between -1 and -MAX_ERRNO)
  *
- * This function sets the error in *eseq, and increments the sequence counter
+ * This function sets the error in @eseq, and increments the sequence counter
  * if the last sequence was sampled at some point in the past.
  *
  * Any error set will always overwrite an existing error.
  *
- * We do return the latest value here, primarily for debugging purposes. The
- * return value should not be used as a previously sampled value in later calls
- * as it will not have the SEEN flag set.
+ * Return: The previous value, primarily for debugging purposes. The
+ * return value should not be used as a previously sampled value in later
+ * calls as it will not have the SEEN flag set.
  */
 errseq_t errseq_set(errseq_t *eseq, int err)
 {
@@ -108,11 +108,13 @@ errseq_t errseq_set(errseq_t *eseq, int err)
 EXPORT_SYMBOL(errseq_set);
 
 /**
- * errseq_sample - grab current errseq_t value
- * @eseq: pointer to errseq_t to be sampled
+ * errseq_sample() - Grab current errseq_t value.
+ * @eseq: Pointer to errseq_t to be sampled.
  *
  * This function allows callers to sample an errseq_t value, marking it as
  * "seen" if required.
+ *
+ * Return: The current errseq value.
  */
 errseq_t errseq_sample(errseq_t *eseq)
 {
@@ -134,15 +136,15 @@ errseq_t errseq_sample(errseq_t *eseq)
 EXPORT_SYMBOL(errseq_sample);
 
 /**
- * errseq_check - has an error occurred since a particular sample point?
- * @eseq: pointer to errseq_t value to be checked
- * @since: previously-sampled errseq_t from which to check
+ * errseq_check() - Has an error occurred since a particular sample point?
+ * @eseq: Pointer to errseq_t value to be checked.
+ * @since: Previously-sampled errseq_t from which to check.
  *
- * Grab the value that eseq points to, and see if it has changed "since"
- * the given value was sampled. The "since" value is not advanced, so there
+ * Grab the value that eseq points to, and see if it has changed @since
+ * the given value was sampled. The @since value is not advanced, so there
  * is no need to mark the value as seen.
  *
- * Returns the latest error set in the errseq_t or 0 if it hasn't changed.
+ * Return: The latest error set in the errseq_t or 0 if it hasn't changed.
  */
 int errseq_check(errseq_t *eseq, errseq_t since)
 {
@@ -155,11 +157,11 @@ int errseq_check(errseq_t *eseq, errseq_t since)
 EXPORT_SYMBOL(errseq_check);
 
 /**
- * errseq_check_and_advance - check an errseq_t and advance to current value
- * @eseq: pointer to value being checked and reported
- * @since: pointer to previously-sampled errseq_t to check against and advance
+ * errseq_check_and_advance() - Check an errseq_t and advance to current value.
+ * @eseq: Pointer to value being checked and reported.
+ * @since: Pointer to previously-sampled errseq_t to check against and advance.
  *
- * Grab the eseq value, and see whether it matches the value that "since"
+ * Grab the eseq value, and see whether it matches the value that @since
  * points to. If it does, then just return 0.
  *
  * If it doesn't, then the value has changed. Set the "seen" flag, and try to
@@ -170,6 +172,9 @@ EXPORT_SYMBOL(errseq_check);
  * value. The caller must provide that if necessary. Because of this, callers
  * may want to do a lockless errseq_check before taking the lock and calling
  * this.
+ *
+ * Return: Negative errno if one has been stored, or 0 if no new error has
+ * occurred.
  */
 int errseq_check_and_advance(errseq_t *eseq, errseq_t *since)
 {
-- 
cgit