summaryrefslogtreecommitdiff
path: root/Documentation/core-api
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/core-api')
-rw-r--r--Documentation/core-api/errseq.rst159
-rw-r--r--Documentation/core-api/index.rst3
-rw-r--r--Documentation/core-api/kernel-api.rst15
-rw-r--r--Documentation/core-api/printk-formats.rst492
-rw-r--r--Documentation/core-api/refcount-vs-atomic.rst150
5 files changed, 819 insertions, 0 deletions
diff --git a/Documentation/core-api/errseq.rst b/Documentation/core-api/errseq.rst
new file mode 100644
index 000000000000..ff332e272405
--- /dev/null
+++ b/Documentation/core-api/errseq.rst
@@ -0,0 +1,159 @@
+=====================
+The errseq_t datatype
+=====================
+
+An errseq_t is a way of recording errors in one place, and allowing any
+number of "subscribers" to tell whether it has changed since a previous
+point where it was sampled.
+
+The initial use case for this is tracking errors for file
+synchronization syscalls (fsync, fdatasync, msync and sync_file_range),
+but it may be usable in other situations.
+
+It's implemented as an unsigned 32-bit value. The low order bits are
+designated to hold an error code (between 1 and MAX_ERRNO). The upper bits
+are used as a counter. This is done with atomics instead of locking so that
+these functions can be called from any context.
+
+Note that there is a risk of collisions if new errors are being recorded
+frequently, since we have so few bits to use as a counter.
+
+To mitigate this, the bit between the error value and counter is used as
+a flag to tell whether the value has been sampled since a new value was
+recorded. That allows us to avoid bumping the counter if no one has
+sampled it since the last time an error was recorded.
+
+Thus we end up with a value that looks something like this:
+
++--------------------------------------+----+------------------------+
+| 31..13 | 12 | 11..0 |
++--------------------------------------+----+------------------------+
+| counter | SF | errno |
++--------------------------------------+----+------------------------+
+
+The general idea is for "watchers" to sample an errseq_t value and keep
+it as a running cursor. That value can later be used to tell whether
+any new errors have occurred since that sampling was done, and atomically
+record the state at the time that it was checked. This allows us to
+record errors in one place, and then have a number of "watchers" that
+can tell whether the value has changed since they last checked it.
+
+A new errseq_t should always be zeroed out. An errseq_t value of all zeroes
+is the special (but common) case where there has never been an error. An all
+zero value thus serves as the "epoch" if one wishes to know whether there
+has ever been an error set since it was first initialized.
+
+API usage
+=========
+
+Let me tell you a story about a worker drone. Now, he's a good worker
+overall, but the company is a little...management heavy. He has to
+report to 77 supervisors today, and tomorrow the "big boss" is coming in
+from out of town and he's sure to test the poor fellow too.
+
+They're all handing him work to do -- so much he can't keep track of who
+handed him what, but that's not really a big problem. The supervisors
+just want to know when he's finished all of the work they've handed him so
+far and whether he made any mistakes since they last asked.
+
+He might have made the mistake on work they didn't actually hand him,
+but he can't keep track of things at that level of detail, all he can
+remember is the most recent mistake that he made.
+
+Here's our worker_drone representation::
+
+ struct worker_drone {
+ errseq_t wd_err; /* for recording errors */
+ };
+
+Every day, the worker_drone starts out with a blank slate::
+
+ struct worker_drone wd;
+
+ wd.wd_err = (errseq_t)0;
+
+The supervisors come in and get an initial read for the day. They
+don't care about anything that happened before their watch begins::
+
+ struct supervisor {
+ errseq_t s_wd_err; /* private "cursor" for wd_err */
+ spinlock_t s_wd_err_lock; /* protects s_wd_err */
+ }
+
+ struct supervisor su;
+
+ su.s_wd_err = errseq_sample(&wd.wd_err);
+ spin_lock_init(&su.s_wd_err_lock);
+
+Now they start handing him tasks to do. Every few minutes they ask him to
+finish up all of the work they've handed him so far. Then they ask him
+whether he made any mistakes on any of it::
+
+ spin_lock(&su.su_wd_err_lock);
+ err = errseq_check_and_advance(&wd.wd_err, &su.s_wd_err);
+ spin_unlock(&su.su_wd_err_lock);
+
+Up to this point, that just keeps returning 0.
+
+Now, the owners of this company are quite miserly and have given him
+substandard equipment with which to do his job. Occasionally it
+glitches and he makes a mistake. He sighs a heavy sigh, and marks it
+down::
+
+ errseq_set(&wd.wd_err, -EIO);
+
+...and then gets back to work. The supervisors eventually poll again
+and they each get the error when they next check. Subsequent calls will
+return 0, until another error is recorded, at which point it's reported
+to each of them once.
+
+Note that the supervisors can't tell how many mistakes he made, only
+whether one was made since they last checked, and the latest value
+recorded.
+
+Occasionally the big boss comes in for a spot check and asks the worker
+to do a one-off job for him. He's not really watching the worker
+full-time like the supervisors, but he does need to know whether a
+mistake occurred while his job was processing.
+
+He can just sample the current errseq_t in the worker, and then use that
+to tell whether an error has occurred later::
+
+ errseq_t since = errseq_sample(&wd.wd_err);
+ /* submit some work and wait for it to complete */
+ err = errseq_check(&wd.wd_err, since);
+
+Since he's just going to discard "since" after that point, he doesn't
+need to advance it here. He also doesn't need any locking since it's
+not usable by anyone else.
+
+Serializing errseq_t cursor updates
+===================================
+
+Note that the errseq_t API does not protect the errseq_t cursor during a
+check_and_advance_operation. Only the canonical error code is handled
+atomically. In a situation where more than one task might be using the
+same errseq_t cursor at the same time, it's important to serialize
+updates to that cursor.
+
+If that's not done, then it's possible for the cursor to go backward
+in which case the same error could be reported more than once.
+
+Because of this, it's often advantageous to first do an errseq_check to
+see if anything has changed, and only later do an
+errseq_check_and_advance after taking the lock. e.g.::
+
+ if (errseq_check(&wd.wd_err, READ_ONCE(su.s_wd_err)) {
+ /* su.s_wd_err is protected by s_wd_err_lock */
+ spin_lock(&su.s_wd_err_lock);
+ err = errseq_check_and_advance(&wd.wd_err, &su.s_wd_err);
+ spin_unlock(&su.s_wd_err_lock);
+ }
+
+That avoids the spinlock in the common case where nothing has changed
+since the last time it was checked.
+
+Functions
+=========
+
+.. kernel-doc:: lib/errseq.c
diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index d5bbe035316d..1b1fd01990b5 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -14,6 +14,7 @@ Core utilities
kernel-api
assoc_array
atomic_ops
+ refcount-vs-atomic
cpu_hotplug
local_ops
workqueue
@@ -21,6 +22,8 @@ Core utilities
flexible-arrays
librs
genalloc
+ errseq
+ printk-formats
Interfaces for kernel debugging
===============================
diff --git a/Documentation/core-api/kernel-api.rst b/Documentation/core-api/kernel-api.rst
index 2d9da6c40a4d..e7fadf02c511 100644
--- a/Documentation/core-api/kernel-api.rst
+++ b/Documentation/core-api/kernel-api.rst
@@ -139,6 +139,21 @@ Division Functions
.. kernel-doc:: lib/gcd.c
:export:
+Sorting
+-------
+
+.. kernel-doc:: lib/sort.c
+ :export:
+
+.. kernel-doc:: lib/list_sort.c
+ :export:
+
+UUID/GUID
+---------
+
+.. kernel-doc:: lib/uuid.c
+ :export:
+
Memory Management in Linux
==========================
diff --git a/Documentation/core-api/printk-formats.rst b/Documentation/core-api/printk-formats.rst
new file mode 100644
index 000000000000..258b46435320
--- /dev/null
+++ b/Documentation/core-api/printk-formats.rst
@@ -0,0 +1,492 @@
+=========================================
+How to get printk format specifiers right
+=========================================
+
+:Author: Randy Dunlap <rdunlap@infradead.org>
+:Author: Andrew Murray <amurray@mpc-data.co.uk>
+
+
+Integer types
+=============
+
+::
+
+ If variable is of Type, use printk format specifier:
+ ------------------------------------------------------------
+ int %d or %x
+ unsigned int %u or %x
+ long %ld or %lx
+ unsigned long %lu or %lx
+ long long %lld or %llx
+ unsigned long long %llu or %llx
+ size_t %zu or %zx
+ ssize_t %zd or %zx
+ s32 %d or %x
+ u32 %u or %x
+ s64 %lld or %llx
+ u64 %llu or %llx
+
+
+If <type> is dependent on a config option for its size (e.g., sector_t,
+blkcnt_t) or is architecture-dependent for its size (e.g., tcflag_t), use a
+format specifier of its largest possible type and explicitly cast to it.
+
+Example::
+
+ printk("test: sector number/total blocks: %llu/%llu\n",
+ (unsigned long long)sector, (unsigned long long)blockcount);
+
+Reminder: sizeof() returns type size_t.
+
+The kernel's printf does not support %n. Floating point formats (%e, %f,
+%g, %a) are also not recognized, for obvious reasons. Use of any
+unsupported specifier or length qualifier results in a WARN and early
+return from vsnprintf().
+
+Pointer types
+=============
+
+A raw pointer value may be printed with %p which will hash the address
+before printing. The kernel also supports extended specifiers for printing
+pointers of different types.
+
+Plain Pointers
+--------------
+
+::
+
+ %p abcdef12 or 00000000abcdef12
+
+Pointers printed without a specifier extension (i.e unadorned %p) are
+hashed to prevent leaking information about the kernel memory layout. This
+has the added benefit of providing a unique identifier. On 64-bit machines
+the first 32 bits are zeroed. If you *really* want the address see %px
+below.
+
+Symbols/Function Pointers
+-------------------------
+
+::
+
+ %pF versatile_init+0x0/0x110
+ %pf versatile_init
+ %pS versatile_init+0x0/0x110
+ %pSR versatile_init+0x9/0x110
+ (with __builtin_extract_return_addr() translation)
+ %ps versatile_init
+ %pB prev_fn_of_versatile_init+0x88/0x88
+
+
+The ``F`` and ``f`` specifiers are for printing function pointers,
+for example, f->func, &gettimeofday. They have the same result as
+``S`` and ``s`` specifiers. But they do an extra conversion on
+ia64, ppc64 and parisc64 architectures where the function pointers
+are actually function descriptors.
+
+The ``S`` and ``s`` specifiers can be used for printing symbols
+from direct addresses, for example, __builtin_return_address(0),
+(void *)regs->ip. They result in the symbol name with (S) or
+without (s) offsets. If KALLSYMS are disabled then the symbol
+address is printed instead.
+
+The ``B`` specifier results in the symbol name with offsets and should be
+used when printing stack backtraces. The specifier takes into
+consideration the effect of compiler optimisations which may occur
+when tail-calls are used and marked with the noreturn GCC attribute.
+
+Examples::
+
+ printk("Going to call: %pF\n", gettimeofday);
+ printk("Going to call: %pF\n", p->func);
+ printk("%s: called from %pS\n", __func__, (void *)_RET_IP_);
+ printk("%s: called from %pS\n", __func__,
+ (void *)__builtin_return_address(0));
+ printk("Faulted at %pS\n", (void *)regs->ip);
+ printk(" %s%pB\n", (reliable ? "" : "? "), (void *)*stack);
+
+Kernel Pointers
+---------------
+
+::
+
+ %pK 01234567 or 0123456789abcdef
+
+For printing kernel pointers which should be hidden from unprivileged
+users. The behaviour of %pK depends on the kptr_restrict sysctl - see
+Documentation/sysctl/kernel.txt for more details.
+
+Unmodified Addresses
+--------------------
+
+::
+
+ %px 01234567 or 0123456789abcdef
+
+For printing pointers when you *really* want to print the address. Please
+consider whether or not you are leaking sensitive information about the
+kernel memory layout before printing pointers with %px. %px is functionally
+equivalent to %lx (or %lu). %px is preferred because it is more uniquely
+grep'able. If in the future we need to modify the way the kernel handles
+printing pointers we will be better equipped to find the call sites.
+
+Struct Resources
+----------------
+
+::
+
+ %pr [mem 0x60000000-0x6fffffff flags 0x2200] or
+ [mem 0x0000000060000000-0x000000006fffffff flags 0x2200]
+ %pR [mem 0x60000000-0x6fffffff pref] or
+ [mem 0x0000000060000000-0x000000006fffffff pref]
+
+For printing struct resources. The ``R`` and ``r`` specifiers result in a
+printed resource with (R) or without (r) a decoded flags member.
+
+Passed by reference.
+
+Physical address types phys_addr_t
+----------------------------------
+
+::
+
+ %pa[p] 0x01234567 or 0x0123456789abcdef
+
+For printing a phys_addr_t type (and its derivatives, such as
+resource_size_t) which can vary based on build options, regardless of the
+width of the CPU data path.
+
+Passed by reference.
+
+DMA address types dma_addr_t
+----------------------------
+
+::
+
+ %pad 0x01234567 or 0x0123456789abcdef
+
+For printing a dma_addr_t type which can vary based on build options,
+regardless of the width of the CPU data path.
+
+Passed by reference.
+
+Raw buffer as an escaped string
+-------------------------------
+
+::
+
+ %*pE[achnops]
+
+For printing raw buffer as an escaped string. For the following buffer::
+
+ 1b 62 20 5c 43 07 22 90 0d 5d
+
+A few examples show how the conversion would be done (excluding surrounding
+quotes)::
+
+ %*pE "\eb \C\a"\220\r]"
+ %*pEhp "\x1bb \C\x07"\x90\x0d]"
+ %*pEa "\e\142\040\\\103\a\042\220\r\135"
+
+The conversion rules are applied according to an optional combination
+of flags (see :c:func:`string_escape_mem` kernel documentation for the
+details):
+
+ - a - ESCAPE_ANY
+ - c - ESCAPE_SPECIAL
+ - h - ESCAPE_HEX
+ - n - ESCAPE_NULL
+ - o - ESCAPE_OCTAL
+ - p - ESCAPE_NP
+ - s - ESCAPE_SPACE
+
+By default ESCAPE_ANY_NP is used.
+
+ESCAPE_ANY_NP is the sane choice for many cases, in particularly for
+printing SSIDs.
+
+If field width is omitted then 1 byte only will be escaped.
+
+Raw buffer as a hex string
+--------------------------
+
+::
+
+ %*ph 00 01 02 ... 3f
+ %*phC 00:01:02: ... :3f
+ %*phD 00-01-02- ... -3f
+ %*phN 000102 ... 3f
+
+For printing small buffers (up to 64 bytes long) as a hex string with a
+certain separator. For larger buffers consider using
+:c:func:`print_hex_dump`.
+
+MAC/FDDI addresses
+------------------
+
+::
+
+ %pM 00:01:02:03:04:05
+ %pMR 05:04:03:02:01:00
+ %pMF 00-01-02-03-04-05
+ %pm 000102030405
+ %pmR 050403020100
+
+For printing 6-byte MAC/FDDI addresses in hex notation. The ``M`` and ``m``
+specifiers result in a printed address with (M) or without (m) byte
+separators. The default byte separator is the colon (:).
+
+Where FDDI addresses are concerned the ``F`` specifier can be used after
+the ``M`` specifier to use dash (-) separators instead of the default
+separator.
+
+For Bluetooth addresses the ``R`` specifier shall be used after the ``M``
+specifier to use reversed byte order suitable for visual interpretation
+of Bluetooth addresses which are in the little endian order.
+
+Passed by reference.
+
+IPv4 addresses
+--------------
+
+::
+
+ %pI4 1.2.3.4
+ %pi4 001.002.003.004
+ %p[Ii]4[hnbl]
+
+For printing IPv4 dot-separated decimal addresses. The ``I4`` and ``i4``
+specifiers result in a printed address with (i4) or without (I4) leading
+zeros.
+
+The additional ``h``, ``n``, ``b``, and ``l`` specifiers are used to specify
+host, network, big or little endian order addresses respectively. Where
+no specifier is provided the default network/big endian order is used.
+
+Passed by reference.
+
+IPv6 addresses
+--------------
+
+::
+
+ %pI6 0001:0002:0003:0004:0005:0006:0007:0008
+ %pi6 00010002000300040005000600070008
+ %pI6c 1:2:3:4:5:6:7:8
+
+For printing IPv6 network-order 16-bit hex addresses. The ``I6`` and ``i6``
+specifiers result in a printed address with (I6) or without (i6)
+colon-separators. Leading zeros are always used.
+
+The additional ``c`` specifier can be used with the ``I`` specifier to
+print a compressed IPv6 address as described by
+http://tools.ietf.org/html/rfc5952
+
+Passed by reference.
+
+IPv4/IPv6 addresses (generic, with port, flowinfo, scope)
+---------------------------------------------------------
+
+::
+
+ %pIS 1.2.3.4 or 0001:0002:0003:0004:0005:0006:0007:0008
+ %piS 001.002.003.004 or 00010002000300040005000600070008
+ %pISc 1.2.3.4 or 1:2:3:4:5:6:7:8
+ %pISpc 1.2.3.4:12345 or [1:2:3:4:5:6:7:8]:12345
+ %p[Ii]S[pfschnbl]
+
+For printing an IP address without the need to distinguish whether it's of
+type AF_INET or AF_INET6. A pointer to a valid struct sockaddr,
+specified through ``IS`` or ``iS``, can be passed to this format specifier.
+
+The additional ``p``, ``f``, and ``s`` specifiers are used to specify port
+(IPv4, IPv6), flowinfo (IPv6) and scope (IPv6). Ports have a ``:`` prefix,
+flowinfo a ``/`` and scope a ``%``, each followed by the actual value.
+
+In case of an IPv6 address the compressed IPv6 address as described by
+http://tools.ietf.org/html/rfc5952 is being used if the additional
+specifier ``c`` is given. The IPv6 address is surrounded by ``[``, ``]`` in
+case of additional specifiers ``p``, ``f`` or ``s`` as suggested by
+https://tools.ietf.org/html/draft-ietf-6man-text-addr-representation-07
+
+In case of IPv4 addresses, the additional ``h``, ``n``, ``b``, and ``l``
+specifiers can be used as well and are ignored in case of an IPv6
+address.
+
+Passed by reference.
+
+Further examples::
+
+ %pISfc 1.2.3.4 or [1:2:3:4:5:6:7:8]/123456789
+ %pISsc 1.2.3.4 or [1:2:3:4:5:6:7:8]%1234567890
+ %pISpfc 1.2.3.4:12345 or [1:2:3:4:5:6:7:8]:12345/123456789
+
+UUID/GUID addresses
+-------------------
+
+::
+
+ %pUb 00010203-0405-0607-0809-0a0b0c0d0e0f
+ %pUB 00010203-0405-0607-0809-0A0B0C0D0E0F
+ %pUl 03020100-0504-0706-0809-0a0b0c0e0e0f
+ %pUL 03020100-0504-0706-0809-0A0B0C0E0E0F
+
+For printing 16-byte UUID/GUIDs addresses. The additional ``l``, ``L``,
+``b`` and ``B`` specifiers are used to specify a little endian order in
+lower (l) or upper case (L) hex notation - and big endian order in lower (b)
+or upper case (B) hex notation.
+
+Where no additional specifiers are used the default big endian
+order with lower case hex notation will be printed.
+
+Passed by reference.
+
+dentry names
+------------
+
+::
+
+ %pd{,2,3,4}
+ %pD{,2,3,4}
+
+For printing dentry name; if we race with :c:func:`d_move`, the name might
+be a mix of old and new ones, but it won't oops. %pd dentry is a safer
+equivalent of %s dentry->d_name.name we used to use, %pd<n> prints ``n``
+last components. %pD does the same thing for struct file.
+
+Passed by reference.
+
+block_device names
+------------------
+
+::
+
+ %pg sda, sda1 or loop0p1
+
+For printing name of block_device pointers.
+
+struct va_format
+----------------
+
+::
+
+ %pV
+
+For printing struct va_format structures. These contain a format string
+and va_list as follows::
+
+ struct va_format {
+ const char *fmt;
+ va_list *va;
+ };
+
+Implements a "recursive vsnprintf".
+
+Do not use this feature without some mechanism to verify the
+correctness of the format string and va_list arguments.
+
+Passed by reference.
+
+kobjects
+--------
+
+::
+
+ %pOF[fnpPcCF]
+
+
+For printing kobject based structs (device nodes). Default behaviour is
+equivalent to %pOFf.
+
+ - f - device node full_name
+ - n - device node name
+ - p - device node phandle
+ - P - device node path spec (name + @unit)
+ - F - device node flags
+ - c - major compatible string
+ - C - full compatible string
+
+The separator when using multiple arguments is ':'
+
+Examples::
+
+ %pOF /foo/bar@0 - Node full name
+ %pOFf /foo/bar@0 - Same as above
+ %pOFfp /foo/bar@0:10 - Node full name + phandle
+ %pOFfcF /foo/bar@0:foo,device:--P- - Node full name +
+ major compatible string +
+ node flags
+ D - dynamic
+ d - detached
+ P - Populated
+ B - Populated bus
+
+Passed by reference.
+
+struct clk
+----------
+
+::
+
+ %pC pll1
+ %pCn pll1
+ %pCr 1560000000
+
+For printing struct clk structures. %pC and %pCn print the name
+(Common Clock Framework) or address (legacy clock framework) of the
+structure; %pCr prints the current clock rate.
+
+Passed by reference.
+
+bitmap and its derivatives such as cpumask and nodemask
+-------------------------------------------------------
+
+::
+
+ %*pb 0779
+ %*pbl 0,3-6,8-10
+
+For printing bitmap and its derivatives such as cpumask and nodemask,
+%*pb outputs the bitmap with field width as the number of bits and %*pbl
+output the bitmap as range list with field width as the number of bits.
+
+Passed by reference.
+
+Flags bitfields such as page flags, gfp_flags
+---------------------------------------------
+
+::
+
+ %pGp referenced|uptodate|lru|active|private
+ %pGg GFP_USER|GFP_DMA32|GFP_NOWARN
+ %pGv read|exec|mayread|maywrite|mayexec|denywrite
+
+For printing flags bitfields as a collection of symbolic constants that
+would construct the value. The type of flags is given by the third
+character. Currently supported are [p]age flags, [v]ma_flags (both
+expect ``unsigned long *``) and [g]fp_flags (expects ``gfp_t *``). The flag
+names and print order depends on the particular type.
+
+Note that this format should not be used directly in the
+:c:func:`TP_printk()` part of a tracepoint. Instead, use the show_*_flags()
+functions from <trace/events/mmflags.h>.
+
+Passed by reference.
+
+Network device features
+-----------------------
+
+::
+
+ %pNF 0x000000000000c000
+
+For printing netdev_features_t.
+
+Passed by reference.
+
+Thanks
+======
+
+If you add other %p extensions, please extend <lib/test_printf.c> with
+one or more test cases, if at all feasible.
+
+Thank you for your cooperation and attention.
diff --git a/Documentation/core-api/refcount-vs-atomic.rst b/Documentation/core-api/refcount-vs-atomic.rst
new file mode 100644
index 000000000000..83351c258cdb
--- /dev/null
+++ b/Documentation/core-api/refcount-vs-atomic.rst
@@ -0,0 +1,150 @@
+===================================
+refcount_t API compared to atomic_t
+===================================
+
+.. contents:: :local:
+
+Introduction
+============
+
+The goal of refcount_t API is to provide a minimal API for implementing
+an object's reference counters. While a generic architecture-independent
+implementation from lib/refcount.c uses atomic operations underneath,
+there are a number of differences between some of the ``refcount_*()`` and
+``atomic_*()`` functions with regards to the memory ordering guarantees.
+This document outlines the differences and provides respective examples
+in order to help maintainers validate their code against the change in
+these memory ordering guarantees.
+
+The terms used through this document try to follow the formal LKMM defined in
+github.com/aparri/memory-model/blob/master/Documentation/explanation.txt
+
+memory-barriers.txt and atomic_t.txt provide more background to the
+memory ordering in general and for atomic operations specifically.
+
+Relevant types of memory ordering
+=================================
+
+.. note:: The following section only covers some of the memory
+ ordering types that are relevant for the atomics and reference
+ counters and used through this document. For a much broader picture
+ please consult memory-barriers.txt document.
+
+In the absence of any memory ordering guarantees (i.e. fully unordered)
+atomics & refcounters only provide atomicity and
+program order (po) relation (on the same CPU). It guarantees that
+each ``atomic_*()`` and ``refcount_*()`` operation is atomic and instructions
+are executed in program order on a single CPU.
+This is implemented using :c:func:`READ_ONCE`/:c:func:`WRITE_ONCE` and
+compare-and-swap primitives.
+
+A strong (full) memory ordering guarantees that all prior loads and
+stores (all po-earlier instructions) on the same CPU are completed
+before any po-later instruction is executed on the same CPU.
+It also guarantees that all po-earlier stores on the same CPU
+and all propagated stores from other CPUs must propagate to all
+other CPUs before any po-later instruction is executed on the original
+CPU (A-cumulative property). This is implemented using :c:func:`smp_mb`.
+
+A RELEASE memory ordering guarantees that all prior loads and
+stores (all po-earlier instructions) on the same CPU are completed
+before the operation. It also guarantees that all po-earlier
+stores on the same CPU and all propagated stores from other CPUs
+must propagate to all other CPUs before the release operation
+(A-cumulative property). This is implemented using
+:c:func:`smp_store_release`.
+
+A control dependency (on success) for refcounters guarantees that
+if a reference for an object was successfully obtained (reference
+counter increment or addition happened, function returned true),
+then further stores are ordered against this operation.
+Control dependency on stores are not implemented using any explicit
+barriers, but rely on CPU not to speculate on stores. This is only
+a single CPU relation and provides no guarantees for other CPUs.
+
+
+Comparison of functions
+=======================
+
+case 1) - non-"Read/Modify/Write" (RMW) ops
+-------------------------------------------
+
+Function changes:
+
+ * :c:func:`atomic_set` --> :c:func:`refcount_set`
+ * :c:func:`atomic_read` --> :c:func:`refcount_read`
+
+Memory ordering guarantee changes:
+
+ * none (both fully unordered)
+
+
+case 2) - increment-based ops that return no value
+--------------------------------------------------
+
+Function changes:
+
+ * :c:func:`atomic_inc` --> :c:func:`refcount_inc`
+ * :c:func:`atomic_add` --> :c:func:`refcount_add`
+
+Memory ordering guarantee changes:
+
+ * none (both fully unordered)
+
+case 3) - decrement-based RMW ops that return no value
+------------------------------------------------------
+
+Function changes:
+
+ * :c:func:`atomic_dec` --> :c:func:`refcount_dec`
+
+Memory ordering guarantee changes:
+
+ * fully unordered --> RELEASE ordering
+
+
+case 4) - increment-based RMW ops that return a value
+-----------------------------------------------------
+
+Function changes:
+
+ * :c:func:`atomic_inc_not_zero` --> :c:func:`refcount_inc_not_zero`
+ * no atomic counterpart --> :c:func:`refcount_add_not_zero`
+
+Memory ordering guarantees changes:
+
+ * fully ordered --> control dependency on success for stores
+
+.. note:: We really assume here that necessary ordering is provided as a
+ result of obtaining pointer to the object!
+
+
+case 5) - decrement-based RMW ops that return a value
+-----------------------------------------------------
+
+Function changes:
+
+ * :c:func:`atomic_dec_and_test` --> :c:func:`refcount_dec_and_test`
+ * :c:func:`atomic_sub_and_test` --> :c:func:`refcount_sub_and_test`
+ * no atomic counterpart --> :c:func:`refcount_dec_if_one`
+ * ``atomic_add_unless(&var, -1, 1)`` --> ``refcount_dec_not_one(&var)``
+
+Memory ordering guarantees changes:
+
+ * fully ordered --> RELEASE ordering + control dependency
+
+.. note:: :c:func:`atomic_add_unless` only provides full order on success.
+
+
+case 6) - lock-based RMW
+------------------------
+
+Function changes:
+
+ * :c:func:`atomic_dec_and_lock` --> :c:func:`refcount_dec_and_lock`
+ * :c:func:`atomic_dec_and_mutex_lock` --> :c:func:`refcount_dec_and_mutex_lock`
+
+Memory ordering guarantees changes:
+
+ * fully ordered --> RELEASE ordering + control dependency + hold
+ :c:func:`spin_lock` on success