locking will be seen as ordered by CPUs not holding that lock.
Consider this example:
- /* See Z6.0+pooncelock+pooncelock+pombonce.litmus. */
+ /* See Z6.0+pooncerelease+poacquirerelease+fencembonceonce.litmus. */
void CPU0(void)
{
spin_lock(&mylock);
smp_wmb() and smp_rmb() APIs are still heavily used, so it is important
to understand their use cases. The general approach is shown below:
- /* See MP+wmbonceonce+rmbonceonce.litmus. */
+ /* See MP+fencewmbonceonce+fencermbonceonce.litmus. */
void CPU0(void)
{
WRITE_ONCE(x, 1);
One way of avoiding the counter-intuitive outcome is through the use of a
control dependency paired with a full memory barrier:
- /* See LB+ctrlonceonce+mbonceonce.litmus. */
+ /* See LB+fencembonceonce+ctrlonceonce.litmus. */
void CPU0(void)
{
r0 = READ_ONCE(x);
while another CPU stores to the second variable and then loads from the
first. Preserving order requires nothing less than full barriers:
- /* See SB+mbonceonces.litmus. */
+ /* See SB+fencembonceonces.litmus. */
void CPU0(void)
{
WRITE_ONCE(x, 1);
The memory model is used, in conjunction with "herd7", to exhaustively
explore the state space of small litmus tests.
-For example, to run SB+mbonceonces.litmus against the memory model:
+For example, to run SB+fencembonceonces.litmus against the memory model:
- $ herd7 -conf linux-kernel.cfg litmus-tests/SB+mbonceonces.litmus
+ $ herd7 -conf linux-kernel.cfg litmus-tests/SB+fencembonceonces.litmus
Here is the corresponding output:
- Test SB+mbonceonces Allowed
+ Test SB+fencembonceonces Allowed
States 3
0:r0=0; 1:r0=1;
0:r0=1; 1:r0=0;
Witnesses
Positive: 0 Negative: 3
Condition exists (0:r0=0 /\ 1:r0=0)
- Observation SB+mbonceonces Never 0 3
- Time SB+mbonceonces 0.01
+ Observation SB+fencembonceonces Never 0 3
+ Time SB+fencembonceonces 0.01
Hash=d66d99523e2cac6b06e66f4c995ebb48
The "Positive: 0 Negative: 3" and the "Never 0 3" each indicate that
The "klitmus7" tool converts a litmus test into a Linux kernel module,
which may then be loaded and run.
-For example, to run SB+mbonceonces.litmus against hardware:
+For example, to run SB+fencembonceonces.litmus against hardware:
$ mkdir mymodules
- $ klitmus7 -o mymodules litmus-tests/SB+mbonceonces.litmus
+ $ klitmus7 -o mymodules litmus-tests/SB+fencembonceonces.litmus
$ cd mymodules ; make
$ sudo sh run.sh
The corresponding output includes:
- Test SB+mbonceonces Allowed
+ Test SB+fencembonceonces Allowed
Histogram (3 states)
644580 :>0:r0=1; 1:r0=0;
644328 :>0:r0=0; 1:r0=1;
Positive: 0, Negative: 2000000
Condition exists (0:r0=0 /\ 1:r0=0) is NOT validated
Hash=d66d99523e2cac6b06e66f4c995ebb48
- Observation SB+mbonceonces Never 0 2000000
- Time SB+mbonceonces 0.16
+ Observation SB+fencembonceonces Never 0 2000000
+ Time SB+fencembonceonces 0.16
The "Positive: 0 Negative: 2000000" and the "Never 0 2000000" indicate
that during two million trials, the state specified in this litmus
--- /dev/null
+C IRIW+fencembonceonces+OnceOnce
+
+(*
+ * Result: Never
+ *
+ * Test of independent reads from independent writes with smp_mb()
+ * between each pairs of reads. In other words, is smp_mb() sufficient to
+ * cause two different reading processes to agree on the order of a pair
+ * of writes, where each write is to a different variable by a different
+ * process? This litmus test exercises LKMM's "propagation" rule.
+ *)
+
+{}
+
+P0(int *x)
+{
+ WRITE_ONCE(*x, 1);
+}
+
+P1(int *x, int *y)
+{
+ int r0;
+ int r1;
+
+ r0 = READ_ONCE(*x);
+ smp_mb();
+ r1 = READ_ONCE(*y);
+}
+
+P2(int *y)
+{
+ WRITE_ONCE(*y, 1);
+}
+
+P3(int *x, int *y)
+{
+ int r0;
+ int r1;
+
+ r0 = READ_ONCE(*y);
+ smp_mb();
+ r1 = READ_ONCE(*x);
+}
+
+exists (1:r0=1 /\ 1:r1=0 /\ 3:r0=1 /\ 3:r1=0)
+++ /dev/null
-C IRIW+mbonceonces+OnceOnce
-
-(*
- * Result: Never
- *
- * Test of independent reads from independent writes with smp_mb()
- * between each pairs of reads. In other words, is smp_mb() sufficient to
- * cause two different reading processes to agree on the order of a pair
- * of writes, where each write is to a different variable by a different
- * process? This litmus test exercises LKMM's "propagation" rule.
- *)
-
-{}
-
-P0(int *x)
-{
- WRITE_ONCE(*x, 1);
-}
-
-P1(int *x, int *y)
-{
- int r0;
- int r1;
-
- r0 = READ_ONCE(*x);
- smp_mb();
- r1 = READ_ONCE(*y);
-}
-
-P2(int *y)
-{
- WRITE_ONCE(*y, 1);
-}
-
-P3(int *x, int *y)
-{
- int r0;
- int r1;
-
- r0 = READ_ONCE(*y);
- smp_mb();
- r1 = READ_ONCE(*x);
-}
-
-exists (1:r0=1 /\ 1:r1=0 /\ 3:r0=1 /\ 3:r1=0)
+++ /dev/null
-C LB+ctrlonceonce+mbonceonce
-
-(*
- * Result: Never
- *
- * This litmus test demonstrates that lightweight ordering suffices for
- * the load-buffering pattern, in other words, preventing all processes
- * reading from the preceding process's write. In this example, the
- * combination of a control dependency and a full memory barrier are enough
- * to do the trick. (But the full memory barrier could be replaced with
- * another control dependency and order would still be maintained.)
- *)
-
-{}
-
-P0(int *x, int *y)
-{
- int r0;
-
- r0 = READ_ONCE(*x);
- if (r0)
- WRITE_ONCE(*y, 1);
-}
-
-P1(int *x, int *y)
-{
- int r0;
-
- r0 = READ_ONCE(*y);
- smp_mb();
- WRITE_ONCE(*x, 1);
-}
-
-exists (0:r0=1 /\ 1:r0=1)
--- /dev/null
+C LB+fencembonceonce+ctrlonceonce
+
+(*
+ * Result: Never
+ *
+ * This litmus test demonstrates that lightweight ordering suffices for
+ * the load-buffering pattern, in other words, preventing all processes
+ * reading from the preceding process's write. In this example, the
+ * combination of a control dependency and a full memory barrier are enough
+ * to do the trick. (But the full memory barrier could be replaced with
+ * another control dependency and order would still be maintained.)
+ *)
+
+{}
+
+P0(int *x, int *y)
+{
+ int r0;
+
+ r0 = READ_ONCE(*x);
+ if (r0)
+ WRITE_ONCE(*y, 1);
+}
+
+P1(int *x, int *y)
+{
+ int r0;
+
+ r0 = READ_ONCE(*y);
+ smp_mb();
+ WRITE_ONCE(*x, 1);
+}
+
+exists (0:r0=1 /\ 1:r0=1)
--- /dev/null
+C MP+fencewmbonceonce+fencermbonceonce
+
+(*
+ * Result: Never
+ *
+ * This litmus test demonstrates that smp_wmb() and smp_rmb() provide
+ * sufficient ordering for the message-passing pattern. However, it
+ * is usually better to use smp_store_release() and smp_load_acquire().
+ *)
+
+{}
+
+P0(int *x, int *y)
+{
+ WRITE_ONCE(*x, 1);
+ smp_wmb();
+ WRITE_ONCE(*y, 1);
+}
+
+P1(int *x, int *y)
+{
+ int r0;
+ int r1;
+
+ r0 = READ_ONCE(*y);
+ smp_rmb();
+ r1 = READ_ONCE(*x);
+}
+
+exists (1:r0=1 /\ 1:r1=0)
+++ /dev/null
-C MP+wmbonceonce+rmbonceonce
-
-(*
- * Result: Never
- *
- * This litmus test demonstrates that smp_wmb() and smp_rmb() provide
- * sufficient ordering for the message-passing pattern. However, it
- * is usually better to use smp_store_release() and smp_load_acquire().
- *)
-
-{}
-
-P0(int *x, int *y)
-{
- WRITE_ONCE(*x, 1);
- smp_wmb();
- WRITE_ONCE(*y, 1);
-}
-
-P1(int *x, int *y)
-{
- int r0;
- int r1;
-
- r0 = READ_ONCE(*y);
- smp_rmb();
- r1 = READ_ONCE(*x);
-}
-
-exists (1:r0=1 /\ 1:r1=0)
--- /dev/null
+C R+fencembonceonces
+
+(*
+ * Result: Never
+ *
+ * This is the fully ordered (via smp_mb()) version of one of the classic
+ * counterintuitive litmus tests that illustrates the effects of store
+ * propagation delays. Note that weakening either of the barriers would
+ * cause the resulting test to be allowed.
+ *)
+
+{}
+
+P0(int *x, int *y)
+{
+ WRITE_ONCE(*x, 1);
+ smp_mb();
+ WRITE_ONCE(*y, 1);
+}
+
+P1(int *x, int *y)
+{
+ int r0;
+
+ WRITE_ONCE(*y, 2);
+ smp_mb();
+ r0 = READ_ONCE(*x);
+}
+
+exists (y=2 /\ 1:r0=0)
+++ /dev/null
-C R+mbonceonces
-
-(*
- * Result: Never
- *
- * This is the fully ordered (via smp_mb()) version of one of the classic
- * counterintuitive litmus tests that illustrates the effects of store
- * propagation delays. Note that weakening either of the barriers would
- * cause the resulting test to be allowed.
- *)
-
-{}
-
-P0(int *x, int *y)
-{
- WRITE_ONCE(*x, 1);
- smp_mb();
- WRITE_ONCE(*y, 1);
-}
-
-P1(int *x, int *y)
-{
- int r0;
-
- WRITE_ONCE(*y, 2);
- smp_mb();
- r0 = READ_ONCE(*x);
-}
-
-exists (y=2 /\ 1:r0=0)
Test of write-write coherence, that is, whether or not two
successive writes to the same variable are ordered.
-IRIW+mbonceonces+OnceOnce.litmus
+IRIW+fencembonceonces+OnceOnce.litmus
Test of independent reads from independent writes with smp_mb()
between each pairs of reads. In other words, is smp_mb()
sufficient to cause two different reading processes to agree on
Can a release-acquire chain order a prior store against
a later load?
-LB+ctrlonceonce+mbonceonce.litmus
+LB+fencembonceonce+ctrlonceonce.litmus
Does a control dependency and an smp_mb() suffice for the
load-buffering litmus test, where each process reads from one
of two variables then writes to the other?
As below, but with the first access of the writer process
and the second access of reader process protected by a lock.
-MP+wmbonceonce+rmbonceonce.litmus
+MP+fencewmbonceonce+fencermbonceonce.litmus
Does a smp_wmb() (between the stores) and an smp_rmb() (between
the loads) suffice for the message-passing litmus test, where one
process writes data and then a flag, and the other process reads
the flag and then the data. (This is similar to the ISA2 tests,
but with two processes instead of three.)
-R+mbonceonces.litmus
+R+fencembonceonces.litmus
This is the fully ordered (via smp_mb()) version of one of
the classic counterintuitive litmus tests that illustrates the
effects of store propagation delays.
R+poonceonces.litmus
As above, but without the smp_mb() invocations.
-SB+mbonceonces.litmus
+SB+fencembonceonces.litmus
This is the fully ordered (again, via smp_mb() version of store
buffering, which forms the core of Dekker's mutual-exclusion
algorithm.
S+poonceonces.litmus
As below, but without the smp_wmb() and acquire load.
-S+wmbonceonce+poacquireonce.litmus
+S+fencewmbonceonce+poacquireonce.litmus
Can a smp_wmb(), instead of a release, and an acquire order
a prior store against a subsequent store?
WRC+poonceonces+Once.litmus
-WRC+pooncerelease+rmbonceonce+Once.litmus
+WRC+pooncerelease+fencermbonceonce+Once.litmus
These two are members of an extension of the MP litmus-test
class in which the first write is moved to a separate process.
The second is forbidden because smp_store_release() is
As above, but with smp_mb__after_spinlock() immediately
following the spin_lock().
-Z6.0+pooncerelease+poacquirerelease+mbonceonce.litmus
+Z6.0+pooncerelease+poacquirerelease+fencembonceonce.litmus
Is the ordering provided by a release-acquire chain sufficient
to make ordering apparent to accesses by a process that does
not participate in that release-acquire chain?
--- /dev/null
+C S+fencewmbonceonce+poacquireonce
+
+(*
+ * Result: Never
+ *
+ * Can a smp_wmb(), instead of a release, and an acquire order a prior
+ * store against a subsequent store?
+ *)
+
+{}
+
+P0(int *x, int *y)
+{
+ WRITE_ONCE(*x, 2);
+ smp_wmb();
+ WRITE_ONCE(*y, 1);
+}
+
+P1(int *x, int *y)
+{
+ int r0;
+
+ r0 = smp_load_acquire(y);
+ WRITE_ONCE(*x, 1);
+}
+
+exists (x=2 /\ 1:r0=1)
+++ /dev/null
-C S+wmbonceonce+poacquireonce
-
-(*
- * Result: Never
- *
- * Can a smp_wmb(), instead of a release, and an acquire order a prior
- * store against a subsequent store?
- *)
-
-{}
-
-P0(int *x, int *y)
-{
- WRITE_ONCE(*x, 2);
- smp_wmb();
- WRITE_ONCE(*y, 1);
-}
-
-P1(int *x, int *y)
-{
- int r0;
-
- r0 = smp_load_acquire(y);
- WRITE_ONCE(*x, 1);
-}
-
-exists (x=2 /\ 1:r0=1)
--- /dev/null
+C SB+fencembonceonces
+
+(*
+ * Result: Never
+ *
+ * This litmus test demonstrates that full memory barriers suffice to
+ * order the store-buffering pattern, where each process writes to the
+ * variable that the preceding process reads. (Locking and RCU can also
+ * suffice, but not much else.)
+ *)
+
+{}
+
+P0(int *x, int *y)
+{
+ int r0;
+
+ WRITE_ONCE(*x, 1);
+ smp_mb();
+ r0 = READ_ONCE(*y);
+}
+
+P1(int *x, int *y)
+{
+ int r0;
+
+ WRITE_ONCE(*y, 1);
+ smp_mb();
+ r0 = READ_ONCE(*x);
+}
+
+exists (0:r0=0 /\ 1:r0=0)
+++ /dev/null
-C SB+mbonceonces
-
-(*
- * Result: Never
- *
- * This litmus test demonstrates that full memory barriers suffice to
- * order the store-buffering pattern, where each process writes to the
- * variable that the preceding process reads. (Locking and RCU can also
- * suffice, but not much else.)
- *)
-
-{}
-
-P0(int *x, int *y)
-{
- int r0;
-
- WRITE_ONCE(*x, 1);
- smp_mb();
- r0 = READ_ONCE(*y);
-}
-
-P1(int *x, int *y)
-{
- int r0;
-
- WRITE_ONCE(*y, 1);
- smp_mb();
- r0 = READ_ONCE(*x);
-}
-
-exists (0:r0=0 /\ 1:r0=0)
--- /dev/null
+C WRC+pooncerelease+fencermbonceonce+Once
+
+(*
+ * Result: Never
+ *
+ * This litmus test is an extension of the message-passing pattern, where
+ * the first write is moved to a separate process. Because it features
+ * a release and a read memory barrier, it should be forbidden. More
+ * specifically, this litmus test is forbidden because smp_store_release()
+ * is A-cumulative in LKMM.
+ *)
+
+{}
+
+P0(int *x)
+{
+ WRITE_ONCE(*x, 1);
+}
+
+P1(int *x, int *y)
+{
+ int r0;
+
+ r0 = READ_ONCE(*x);
+ smp_store_release(y, 1);
+}
+
+P2(int *x, int *y)
+{
+ int r0;
+ int r1;
+
+ r0 = READ_ONCE(*y);
+ smp_rmb();
+ r1 = READ_ONCE(*x);
+}
+
+exists (1:r0=1 /\ 2:r0=1 /\ 2:r1=0)
+++ /dev/null
-C WRC+pooncerelease+rmbonceonce+Once
-
-(*
- * Result: Never
- *
- * This litmus test is an extension of the message-passing pattern, where
- * the first write is moved to a separate process. Because it features
- * a release and a read memory barrier, it should be forbidden. More
- * specifically, this litmus test is forbidden because smp_store_release()
- * is A-cumulative in LKMM.
- *)
-
-{}
-
-P0(int *x)
-{
- WRITE_ONCE(*x, 1);
-}
-
-P1(int *x, int *y)
-{
- int r0;
-
- r0 = READ_ONCE(*x);
- smp_store_release(y, 1);
-}
-
-P2(int *x, int *y)
-{
- int r0;
- int r1;
-
- r0 = READ_ONCE(*y);
- smp_rmb();
- r1 = READ_ONCE(*x);
-}
-
-exists (1:r0=1 /\ 2:r0=1 /\ 2:r1=0)
--- /dev/null
+C Z6.0+pooncerelease+poacquirerelease+fencembonceonce
+
+(*
+ * Result: Sometimes
+ *
+ * This litmus test shows that a release-acquire chain, while sufficient
+ * when there is but one non-reads-from (AKA non-rf) link, does not suffice
+ * if there is more than one. Of the three processes, only P1() reads from
+ * P0's write, which means that there are two non-rf links: P1() to P2()
+ * is a write-to-write link (AKA a "coherence" or just "co" link) and P2()
+ * to P0() is a read-to-write link (AKA a "from-reads" or just "fr" link).
+ * When there are two or more non-rf links, you typically will need one
+ * full barrier for each non-rf link. (Exceptions include some cases
+ * involving locking.)
+ *)
+
+{}
+
+P0(int *x, int *y)
+{
+ WRITE_ONCE(*x, 1);
+ smp_store_release(y, 1);
+}
+
+P1(int *y, int *z)
+{
+ int r0;
+
+ r0 = smp_load_acquire(y);
+ smp_store_release(z, 1);
+}
+
+P2(int *x, int *z)
+{
+ int r1;
+
+ WRITE_ONCE(*z, 2);
+ smp_mb();
+ r1 = READ_ONCE(*x);
+}
+
+exists (1:r0=1 /\ z=2 /\ 2:r1=0)
+++ /dev/null
-C Z6.0+pooncerelease+poacquirerelease+mbonceonce
-
-(*
- * Result: Sometimes
- *
- * This litmus test shows that a release-acquire chain, while sufficient
- * when there is but one non-reads-from (AKA non-rf) link, does not suffice
- * if there is more than one. Of the three processes, only P1() reads from
- * P0's write, which means that there are two non-rf links: P1() to P2()
- * is a write-to-write link (AKA a "coherence" or just "co" link) and P2()
- * to P0() is a read-to-write link (AKA a "from-reads" or just "fr" link).
- * When there are two or more non-rf links, you typically will need one
- * full barrier for each non-rf link. (Exceptions include some cases
- * involving locking.)
- *)
-
-{}
-
-P0(int *x, int *y)
-{
- WRITE_ONCE(*x, 1);
- smp_store_release(y, 1);
-}
-
-P1(int *y, int *z)
-{
- int r0;
-
- r0 = smp_load_acquire(y);
- smp_store_release(z, 1);
-}
-
-P2(int *x, int *z)
-{
- int r1;
-
- WRITE_ONCE(*z, 2);
- smp_mb();
- r1 = READ_ONCE(*x);
-}
-
-exists (1:r0=1 /\ z=2 /\ 2:r1=0)