 Documentation/atomic_t.txt | 118 ++++++++++++++++++++++++++++++++++++++------
 1 file changed, 106 insertions(+), 12 deletions(-)
diff --git a/Documentation/atomic_t.txt b/Documentation/atomic_t.txt
index 0ab747e0d5ac..d7adc6d543db 100644
--- a/Documentation/atomic_t.txt
+++ b/Documentation/atomic_t.txt
@@ -85,21 +85,21 @@ smp_store_release() respectively. Therefore, if you find yourself only using
the Non-RMW operations of atomic_t, you do not in fact need atomic_t at all
and are doing it wrong.
-A subtle detail of atomic_set{}() is that it should be observable to the RMW
-ops. That is:
+A note for the implementation of atomic_set{}() is that it must not break the
+atomicity of the RMW ops. That is:
- C atomic-set
+ C Atomic-RMW-ops-are-atomic-WRT-atomic_set
{
- atomic_set(v, 1);
+ atomic_t v = ATOMIC_INIT(1);
}
- P1(atomic_t *v)
+ P0(atomic_t *v)
{
- atomic_add_unless(v, 1, 0);
+ (void)atomic_add_unless(v, 1, 0);
}
- P2(atomic_t *v)
+ P1(atomic_t *v)
{
atomic_set(v, 0);
}
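
To make the requirement concrete (this is an illustrative sketch, not upstream
kernel code): an implementation that emulates the RMW ops with a hashed
spinlock -- the way the generic atomic64 fallback in lib/atomic64.c does for
atomic64_t -- cannot implement atomic_set() as a plain store, because that
store could land between the internal load and store of atomic_add_unless()
and be lost. It has to take the same lock; roughly (lock_for() is a made-up
helper standing in for whatever hash the implementation uses):

  static void atomic_set(atomic_t *v, int i)
  {
          unsigned long flags;
          raw_spinlock_t *lock = lock_for(v);       /* hypothetical hashed lock */

          raw_spin_lock_irqsave(lock, flags);       /* serialize against the RMW ops */
          v->counter = i;
          raw_spin_unlock_irqrestore(lock, flags);
  }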
@@ -233,19 +233,19 @@ as well. Similarly, something like:
is an ACQUIRE pattern (though very much not typical), but again the barrier is
strictly stronger than ACQUIRE. As illustrated:
- C strong-acquire
+ C Atomic-RMW+mb__after_atomic-is-stronger-than-acquire
{
}
- P1(int *x, atomic_t *y)
+ P0(int *x, atomic_t *y)
{
r0 = READ_ONCE(*x);
smp_rmb();
r1 = atomic_read(y);
}
- P2(int *x, atomic_t *y)
+ P1(int *x, atomic_t *y)
{
atomic_inc(y);
smp_mb__after_atomic();
@@ -253,14 +253,14 @@ strictly stronger than ACQUIRE. As illustrated:
}
exists
- (r0=1 /\ r1=0)
+ (0:r0=1 /\ 0:r1=0)
This should not happen; but a hypothetical atomic_inc_acquire() --
(void)atomic_fetch_inc_acquire() for instance -- would allow the outcome,
because it would not order the W part of the RMW against the following
WRITE_ONCE. Thus:
- P1 P2
+ P0 P1
t = LL.acq *y (0)
t++;
@@ -271,3 +271,97 @@ WRITE_ONCE. Thus:
SC *y, t;
is allowed.
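
On the other hand, a fully ordered RMW would also forbid the outcome, since
fully ordered ops imply a full barrier before and after the primitive; for
instance, P1 could (assuming the return value is not otherwise needed) use:

  (void)atomic_fetch_inc(y);
  WRITE_ONCE(*x, 1);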
+
+
+CMPXCHG vs TRY_CMPXCHG
+----------------------
+
+ int atomic_cmpxchg(atomic_t *ptr, int old, int new);
+ bool atomic_try_cmpxchg(atomic_t *ptr, int *oldp, int new);
+
+Both provide the same functionality, but try_cmpxchg() can lead to more
+compact code. The functions relate like:
+
+  bool atomic_try_cmpxchg(atomic_t *ptr, int *oldp, int new)
+  {
+          int ret, old = *oldp;
+          ret = atomic_cmpxchg(ptr, old, new);
+          if (ret != old)
+                  *oldp = ret;
+          return ret == old;
+  }
+
+and:
+
+  int atomic_cmpxchg(atomic_t *ptr, int old, int new)
+  {
+          (void)atomic_try_cmpxchg(ptr, &old, new);
+          return old;
+  }
+
+Usage; first the cmpxchg() loop:
+
+  old = atomic_read(&v);
+  for (;;) {
+          new = func(old);
+          tmp = atomic_cmpxchg(&v, old, new);
+          if (tmp == old)
+                  break;
+          old = tmp;
+  }
+
+and the equivalent, more compact, try_cmpxchg() loop:
+
+  old = atomic_read(&v);
+  do {
+          new = func(old);
+  } while (!atomic_try_cmpxchg(&v, &old, new));
+
+NB. try_cmpxchg() also generates better code on some platforms (notably x86)
+where the function more closely matches the hardware instruction.
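+
+As an illustration of the more compact form (the helper below is made up for
+this document; the kernel already provides atomic_inc_not_zero() and
+atomic_add_unless() for this kind of thing), an increment-unless-zero could be
+written as:
+
+  static inline bool my_inc_unless_zero(atomic_t *v)
+  {
+          int old = atomic_read(v);
+
+          do {
+                  if (!old)               /* refuse to increment from zero */
+                          return false;
+          } while (!atomic_try_cmpxchg(v, &old, old + 1));
+
+          return true;
+  }
+
+On failure atomic_try_cmpxchg() updates 'old' with the current value, which is
+what makes the loop this compact; there is no need to re-read the variable.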
+
+
+FORWARD PROGRESS
+----------------
+
+In general, strong forward progress is expected of all unconditional atomic
+operations -- those in the Arithmetic and Bitwise classes and xchg(). However,
+a fair amount of code also requires forward progress from the conditional
+atomic operations.
+
+Specifically, 'simple' cmpxchg() loops are expected not to starve one another
+indefinitely. However, this is not evident on LL/SC architectures, because
+while an LL/SC architecture 'can/should/must' provide forward progress
+guarantees between competing LL/SC sections, such a guarantee does not
+transfer to cmpxchg() implemented using LL/SC. Consider:
+
+  old = atomic_read(&v);
+  do {
+          new = func(old);
+  } while (!atomic_try_cmpxchg(&v, &old, new));
+
+which on LL/SC becomes something like:
+
+  old = atomic_read(&v);
+  do {
+          new = func(old);
+  } while (!({
+          asm volatile ("1: LL  %[oldval], %[v]\n"
+                        "   CMP %[oldval], %[old]\n"
+                        "   BNE 2f\n"
+                        "   SC  %[new], %[v]\n"
+                        "   BNE 1b\n"
+                        "2:\n"
+                        : [oldval] "=&r" (oldval), [v] "+m" (v)
+                        : [old] "r" (old), [new] "r" (new)
+                        : "memory");
+          success = (oldval == old);
+          if (!success)
+                  old = oldval;
+          success; }));
+
+However, even the forward branch from the failed compare can cause the LL/SC
+to fail on some architectures, let alone whatever the compiler makes of the C
+loop body. As a result there is no guarantee whatsoever that the cacheline
+containing @v will stay on the local CPU and that progress will be made.
+
+Even native CAS architectures can fail to provide forward progress for their
+primitive (See Sparc64 for an example).
+
+Such implementations are strongly encouraged to add exponential backoff loops
+to a failed CAS in order to ensure some progress. Affected architectures are
+also strongly encouraged to inspect/audit the atomic fallbacks, refcount_t and
+their locking primitives.
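+
+A sketch of such a backoff (illustrative only; __cas() stands in for the
+architecture's native CAS instruction and the spin counts are arbitrary --
+this is not an actual kernel implementation):
+
+  static inline void arch_atomic_add(int i, atomic_t *v)
+  {
+          unsigned int n, spins = 1;
+          int cur, old = READ_ONCE(v->counter);
+
+          for (;;) {
+                  cur = __cas(&v->counter, old, old + i);
+                  if (cur == old)                 /* CAS succeeded */
+                          return;
+                  old = cur;
+
+                  /* exponential backoff after the failed CAS */
+                  for (n = 0; n < spins; n++)
+                          cpu_relax();
+                  if (spins < 4096)
+                          spins <<= 1;
+          }
+  }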