Thu Sep 3 00:23:57 2020 UTC
atomic_load/store_* appeared in NetBSD 9, not 10.

Pullup preceded release of 9.0.


(riastradh)
diff -r1.5 -r1.6 src/share/man/man9/atomic_loadstore.9

cvs diff -r1.5 -r1.6 src/share/man/man9/atomic_loadstore.9

--- src/share/man/man9/atomic_loadstore.9 2019/12/08 00:00:59 1.5
+++ src/share/man/man9/atomic_loadstore.9 2020/09/03 00:23:57 1.6
@@ -1,806 +1,806 @@ @@ -1,806 +1,806 @@
1.\" $NetBSD: atomic_loadstore.9,v 1.5 2019/12/08 00:00:59 uwe Exp $ 1.\" $NetBSD: atomic_loadstore.9,v 1.6 2020/09/03 00:23:57 riastradh Exp $
2.\" 2.\"
3.\" Copyright (c) 2019 The NetBSD Foundation 3.\" Copyright (c) 2019 The NetBSD Foundation
4.\" All rights reserved. 4.\" All rights reserved.
5.\" 5.\"
6.\" This code is derived from software contributed to The NetBSD Foundation 6.\" This code is derived from software contributed to The NetBSD Foundation
7.\" by Taylor R. Campbell. 7.\" by Taylor R. Campbell.
8.\" 8.\"
9.\" Redistribution and use in source and binary forms, with or without 9.\" Redistribution and use in source and binary forms, with or without
10.\" modification, are permitted provided that the following conditions 10.\" modification, are permitted provided that the following conditions
11.\" are met: 11.\" are met:
12.\" 1. Redistributions of source code must retain the above copyright 12.\" 1. Redistributions of source code must retain the above copyright
13.\" notice, this list of conditions and the following disclaimer. 13.\" notice, this list of conditions and the following disclaimer.
14.\" 2. Redistributions in binary form must reproduce the above copyright 14.\" 2. Redistributions in binary form must reproduce the above copyright
15.\" notice, this list of conditions and the following disclaimer in the 15.\" notice, this list of conditions and the following disclaimer in the
16.\" documentation and/or other materials provided with the distribution. 16.\" documentation and/or other materials provided with the distribution.
17.\" 17.\"
18.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS 18.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
19.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED 19.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
20.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR 20.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
21.\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS 21.\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
22.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 22.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
23.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF 23.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
24.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS 24.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
25.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN 25.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
26.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) 26.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
27.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE 27.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
28.\" POSSIBILITY OF SUCH DAMAGE. 28.\" POSSIBILITY OF SUCH DAMAGE.
29.\" 29.\"
30.Dd November 25, 2019 30.Dd November 25, 2019
31.Dt ATOMIC_LOADSTORE 9 31.Dt ATOMIC_LOADSTORE 9
32.Os 32.Os
33.Sh NAME 33.Sh NAME
34.Nm atomic_load_relaxed , 34.Nm atomic_load_relaxed ,
35.Nm atomic_load_acquire , 35.Nm atomic_load_acquire ,
36.Nm atomic_load_consume , 36.Nm atomic_load_consume ,
37.Nm atomic_store_relaxed , 37.Nm atomic_store_relaxed ,
38.Nm atomic_store_release 38.Nm atomic_store_release
39.Nd atomic and ordered memory operations 39.Nd atomic and ordered memory operations
40.Sh SYNOPSIS 40.Sh SYNOPSIS
41.In sys/atomic.h 41.In sys/atomic.h
42.Ft T 42.Ft T
43.Fn atomic_load_relaxed "const volatile T *p" 43.Fn atomic_load_relaxed "const volatile T *p"
44.Ft T 44.Ft T
45.Fn atomic_load_acquire "const volatile T *p" 45.Fn atomic_load_acquire "const volatile T *p"
46.Ft T 46.Ft T
47.Fn atomic_load_consume "const volatile T *p" 47.Fn atomic_load_consume "const volatile T *p"
48.Ft void 48.Ft void
49.Fn atomic_store_relaxed "volatile T *p" "T v" 49.Fn atomic_store_relaxed "volatile T *p" "T v"
50.Ft void 50.Ft void
51.Fn atomic_store_release "volatile T *p" "T v" 51.Fn atomic_store_release "volatile T *p" "T v"
52.Sh DESCRIPTION 52.Sh DESCRIPTION
53These type-generic macros implement memory operations that are 53These type-generic macros implement memory operations that are
54.Em atomic 54.Em atomic
55and that have 55and that have
56.Em memory ordering constraints . 56.Em memory ordering constraints .
57Aside from atomicity and ordering, the load operations are equivalent 57Aside from atomicity and ordering, the load operations are equivalent
58to 58to
59.Li * Ns Fa p 59.Li * Ns Fa p
60and the store operations are equivalent to 60and the store operations are equivalent to
61.Li * Ns Fa p Li "=" Fa v . 61.Li * Ns Fa p Li "=" Fa v .
62The pointer 62The pointer
63.Fa p 63.Fa p
64must be aligned, even on architectures like x86 which generally lack 64must be aligned, even on architectures like x86 which generally lack
65strict alignment requirements; see 65strict alignment requirements; see
66.Sx SIZE AND ALIGNMENT 66.Sx SIZE AND ALIGNMENT
67for details. 67for details.
68.Pp 68.Pp
69.Em Atomic 69.Em Atomic
70means that the memory operations cannot be 70means that the memory operations cannot be
71.Em fused 71.Em fused
72or 72or
73.Em torn : 73.Em torn :
74.Bl -bullet 74.Bl -bullet
75.It 75.It
76.Em Fusing 76.Em Fusing
77is combining multiple memory operations on a single object into one 77is combining multiple memory operations on a single object into one
78memory operation, such as replacing 78memory operation, such as replacing
79.Bd -literal -compact 79.Bd -literal -compact
80 *p = v; 80 *p = v;
81 x = *p; 81 x = *p;
82.Ed 82.Ed
83by 83by
84.Bd -literal -compact 84.Bd -literal -compact
85 *p = v; 85 *p = v;
86 x = v; 86 x = v;
87.Ed 87.Ed
88since the compiler can prove that 88since the compiler can prove that
89.Li \&*p 89.Li \&*p
90will yield 90will yield
91.Li v 91.Li v
92after 92after
93.Li \&*p\ =\ v . 93.Li \&*p\ =\ v .
94For 94For
95.Em atomic 95.Em atomic
96memory operations, the implementation 96memory operations, the implementation
97.Em will not 97.Em will not
98assume that 98assume that
99.Bl -dash -compact 99.Bl -dash -compact
100.It 100.It
101consecutive loads of the same object will return the same value, or 101consecutive loads of the same object will return the same value, or
102.It 102.It
103a store followed by a load of the same object will return the value 103a store followed by a load of the same object will return the value
104stored, or 104stored, or
105.It 105.It
106consecutive stores of the same object are redundant. 106consecutive stores of the same object are redundant.
107.El 107.El
108Thus, the implementation will not replace two consecutive atomic loads 108Thus, the implementation will not replace two consecutive atomic loads
109by one, will not elide an atomic load following a store, and will not 109by one, will not elide an atomic load following a store, and will not
110combine two consecutive atomic stores into one. 110combine two consecutive atomic stores into one.
111.Pp 111.Pp
112For example, 112For example,
113.Bd -literal 113.Bd -literal
114 atomic_store_relaxed(&flag, 1); 114 atomic_store_relaxed(&flag, 1);
115 while (atomic_load_relaxed(&flag)) 115 while (atomic_load_relaxed(&flag))
116 continue; 116 continue;
117.Ed 117.Ed
118.Pp 118.Pp
119may be used to set a flag and then busy-wait until another thread 119may be used to set a flag and then busy-wait until another thread
120clears it, whereas 120clears it, whereas
121.Bd -literal 121.Bd -literal
122 flag = 1; 122 flag = 1;
123 while (flag) 123 while (flag)
124 continue; 124 continue;
125.Ed 125.Ed
126.Pp 126.Pp
127may be transformed into the infinite loop 127may be transformed into the infinite loop
128.Bd -literal 128.Bd -literal
129 flag = 1; 129 flag = 1;
130 while (1) 130 while (1)
131 continue; 131 continue;
132.Ed 132.Ed
133.It 133.It
134.Em Tearing 134.Em Tearing
135is implementing a memory operation on a large data unit such as a 135is implementing a memory operation on a large data unit such as a
13632-bit word by issuing multiple memory operations on smaller data units 13632-bit word by issuing multiple memory operations on smaller data units
137such as 8-bit bytes. 137such as 8-bit bytes.
138The implementation will not tear 138The implementation will not tear
139.Em atomic 139.Em atomic
140loads or stores into smaller ones. 140loads or stores into smaller ones.
141Thus, as far as any interrupt, other thread, or other CPU can tell, an 141Thus, as far as any interrupt, other thread, or other CPU can tell, an
142atomic memory operation is issued either all at once or not at all. 142atomic memory operation is issued either all at once or not at all.
143.Pp 143.Pp
144For example, if a 32-bit word 144For example, if a 32-bit word
145.Va w 145.Va w
146is written with 146is written with
147.Pp 147.Pp
148.Dl atomic_store_relaxed(&w,\ 0x00010002); 148.Dl atomic_store_relaxed(&w,\ 0x00010002);
149.Pp 149.Pp
150then an interrupt, other thread, or other CPU reading it with 150then an interrupt, other thread, or other CPU reading it with
151.Li atomic_load_relaxed(&w) 151.Li atomic_load_relaxed(&w)
152will never witness it partially written, whereas 152will never witness it partially written, whereas
153.Pp 153.Pp
154.Dl w\ =\ 0x00010002; 154.Dl w\ =\ 0x00010002;
155.Pp 155.Pp
156might be compiled into a pair of separate 16-bit store instructions 156might be compiled into a pair of separate 16-bit store instructions
157instead of one single word-sized store instruction, in which case other 157instead of one single word-sized store instruction, in which case other
158threads may see the intermediate state with only one of the halves 158threads may see the intermediate state with only one of the halves
159written. 159written.
160.El 160.El
161.Pp 161.Pp
162Atomic operations on any single object occur in a total order shared by 162Atomic operations on any single object occur in a total order shared by
163all interrupts, threads, and CPUs, which is consistent with the program 163all interrupts, threads, and CPUs, which is consistent with the program
164order in every interrupt, thread, and CPU. 164order in every interrupt, thread, and CPU.
165A single program without interruption or other threads or CPUs will 165A single program without interruption or other threads or CPUs will
166always observe its own loads and stores in program order, but another 166always observe its own loads and stores in program order, but another
167program in an interrupt handler, in another thread, or on another CPU 167program in an interrupt handler, in another thread, or on another CPU
168may issue loads that return values as if the first program's stores 168may issue loads that return values as if the first program's stores
169occurred out of program order, and vice versa. 169occurred out of program order, and vice versa.
170Two different threads might each observe a third thread's memory 170Two different threads might each observe a third thread's memory
171operations in different orders. 171operations in different orders.
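For illustration (an editor's sketch, not part of the manual text; the variables are hypothetical), two relaxed stores issued by one CPU may be observed in the opposite order by code running on another CPU:

        /* CPU 0 */
        atomic_store_relaxed(&x, 1);
        atomic_store_relaxed(&flag, 1);

        /* CPU 1: may observe flag == 1 and still read x == 0, i.e.
           CPU 0's stores as if they occurred out of program order. */
        if (atomic_load_relaxed(&flag))
                y = atomic_load_relaxed(&x);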
172.Pp 172.Pp
173The 173The
174.Em memory ordering constraints 174.Em memory ordering constraints
175make limited guarantees of ordering relative to memory operations on 175make limited guarantees of ordering relative to memory operations on
176.Em other 176.Em other
177objects as witnessed by interrupts, other threads, or other CPUs, and 177objects as witnessed by interrupts, other threads, or other CPUs, and
178have the following meanings: 178have the following meanings:
179.Bl -tag -width relaxed 179.Bl -tag -width relaxed
180.It relaxed 180.It relaxed
181No ordering relative to memory operations on any other objects is 181No ordering relative to memory operations on any other objects is
182guaranteed. 182guaranteed.
183Relaxed ordering is the default for ordinary non-atomic memory 183Relaxed ordering is the default for ordinary non-atomic memory
184operations like 184operations like
185.Li "*p" 185.Li "*p"
186and 186and
187.Li "*p = v" . 187.Li "*p = v" .
188.Pp 188.Pp
189Atomic operations with relaxed ordering are cheap: they are not 189Atomic operations with relaxed ordering are cheap: they are not
190read/modify/write atomic operations, and they do not involve any kind 190read/modify/write atomic operations, and they do not involve any kind
191of inter-CPU ordering barriers. 191of inter-CPU ordering barriers.
192.It acquire 192.It acquire
193This memory operation happens before all subsequent memory operations 193This memory operation happens before all subsequent memory operations
194in program order. 194in program order.
195However, prior memory operations in program order may be reordered to 195However, prior memory operations in program order may be reordered to
196happen after this one. 196happen after this one.
197For example, assuming no aliasing between the pointers, the 197For example, assuming no aliasing between the pointers, the
198implementation is allowed to treat 198implementation is allowed to treat
199.Bd -literal 199.Bd -literal
200 int x = *p; 200 int x = *p;
201 if (atomic_load_acquire(q)) { 201 if (atomic_load_acquire(q)) {
202 int y = *r; 202 int y = *r;
203 *s = x + y; 203 *s = x + y;
204 return 1; 204 return 1;
205 } 205 }
206.Ed 206.Ed
207.Pp 207.Pp
208as if it were 208as if it were
209.Bd -literal 209.Bd -literal
210 if (atomic_load_acquire(q)) { 210 if (atomic_load_acquire(q)) {
211 int x = *p; 211 int x = *p;
212 int y = *r; 212 int y = *r;
213 *s = x + y; 213 *s = x + y;
214 return 1; 214 return 1;
215 } 215 }
216.Ed 216.Ed
217.Pp 217.Pp
218but 218but
219.Em not 219.Em not
220as if it were 220as if it were
221.Bd -literal 221.Bd -literal
222 int x = *p; 222 int x = *p;
223 int y = *r; 223 int y = *r;
224 *s = x + y; 224 *s = x + y;
225 if (atomic_load_acquire(q)) { 225 if (atomic_load_acquire(q)) {
226 return 1; 226 return 1;
227 } 227 }
228.Ed 228.Ed
229.It consume 229.It consume
230This memory operation happens before all memory operations on objects 230This memory operation happens before all memory operations on objects
231at addresses that are computed from the value returned by this one. 231at addresses that are computed from the value returned by this one.
232Otherwise, no ordering relative to memory operations on other objects 232Otherwise, no ordering relative to memory operations on other objects
233is implied. 233is implied.
234.Pp 234.Pp
235For example, the implementation is allowed to treat 235For example, the implementation is allowed to treat
236.Bd -literal 236.Bd -literal
237 struct foo *foo0, *foo1; 237 struct foo *foo0, *foo1;
238 238
239 struct foo *f0 = atomic_load_consume(&foo0); 239 struct foo *f0 = atomic_load_consume(&foo0);
240 struct foo *f1 = atomic_load_consume(&foo1); 240 struct foo *f1 = atomic_load_consume(&foo1);
241 int x = f0->x; 241 int x = f0->x;
242 int y = f1->y; 242 int y = f1->y;
243.Ed 243.Ed
244.Pp 244.Pp
245as if it were 245as if it were
246.Bd -literal 246.Bd -literal
247 struct foo *foo0, *foo1; 247 struct foo *foo0, *foo1;
248 248
249 struct foo *f1 = atomic_load_consume(&foo1); 249 struct foo *f1 = atomic_load_consume(&foo1);
250 struct foo *f0 = atomic_load_consume(&foo0); 250 struct foo *f0 = atomic_load_consume(&foo0);
251 int y = f1->y; 251 int y = f1->y;
252 int x = f0->x; 252 int x = f0->x;
253.Ed 253.Ed
254.Pp 254.Pp
255but loading 255but loading
256.Li f0->x 256.Li f0->x
257is guaranteed to happen after loading 257is guaranteed to happen after loading
258.Li foo0 258.Li foo0
259even if the CPU had a cached value for the address that 259even if the CPU had a cached value for the address that
260.Li f0->x 260.Li f0->x
261happened to be at, and likewise for 261happened to be at, and likewise for
262.Li f1->y 262.Li f1->y
263and 263and
264.Li foo1 . 264.Li foo1 .
265.Pp 265.Pp
266.Fn atomic_load_consume 266.Fn atomic_load_consume
267functions like 267functions like
268.Fn atomic_load_acquire 268.Fn atomic_load_acquire
269as long as the memory operations that must happen after it are limited 269as long as the memory operations that must happen after it are limited
270to addresses that depend on the value returned by it, but it is almost 270to addresses that depend on the value returned by it, but it is almost
271always as cheap as 271always as cheap as
272.Fn atomic_load_relaxed . 272.Fn atomic_load_relaxed .
273See 273See
274.Sx ACQUIRE OR CONSUME? 274.Sx ACQUIRE OR CONSUME?
275below for more details. 275below for more details.
276.It release 276.It release
277All prior memory operations in program order happen before this one. 277All prior memory operations in program order happen before this one.
278However, subsequent memory operations in program order may be reordered 278However, subsequent memory operations in program order may be reordered
279to happen before this one too. 279to happen before this one too.
280For example, assuming no aliasing between the pointers, the 280For example, assuming no aliasing between the pointers, the
281implementation is allowed to treat 281implementation is allowed to treat
282.Bd -literal 282.Bd -literal
283 int x = *p; 283 int x = *p;
284 *q = x; 284 *q = x;
285 atomic_store_release(r, 0); 285 atomic_store_release(r, 0);
286 int y = *s; 286 int y = *s;
287 return x + y; 287 return x + y;
288.Ed 288.Ed
289.Pp 289.Pp
290as if it were 290as if it were
291.Bd -literal 291.Bd -literal
292 int y = *s; 292 int y = *s;
293 int x = *p; 293 int x = *p;
294 *q = x; 294 *q = x;
295 atomic_store_release(r, 0); 295 atomic_store_release(r, 0);
296 return x + y; 296 return x + y;
297.Ed 297.Ed
298.Pp 298.Pp
299but 299but
300.Em not 300.Em not
301as if it were 301as if it were
302.Bd -literal 302.Bd -literal
303 atomic_store_release(r, 0); 303 atomic_store_release(r, 0);
304 int x = *p; 304 int x = *p;
305 int y = *s; 305 int y = *s;
306 *q = x; 306 *q = x;
307 return x + y; 307 return x + y;
308.Ed 308.Ed
309.El 309.El
310.Ss PAIRING ORDERED MEMORY OPERATIONS 310.Ss PAIRING ORDERED MEMORY OPERATIONS
311In general, each 311In general, each
312.Fn atomic_store_release 312.Fn atomic_store_release
313.Em must 313.Em must
314be paired with either 314be paired with either
315.Fn atomic_load_acquire 315.Fn atomic_load_acquire
316or 316or
317.Fn atomic_load_consume 317.Fn atomic_load_consume
318in order to have an effect \(em it is only when a release operation 318in order to have an effect \(em it is only when a release operation
319synchronizes with an acquire or consume operation that any ordering 319synchronizes with an acquire or consume operation that any ordering
320is guaranteed between memory operations 320is guaranteed between memory operations
321.Em before 321.Em before
322the release operation and memory operations 322the release operation and memory operations
323.Em after 323.Em after
324the acquire/consume operation. 324the acquire/consume operation.
325.Pp 325.Pp
326For example, to set up an entry in a table and then mark the entry 326For example, to set up an entry in a table and then mark the entry
327ready, you should: 327ready, you should:
328.Bl -enum 328.Bl -enum
329.It 329.It
330Perform memory operations to initialize the data. 330Perform memory operations to initialize the data.
331.Bd -literal 331.Bd -literal
332 tab[i].x = ...; 332 tab[i].x = ...;
333 tab[i].y = ...; 333 tab[i].y = ...;
334.Ed 334.Ed
335.It 335.It
336Issue 336Issue
337.Fn atomic_store_release 337.Fn atomic_store_release
338to mark it ready. 338to mark it ready.
339.Bd -literal 339.Bd -literal
340 atomic_store_release(&tab[i].ready, 1); 340 atomic_store_release(&tab[i].ready, 1);
341.Ed 341.Ed
342.It 342.It
343Possibly in another thread, issue 343Possibly in another thread, issue
344.Fn atomic_load_acquire 344.Fn atomic_load_acquire
345to ascertain whether it is ready. 345to ascertain whether it is ready.
346.Bd -literal 346.Bd -literal
347 if (atomic_load_acquire(&tab[i].ready) == 0) 347 if (atomic_load_acquire(&tab[i].ready) == 0)
348 return EWOULDBLOCK; 348 return EWOULDBLOCK;
349.Ed 349.Ed
350.It 350.It
351Perform memory operations to use the data. 351Perform memory operations to use the data.
352.Bd -literal 352.Bd -literal
353 do_stuff(tab[i].x, tab[i].y); 353 do_stuff(tab[i].x, tab[i].y);
354.Ed 354.Ed
355.El 355.El
356.Pp 356.Pp
357Similarly, if you want to create an object, initialize it, and then 357Similarly, if you want to create an object, initialize it, and then
358publish it to be used by another thread, then you should: 358publish it to be used by another thread, then you should:
359.Bl -enum 359.Bl -enum
360.It 360.It
361Perform memory operations to initialize the object. 361Perform memory operations to initialize the object.
362.Bd -literal 362.Bd -literal
363 struct mumble *m = kmem_alloc(sizeof(*m), KM_SLEEP); 363 struct mumble *m = kmem_alloc(sizeof(*m), KM_SLEEP);
364 m->x = x; 364 m->x = x;
365 m->y = y; 365 m->y = y;
366 m->z = m->x + m->y; 366 m->z = m->x + m->y;
367.Ed 367.Ed
368.It 368.It
369Issue 369Issue
370.Fn atomic_store_release 370.Fn atomic_store_release
371to publish it. 371to publish it.
372.Bd -literal 372.Bd -literal
373 atomic_store_release(&the_mumble, m); 373 atomic_store_release(&the_mumble, m);
374.Ed 374.Ed
375.It 375.It
376Possibly in another thread, issue 376Possibly in another thread, issue
377.Fn atomic_load_consume 377.Fn atomic_load_consume
378to get it. 378to get it.
379.Bd -literal 379.Bd -literal
380 struct mumble *m = atomic_load_consume(&the_mumble); 380 struct mumble *m = atomic_load_consume(&the_mumble);
381.Ed 381.Ed
382.It 382.It
383Perform memory operations to use the object's members. 383Perform memory operations to use the object's members.
384.Bd -literal 384.Bd -literal
385 m->y &= m->x; 385 m->y &= m->x;
386 do_things(m->x, m->y, m->z); 386 do_things(m->x, m->y, m->z);
387.Ed 387.Ed
388.El 388.El
389.Pp 389.Pp
390In both examples, assuming that the value written by 390In both examples, assuming that the value written by
391.Fn atomic_store_release 391.Fn atomic_store_release
392in step\~2 392in step\~2
393is read by 393is read by
394.Fn atomic_load_acquire 394.Fn atomic_load_acquire
395or 395or
396.Fn atomic_load_consume 396.Fn atomic_load_consume
397in step\~3, this guarantees that all of the memory operations in 397in step\~3, this guarantees that all of the memory operations in
398step\~1 complete before any of the memory operations in step\~4 \(em 398step\~1 complete before any of the memory operations in step\~4 \(em
399even if they happen on different CPUs. 399even if they happen on different CPUs.
400.Pp 400.Pp
401Without 401Without
402.Em both 402.Em both
403the release operation in step\~2 403the release operation in step\~2
404.Em and 404.Em and
405the acquire or consume operation in step\~3, no ordering is guaranteed 405the acquire or consume operation in step\~3, no ordering is guaranteed
406between the memory operations in steps\~1 and\~4. 406between the memory operations in steps\~1 and\~4.
407In fact, without 407In fact, without
408.Em both 408.Em both
409release and acquire/consume, even the assignment 409release and acquire/consume, even the assignment
410.Li m->z\ =\ m->x\ +\ m->y 410.Li m->z\ =\ m->x\ +\ m->y
411in step\~1 might read values of 411in step\~1 might read values of
412.Li m->x 412.Li m->x
413and 413and
414.Li m->y 414.Li m->y
415that were written in step\~4. 415that were written in step\~4.
416.Ss ACQUIRE OR CONSUME? 416.Ss ACQUIRE OR CONSUME?
417You must use 417You must use
418.Fn atomic_load_acquire 418.Fn atomic_load_acquire
419when subsequent memory operations in program order that must happen 419when subsequent memory operations in program order that must happen
420after the load are on objects at 420after the load are on objects at
421.Em addresses that might not depend arithmetically on the resulting value . 421.Em addresses that might not depend arithmetically on the resulting value .
422This applies particularly when the choice of whether to do the 422This applies particularly when the choice of whether to do the
423subsequent memory operation depends on a 423subsequent memory operation depends on a
424.Em control-flow decision based on the resulting value : 424.Em control-flow decision based on the resulting value :
425.Bd -literal 425.Bd -literal
426 struct gadget { 426 struct gadget {
427 int ready, x; 427 int ready, x;
428 } the_gadget; 428 } the_gadget;
429 429
430 /* Producer */ 430 /* Producer */
431 the_gadget.x = 42; 431 the_gadget.x = 42;
432 atomic_store_release(&the_gadget.ready, 1); 432 atomic_store_release(&the_gadget.ready, 1);
433 433
434 /* Consumer */ 434 /* Consumer */
435 if (atomic_load_acquire(&the_gadget.ready) == 0) 435 if (atomic_load_acquire(&the_gadget.ready) == 0)
436 return EWOULDBLOCK; 436 return EWOULDBLOCK;
437 int x = the_gadget.x; 437 int x = the_gadget.x;
438.Ed 438.Ed
439.Pp 439.Pp
440Here the 440Here the
441.Em decision of whether to load 441.Em decision of whether to load
442.Li the_gadget.x 442.Li the_gadget.x
443depends on a control-flow decision depending on the value loaded from 443depends on a control-flow decision depending on the value loaded from
444.Li the_gadget.ready , 444.Li the_gadget.ready ,
445and loading 445and loading
446.Li the_gadget.x 446.Li the_gadget.x
447must happen after loading 447must happen after loading
448.Li the_gadget.ready . 448.Li the_gadget.ready .
449Using 449Using
450.Fn atomic_load_acquire 450.Fn atomic_load_acquire
451guarantees that the compiler and CPU do not conspire to load 451guarantees that the compiler and CPU do not conspire to load
452.Li the_gadget.x 452.Li the_gadget.x
453before we have ascertained that it is ready. 453before we have ascertained that it is ready.
454.Pp 454.Pp
455You may use 455You may use
456.Fn atomic_load_consume 456.Fn atomic_load_consume
457if all subsequent memory operations in program order that must happen 457if all subsequent memory operations in program order that must happen
458after the load are performed on objects at 458after the load are performed on objects at
459.Em addresses computed arithmetically from the resulting value , 459.Em addresses computed arithmetically from the resulting value ,
460such as loading a pointer to a structure object and then dereferencing 460such as loading a pointer to a structure object and then dereferencing
461it: 461it:
462.Bd -literal 462.Bd -literal
463 struct gizmo { 463 struct gizmo {
464 int x, y, z; 464 int x, y, z;
465 }; 465 };
466 struct gizmo null_gizmo; 466 struct gizmo null_gizmo;
467 struct gizmo *the_gizmo = &null_gizmo; 467 struct gizmo *the_gizmo = &null_gizmo;
468 468
469 /* Producer */ 469 /* Producer */
470 struct gizmo *g = kmem_alloc(sizeof(*g), KM_SLEEP); 470 struct gizmo *g = kmem_alloc(sizeof(*g), KM_SLEEP);
471 g->x = 12; 471 g->x = 12;
472 g->y = 34; 472 g->y = 34;
473 g->z = 56; 473 g->z = 56;
474 atomic_store_release(&the_gizmo, g); 474 atomic_store_release(&the_gizmo, g);
475 475
476 /* Consumer */ 476 /* Consumer */
477 struct gizmo *g = atomic_load_consume(&the_gizmo); 477 struct gizmo *g = atomic_load_consume(&the_gizmo);
478 int y = g->y; 478 int y = g->y;
479.Ed 479.Ed
480.Pp 480.Pp
481Here the 481Here the
482.Em address 482.Em address
483of 483of
484.Li g->y 484.Li g->y
485depends on the value of the pointer loaded from 485depends on the value of the pointer loaded from
486.Li the_gizmo . 486.Li the_gizmo .
487Using 487Using
488.Fn atomic_load_consume 488.Fn atomic_load_consume
489guarantees that we do not witness a stale cache for that address. 489guarantees that we do not witness a stale cache for that address.
490.Pp 490.Pp
491In some cases it may be unclear. 491In some cases it may be unclear.
492For example: 492For example:
493.Bd -literal 493.Bd -literal
494 int x[2]; 494 int x[2];
495 bool b; 495 bool b;
496 496
497 /* Producer */ 497 /* Producer */
498 x[0] = 42; 498 x[0] = 42;
499 atomic_store_release(&b, 0); 499 atomic_store_release(&b, 0);
500 500
501 /* Consumer 1 */ 501 /* Consumer 1 */
502 int y = atomic_load_???(&b) ? x[0] : x[1]; 502 int y = atomic_load_???(&b) ? x[0] : x[1];
503 503
504 /* Consumer 2 */ 504 /* Consumer 2 */
505 int y = x[atomic_load_???(&b) ? 0 : 1]; 505 int y = x[atomic_load_???(&b) ? 0 : 1];
506 506
507 /* Consumer 3 */ 507 /* Consumer 3 */
508 int y = x[atomic_load_???(&b) ^ 1]; 508 int y = x[atomic_load_???(&b) ^ 1];
509.Ed 509.Ed
510.Pp 510.Pp
511Although the three consumers seem to be equivalent, by the letter of 511Although the three consumers seem to be equivalent, by the letter of
512C11 consumers\~1 and\~2 require 512C11 consumers\~1 and\~2 require
513.Fn atomic_load_acquire 513.Fn atomic_load_acquire
514because the value determines the address of a subsequent load only via 514because the value determines the address of a subsequent load only via
515control-flow decisions in the 515control-flow decisions in the
516.Li ?: 516.Li ?:
517operator, whereas consumer\~3 can use 517operator, whereas consumer\~3 can use
518.Fn atomic_load_consume . 518.Fn atomic_load_consume .
519However, if you're not sure, you should err on the side of 519However, if you're not sure, you should err on the side of
520.Fn atomic_load_acquire 520.Fn atomic_load_acquire
521until C11 implementations have ironed out the kinks in the semantics. 521until C11 implementations have ironed out the kinks in the semantics.
522.Pp 522.Pp
523On all CPUs other than DEC Alpha, 523On all CPUs other than DEC Alpha,
524.Fn atomic_load_consume 524.Fn atomic_load_consume
525is cheap \(em it is identical to 525is cheap \(em it is identical to
526.Fn atomic_load_relaxed . 526.Fn atomic_load_relaxed .
527In contrast, 527In contrast,
528.Fn atomic_load_acquire 528.Fn atomic_load_acquire
529usually implies an expensive memory barrier. 529usually implies an expensive memory barrier.
530.Ss SIZE AND ALIGNMENT 530.Ss SIZE AND ALIGNMENT
531The pointer 531The pointer
532.Fa p 532.Fa p
533must be aligned \(em that is, if the object it points to is 533must be aligned \(em that is, if the object it points to is
534.\" 534.\"
5352\c 5352\c
536.ie t \s-2\v'-0.4m'n\v'+0.4m'\s+2 536.ie t \s-2\v'-0.4m'n\v'+0.4m'\s+2
537.el ^n 537.el ^n
538.\" 538.\"
539bytes long, then the low-order 539bytes long, then the low-order
540.Ar n 540.Ar n
541bits of 541bits of
542.Fa p 542.Fa p
543must be zero. 543must be zero.
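For illustration (an editor's sketch with a hypothetical structure; the KASSERT merely restates the rule), a 4-byte object must lie on a 4-byte boundary, so the low-order 2 bits of its address must be zero:

        struct frob {
                uint32_t f_count;       /* naturally aligned 32-bit field */
        } *f;

        /* 4-byte object => low-order 2 bits of its address are zero */
        KASSERT(((uintptr_t)&f->f_count & (sizeof(f->f_count) - 1)) == 0);
        uint32_t c = atomic_load_relaxed(&f->f_count);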
544.Pp 544.Pp
545All 545All
546.Nx 546.Nx
547ports support atomic loads and stores on units of data up to 32 bits. 547ports support atomic loads and stores on units of data up to 32 bits.
548Some ports additionally support atomic loads and stores on larger 548Some ports additionally support atomic loads and stores on larger
549quantities, like 64-bit quantities, if 549quantities, like 64-bit quantities, if
550.Dv __HAVE_ATOMIC64_LOADSTORE 550.Dv __HAVE_ATOMIC64_LOADSTORE
551is defined. 551is defined.
552The macros are not allowed on larger quantities of data than the port 552The macros are not allowed on larger quantities of data than the port
553supports atomically; attempts to use them for such quantities should 553supports atomically; attempts to use them for such quantities should
554result in a compile-time assertion failure. 554result in a compile-time assertion failure.
555.Pp 555.Pp
556For example, as long as you use 556For example, as long as you use
557.Fn atomic_store_* 557.Fn atomic_store_*
558to write a 32-bit quantity, you can safely use 558to write a 32-bit quantity, you can safely use
559.Fn atomic_load_relaxed 559.Fn atomic_load_relaxed
560to optimistically read it outside a lock, but for a 64-bit quantity it 560to optimistically read it outside a lock, but for a 64-bit quantity it
561must be conditional on 561must be conditional on
562.Dv __HAVE_ATOMIC64_LOADSTORE 562.Dv __HAVE_ATOMIC64_LOADSTORE
563\(em otherwise it will lead to compile-time errors on platforms without 563\(em otherwise it will lead to compile-time errors on platforms without
56464-bit atomic loads and stores: 56464-bit atomic loads and stores:
565.Bd -literal 565.Bd -literal
566 struct foo { 566 struct foo {
567 kmutex_t f_lock; 567 kmutex_t f_lock;
568 uint32_t f_refcnt; 568 uint32_t f_refcnt;
569 uint64_t f_ticket; 569 uint64_t f_ticket;
570 }; 570 };
571 571
572 if (atomic_load_relaxed(&foo->f_refcnt) == 0) 572 if (atomic_load_relaxed(&foo->f_refcnt) == 0)
573 return 123; 573 return 123;
574#ifdef __HAVE_ATOMIC64_LOADSTORE 574#ifdef __HAVE_ATOMIC64_LOADSTORE
575 if (atomic_load_relaxed(&foo->f_ticket) == ticket) 575 if (atomic_load_relaxed(&foo->f_ticket) == ticket)
576 return 123; 576 return 123;
577#endif 577#endif
578 mutex_enter(&foo->f_lock); 578 mutex_enter(&foo->f_lock);
579 if (foo->f_refcnt == 0 || foo->f_ticket == ticket) 579 if (foo->f_refcnt == 0 || foo->f_ticket == ticket)
580 ret = 123; 580 ret = 123;
581 ... 581 ...
582#ifdef __HAVE_ATOMIC64_LOADSTORE 582#ifdef __HAVE_ATOMIC64_LOADSTORE
583 atomic_store_relaxed(&foo->f_ticket, foo->f_ticket + 1); 583 atomic_store_relaxed(&foo->f_ticket, foo->f_ticket + 1);
584#else 584#else
585 foo->f_ticket++; 585 foo->f_ticket++;
586#endif 586#endif
587 ... 587 ...
588 mutex_exit(&foo->f_lock); 588 mutex_exit(&foo->f_lock);
589.Ed 589.Ed
590.Sh C11 COMPATIBILITY 590.Sh C11 COMPATIBILITY
591These macros are meant to follow 591These macros are meant to follow
592.Tn C11 592.Tn C11
593semantics, in terms of 593semantics, in terms of
594.Li atomic_load_explicit() 594.Li atomic_load_explicit()
595and 595and
596.Li atomic_store_explicit() 596.Li atomic_store_explicit()
597with the appropriate memory order specifiers, and are meant to make 597with the appropriate memory order specifiers, and are meant to make
598future adoption of the 598future adoption of the
599.Tn C11 599.Tn C11
600atomic API easier. 600atomic API easier.
601Eventually it may be mandatory to use the 601Eventually it may be mandatory to use the
602.Tn C11 602.Tn C11
603.Vt _Atomic 603.Vt _Atomic
604type qualifier or equivalent for the operands. 604type qualifier or equivalent for the operands.
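For illustration, the rough C11 counterparts are sketched below (assuming _Atomic-qualified operands; this is a correspondence table, not the kernel implementation):

        v = atomic_load_explicit(p, memory_order_relaxed);     /* atomic_load_relaxed(p) */
        v = atomic_load_explicit(p, memory_order_acquire);     /* atomic_load_acquire(p) */
        v = atomic_load_explicit(p, memory_order_consume);     /* atomic_load_consume(p) */
        atomic_store_explicit(p, v, memory_order_relaxed);     /* atomic_store_relaxed(p, v) */
        atomic_store_explicit(p, v, memory_order_release);     /* atomic_store_release(p, v) */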
605.Sh LINUX ANALOGUES 605.Sh LINUX ANALOGUES
606The Linux kernel provides two macros 606The Linux kernel provides two macros
607.Li READ_ONCE(x) 607.Li READ_ONCE(x)
608and 608and
609.Li WRITE_ONCE(x,\ v) 609.Li WRITE_ONCE(x,\ v)
610which are similar to 610which are similar to
611.Li atomic_load_consume(&x) 611.Li atomic_load_consume(&x)
612and 612and
613.Li atomic_store_relaxed(&x,\ v) , 613.Li atomic_store_relaxed(&x,\ v) ,
614respectively. 614respectively.
615However, while Linux's 615However, while Linux's
616.Li READ_ONCE 616.Li READ_ONCE
617and 617and
618.Li WRITE_ONCE 618.Li WRITE_ONCE
619prevent fusing, they may in some cases be torn \(em and therefore fail 619prevent fusing, they may in some cases be torn \(em and therefore fail
620to guarantee atomicity \(em because: 620to guarantee atomicity \(em because:
621.Bl -bullet 621.Bl -bullet
622.It 622.It
623They do not require the address 623They do not require the address
624.Li "&x" 624.Li "&x"
625to be aligned. 625to be aligned.
626.It 626.It
627They do not require 627They do not require
628.Li sizeof(x) 628.Li sizeof(x)
629to be at most the largest size of available atomic loads and stores on 629to be at most the largest size of available atomic loads and stores on
630the host architecture. 630the host architecture.
631.El 631.El
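As a rough illustration of the correspondence (hypothetical names; assumes the object is aligned and no larger than the architecture's atomic load/store size):

        /* Linux */
        struct foo *f = READ_ONCE(foop);
        WRITE_ONCE(f->f_gen, gen);

        /* NetBSD near-equivalent */
        struct foo *f = atomic_load_consume(&foop);
        atomic_store_relaxed(&f->f_gen, gen);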
632.Sh MEMORY BARRIERS AND ATOMIC READ/MODIFY/WRITE 632.Sh MEMORY BARRIERS AND ATOMIC READ/MODIFY/WRITE
633The atomic read/modify/write operations in 633The atomic read/modify/write operations in
634.Xr atomic_ops 3 634.Xr atomic_ops 3
635have relaxed ordering by default, but can be combined with the memory 635have relaxed ordering by default, but can be combined with the memory
636barriers in 636barriers in
637.Xr membar_ops 3 637.Xr membar_ops 3
638for the same effect as an acquire operation and a release operation for 638for the same effect as an acquire operation and a release operation for
639the purposes of pairing with 639the purposes of pairing with
640.Fn atomic_store_release 640.Fn atomic_store_release
641and 641and
642.Fn atomic_load_acquire 642.Fn atomic_load_acquire
643or 643or
644.Fn atomic_load_consume . 644.Fn atomic_load_consume .
645If 645If
646.Li atomic_r/m/w() 646.Li atomic_r/m/w()
647is an atomic read/modify/write operation in 647is an atomic read/modify/write operation in
648.Xr atomic_ops 3 , 648.Xr atomic_ops 3 ,
649then 649then
650.Bd -literal 650.Bd -literal
651 membar_exit(); 651 membar_exit();
652 atomic_r/m/w(obj, ...); 652 atomic_r/m/w(obj, ...);
653.Ed 653.Ed
654.Pp 654.Pp
655functions like a release operation on 655functions like a release operation on
656.Va obj , 656.Va obj ,
657and 657and
658.Bd -literal 658.Bd -literal
659 atomic_r/m/w(obj, ...); 659 atomic_r/m/w(obj, ...);
660 membar_enter(); 660 membar_enter();
661.Ed 661.Ed
662.Pp 662.Pp
663functions like an acquire operation on 663functions like an acquire operation on
664.Va obj . 664.Va obj .
665.Pp 665.Pp
666.Sy WARNING : 666.Sy WARNING :
667The combination of 667The combination of
668.Fn atomic_load_relaxed 668.Fn atomic_load_relaxed
669and 669and
670.Xr membar_enter 3 670.Xr membar_enter 3
671.Em does not 671.Em does not
672make an acquire operation; only read/modify/write atomics may be 672make an acquire operation; only read/modify/write atomics may be
673combined with 673combined with
674.Xr membar_enter 3 674.Xr membar_enter 3
675this way. 675this way.
676.Pp 676.Pp
677On architectures where 677On architectures where
678.Dv __HAVE_ATOMIC_AS_MEMBAR 678.Dv __HAVE_ATOMIC_AS_MEMBAR
679is defined, all the 679is defined, all the
680.Xr atomic_ops 3 680.Xr atomic_ops 3
681imply release and acquire operations, so the 681imply release and acquire operations, so the
682.Xr membar_enter 3 682.Xr membar_enter 3
683and 683and
684.Xr membar_exit 3 684.Xr membar_exit 3
685are redundant. 685are redundant.
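A common way to take advantage of this is to make the explicit barriers conditional on the architecture (an editor's sketch with a hypothetical reference count; obj_free() is illustrative):

#ifndef __HAVE_ATOMIC_AS_MEMBAR
        membar_exit();          /* release: prior stores happen before the decrement */
#endif
        if (atomic_dec_uint_nv(&obj->refcnt) != 0)
                return;
#ifndef __HAVE_ATOMIC_AS_MEMBAR
        membar_enter();         /* acquire: the free happens after observing zero */
#endif
        obj_free(obj);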
686.Sh EXAMPLES 686.Sh EXAMPLES
687Maintaining lossy counters. 687Maintaining lossy counters.
688These may lose some counts, because the read/modify/write cycle as a 688These may lose some counts, because the read/modify/write cycle as a
689whole is not atomic. 689whole is not atomic.
690But this guarantees that the count will increase by at most one each 690But this guarantees that the count will increase by at most one each
691time. 691time.
692In contrast, without atomic operations, in principle a write to a 692In contrast, without atomic operations, in principle a write to a
69332-bit counter might be torn into multiple smaller stores, which could 69332-bit counter might be torn into multiple smaller stores, which could
694appear to happen out of order from another CPU's perspective, leading 694appear to happen out of order from another CPU's perspective, leading
695to nonsensical counter readouts. 695to nonsensical counter readouts.
696(For frequent events, consider using per-CPU counters instead in 696(For frequent events, consider using per-CPU counters instead in
697practice.) 697practice.)
698.Bd -literal 698.Bd -literal
699 unsigned count; 699 unsigned count;
700 700
701 void 701 void
702 record_event(void) 702 record_event(void)
703 { 703 {
704 atomic_store_relaxed(&count, 704 atomic_store_relaxed(&count,
705 1 + atomic_load_relaxed(&count)); 705 1 + atomic_load_relaxed(&count));
706 } 706 }
707 707
708 unsigned 708 unsigned
709 read_event_count(void) 709 read_event_count(void)
710 { 710 {
711 return atomic_load_relaxed(&count); 711 return atomic_load_relaxed(&count);
712 } 712 }
713.Ed 713.Ed
714.Pp 714.Pp
715Initialization barrier. 715Initialization barrier.
716.Bd -literal 716.Bd -literal
717 int ready; 717 int ready;
718 struct data d; 718 struct data d;
719 719
720 void 720 void
721 setup_and_notify(void) 721 setup_and_notify(void)
722 { 722 {
723 setup_data(&d.things); 723 setup_data(&d.things);
724 atomic_store_release(&ready, 1); 724 atomic_store_release(&ready, 1);
725 } 725 }
726 726
727 void 727 void
728 try_if_ready(void) 728 try_if_ready(void)
729 { 729 {
730 if (atomic_load_acquire(&ready)) 730 if (atomic_load_acquire(&ready))
731 do_stuff(d.things); 731 do_stuff(d.things);
732 } 732 }
733.Ed 733.Ed
734.Pp 734.Pp
735Publishing a pointer to the current snapshot of data. 735Publishing a pointer to the current snapshot of data.
736(Caller must arrange that only one call to 736(Caller must arrange that only one call to
737.Li take_snapshot() 737.Li take_snapshot()
738happens at any 738happens at any
739given time; generally this should be done in coordination with 739given time; generally this should be done in coordination with
740.Xr pserialize 9 740.Xr pserialize 9
741or similar to enable resource reclamation.) 741or similar to enable resource reclamation.)
742.Bd -literal 742.Bd -literal
743 struct data *current_d; 743 struct data *current_d;
744 744
745 void 745 void
746 take_snapshot(void) 746 take_snapshot(void)
747 { 747 {
748 struct data *d = kmem_alloc(sizeof(*d)); 748 struct data *d = kmem_alloc(sizeof(*d));
749 749
750 d->things = ...; 750 d->things = ...;
751 751
752 atomic_store_release(&current_d, d); 752 atomic_store_release(&current_d, d);
753 } 753 }
754 754
755 struct data * 755 struct data *
756 get_snapshot(void) 756 get_snapshot(void)
757 { 757 {
758 return atomic_load_consume(&current_d); 758 return atomic_load_consume(&current_d);
759 } 759 }
760.Ed 760.Ed
761.Sh CODE REFERENCES 761.Sh CODE REFERENCES
762.Pa sys/sys/atomic.h 762.Pa sys/sys/atomic.h
763.Sh SEE ALSO 763.Sh SEE ALSO
764.Xr atomic_ops 3 , 764.Xr atomic_ops 3 ,
765.Xr membar_ops 3 , 765.Xr membar_ops 3 ,
766.Xr pserialize 9 766.Xr pserialize 9
767.Sh HISTORY 767.Sh HISTORY
768These atomic operations first appeared in 768These atomic operations first appeared in
769.Nx 10.0 . 769.Nx 9.0 .
770.Sh CAVEATS 770.Sh CAVEATS
771C11 formally specifies that all subexpressions, except the left 771C11 formally specifies that all subexpressions, except the left
772operands of the 772operands of the
773.Ql && , 773.Ql && ,
774.Ql || , 774.Ql || ,
775.Ql ?: , 775.Ql ?: ,
776and 776and
777.Ql \&, 777.Ql \&,
778operators and the 778operators and the
779.Li kill_dependency() 779.Li kill_dependency()
780macro, carry dependencies for which 780macro, carry dependencies for which
781.Dv memory_order_consume 781.Dv memory_order_consume
782guarantees ordering, but most or all implementations to date simply 782guarantees ordering, but most or all implementations to date simply
783treat 783treat
784.Dv memory_order_consume 784.Dv memory_order_consume
785as 785as
786.Dv memory_order_acquire 786.Dv memory_order_acquire
787and do not take advantage of data dependencies to elide costly memory 787and do not take advantage of data dependencies to elide costly memory
788barriers or load-acquire CPU instructions. 788barriers or load-acquire CPU instructions.
789.Pp 789.Pp
790Instead, we implement 790Instead, we implement
791.Fn atomic_load_consume 791.Fn atomic_load_consume
792as 792as
793.Fn atomic_load_relaxed 793.Fn atomic_load_relaxed
794followed by 794followed by
795.Xr membar_datadep_consumer 3 , 795.Xr membar_datadep_consumer 3 ,
796which is equivalent to 796which is equivalent to
797.Xr membar_consumer 3 797.Xr membar_consumer 3
798on DEC Alpha and 798on DEC Alpha and
799.Xr __insn_barrier 3 799.Xr __insn_barrier 3
800elsewhere. 800elsewhere.
801.Sh BUGS 801.Sh BUGS
802Some idiot decided to call it 802Some idiot decided to call it
803.Em tearing , 803.Em tearing ,
804depriving us of the opportunity to say that atomic operations prevent 804depriving us of the opportunity to say that atomic operations prevent
805fusion and 805fusion and
806.Em fission . 806.Em fission .