Memory ordering
Revision as of 04:31, 15 October 2012
Memory ordering is a group of properties of modern microprocessors that characterise the ways in which they may reorder memory operations. It is a type of out-of-order execution. Memory reordering can be used to make fuller use of separate cache and memory banks.
On most modern uniprocessors, memory operations are not executed in the order specified by the program code. In single-threaded programs, however, all operations appear to the programmer to have been executed in the order specified, with all inconsistencies hidden by the hardware.
In Symmetric multiprocessing (SMP) microprocessor systems
There are several memory-consistency models for SMP systems:
- sequential consistency (all reads and all writes are in-order)
- relaxed consistency (some types of reordering are allowed)
  - Loads can be reordered after Loads (for better working of cache coherency, better scaling)
  - Loads can be reordered after Stores
  - Stores can be reordered after Stores
  - Stores can be reordered after Loads
- weak consistency (reads and writes are arbitrarily reordered, limited only by explicit memory barriers)
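The "Stores can be reordered after Loads" case can be illustrated with the classic store-buffering test. The following is a minimal sketch in C11, assuming POSIX threads are available; the names `x`, `y`, `r0`, `r1` and `run_store_buffer_test` are hypothetical and chosen only for illustration. Each thread stores to one variable and then loads the other. With sequentially consistent atomics, as used here, the outcome in which both loads return 0 is forbidden; with relaxed operations that outcome would be permitted.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared variables for the litmus test. */
static atomic_int x, y;
static int r0, r1;

static void *thread0(void *arg) {
    (void)arg;
    atomic_store(&x, 1);     /* sequentially consistent store */
    r0 = atomic_load(&y);    /* may not be ordered before the store above */
    return NULL;
}

static void *thread1(void *arg) {
    (void)arg;
    atomic_store(&y, 1);
    r1 = atomic_load(&x);
    return NULL;
}

/* Runs one round of the test and returns r0 + r1 (1 or 2, never 0). */
int run_store_buffer_test(void) {
    pthread_t t0, t1;
    atomic_store(&x, 0);
    atomic_store(&y, 0);
    int rc0 = pthread_create(&t0, NULL, thread0, NULL);
    int rc1 = pthread_create(&t1, NULL, thread1, NULL);
    assert(rc0 == 0 && rc1 == 0);
    pthread_join(t0, NULL);  /* joining makes r0 and r1 safe to read */
    pthread_join(t1, NULL);
    return r0 + r1;
}
```

If both accesses in each thread were changed to `memory_order_relaxed`, a return value of 0 would become a legal result on every architecture in the table below, since all of them allow stores to be reordered after loads.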
On some CPUs
- atomic operations can be reordered with Loads and Stores.
- the instruction cache pipeline can be incoherent, which prevents self-modifying code from being executed without special instruction cache flush/reload instructions.
- dependent loads can be reordered (this is unique to Alpha). If the processor fetches a pointer to some data after this reordering, it might not fetch the data itself but instead use stale data which it has already cached and not yet invalidated. Allowing this relaxation makes cache hardware simpler and faster, but it means that both readers and writers need memory barriers.[1]
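The dependent-load case corresponds to publishing a pointer and then dereferencing it. The sketch below, in C11 with POSIX threads, is one way to express the required ordering portably; `struct message`, `slot`, `published` and `run_pointer_publish` are hypothetical names. It uses a release store in the writer and an acquire load in the reader, which is stronger than strictly necessary on most CPUs but is also sufficient on a machine that reorders dependent loads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct message { int payload; };          /* hypothetical payload type */

static struct message slot;
static _Atomic(struct message *) published;  /* NULL until the writer publishes */

static void *writer(void *arg) {
    (void)arg;
    slot.payload = 42;                       /* initialise the data first */
    /* Release: the payload write is ordered before the pointer store. */
    atomic_store_explicit(&published, &slot, memory_order_release);
    return NULL;
}

static void *reader(void *arg) {
    int *out = arg;
    struct message *m;
    /* Acquire: the dereference below cannot observe data older than the pointer. */
    while ((m = atomic_load_explicit(&published, memory_order_acquire)) == NULL)
        ;                                    /* spin until published */
    *out = m->payload;
    return NULL;
}

/* Returns the payload seen by the reader. */
int run_pointer_publish(void) {
    pthread_t w, r;
    int seen = 0;
    atomic_store(&published, (struct message *)NULL);
    int rc_r = pthread_create(&r, NULL, reader, &seen);
    int rc_w = pthread_create(&w, NULL, writer, NULL);
    assert(rc_r == 0 && rc_w == 0);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

Without the acquire ordering on the reader side, a processor that reorders dependent loads could return a stale `payload` even though it read the new pointer.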
Type | Alpha | ARMv7 | PA-RISC | POWER | SPARC RMO | SPARC PSO | SPARC TSO | x86 | x86 oostore | AMD64 | IA64 | zSeries |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Loads reordered after Loads | Y | Y | Y | Y | Y | Y | Y | |||||
Loads reordered after Stores | Y | Y | Y | Y | Y | Y | Y | |||||
Stores reordered after Stores | Y | Y | Y | Y | Y | Y | Y | Y | ||||
Stores reordered after Loads | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y |
Atomic reordered with Loads | Y | Y | Y | Y | Y | |||||||
Atomic reordered with Stores | Y | Y | Y | Y | Y | Y | ||||||
Dependent Loads reordered | Y | |||||||||||
Incoherent Instruction cache pipeline | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y |
Some older x86 and AMD systems have weaker memory ordering.[4]
SPARC memory ordering modes:
- SPARC TSO = total-store order (default)
- SPARC RMO = relaxed-memory order (not supported on recent CPUs)
- SPARC PSO = partial store order (not supported on recent CPUs)
Memory barrier types
Compiler memory barrier
These barriers prevent a compiler from reordering instructions; they do not prevent reordering by the CPU.
- The GNU inline assembler statement
asm volatile("" ::: "memory");
or even
__asm__ __volatile__ ("" ::: "memory");
forbids the GCC compiler from reordering read and write commands around it.[5]
- The Intel ECC compiler uses a "full compiler fence":
__memory_barrier()
- Microsoft Visual C++ Compiler:[8]
_ReadWriteBarrier()
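The compiler-specific forms above can be hidden behind a single macro. The following is a sketch only: `COMPILER_BARRIER`, `buffer`, `ready`, `fill_then_flag` and `is_ready` are hypothetical names, and the fallback branch assumes a C11 compiler that provides `atomic_signal_fence`. Note that this constrains the compiler alone; the CPU may still reorder the stores.

```c
#include <assert.h>

#if defined(_MSC_VER)
#include <intrin.h>
#define COMPILER_BARRIER() _ReadWriteBarrier()
#elif defined(__GNUC__)
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")
#else
#include <stdatomic.h>
#define COMPILER_BARRIER() atomic_signal_fence(memory_order_seq_cst)
#endif

#define CAPACITY 8                 /* hypothetical buffer size */
static int buffer[CAPACITY];
static int ready;                  /* written only after the buffer is filled */

/* Fills the first n entries with squares, then sets the flag.
 * Returns the last value written, or 0 when n is 0. */
int fill_then_flag(int n) {
    assert(n >= 0 && n <= CAPACITY);
    for (int i = 0; i < n; i++)
        buffer[i] = i * i;
    /* The compiler may not move the buffer stores below this point,
     * nor move the store to ready above it. */
    COMPILER_BARRIER();
    ready = 1;
    return n > 0 ? buffer[n - 1] : 0;
}

int is_ready(void) {
    return ready;
}
```

For example, `fill_then_flag(4)` writes 0, 1, 4 and 9 into the buffer and returns 9. In code shared between threads, a hardware barrier or atomic operation would still be needed in addition to the compiler barrier.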
Hardware memory barrier
Many architectures with SMP support have special hardware instructions for flushing reads and writes.
lfence (asm), void _mm_lfence(void)
sfence (asm), void _mm_sfence(void) [9]
mfence (asm), void _mm_mfence(void) [10]
sync (asm)
dcs (asm)
dmb (asm)
Compiler support for hardware memory barriers
Some compilers support builtins that emit hardware memory barrier instructions:
- GCC,[11] version 4.4.0 and later,[12] has __sync_synchronize.
- The Microsoft Visual C++ compiler[13] has MemoryBarrier().
- Sun Studio Compiler Suite[14] has __machine_r_barrier, __machine_w_barrier and __machine_rw_barrier.
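As an illustration of how such a builtin might be used, the sketch below passes a value between two threads with a full hardware barrier on each side. It assumes GCC (or a compatible compiler) for `__sync_synchronize` and falls back to the C11 `atomic_thread_fence` elsewhere; `FULL_FENCE`, `data`, `flag` and `run_fenced_handoff` are hypothetical names. The accesses themselves are relaxed atomics so that the example has no data race, and the sketch relies on the builtin behaving like a sequentially consistent fence, which is how GCC documents it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#if defined(__GNUC__)
#define FULL_FENCE() __sync_synchronize()
#else
#define FULL_FENCE() atomic_thread_fence(memory_order_seq_cst)
#endif

static atomic_int data, flag;      /* hypothetical shared state */

static void *producer(void *arg) {
    (void)arg;
    atomic_store_explicit(&data, 42, memory_order_relaxed);
    FULL_FENCE();                  /* data store is ordered before flag store */
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
    return NULL;
}

static void *consumer(void *arg) {
    int *out = arg;
    while (atomic_load_explicit(&flag, memory_order_relaxed) == 0)
        ;                          /* spin until the flag is set */
    FULL_FENCE();                  /* flag load is ordered before data load */
    *out = atomic_load_explicit(&data, memory_order_relaxed);
    return NULL;
}

/* Returns the value the consumer read from data. */
int run_fenced_handoff(void) {
    pthread_t p, c;
    int seen = 0;
    atomic_store(&data, 0);
    atomic_store(&flag, 0);
    int rc_c = pthread_create(&c, NULL, consumer, &seen);
    int rc_p = pthread_create(&p, NULL, producer, NULL);
    assert(rc_c == 0 && rc_p == 0);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

Removing either fence would allow a weakly ordered CPU to let the consumer see the flag set while still reading the old value of `data`.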
References
- [1] Reordering on an Alpha processor by Kourosh Gharachorloo
- [2] Memory Ordering in Modern Microprocessors by Paul McKenney
- [3] Memory Barriers: a Hardware View for Software Hackers, Figure 5 on Page 16
- [4] Table 1. Summary of Memory Ordering, from "Memory Ordering in Modern Microprocessors, Part I"
- [5] GCC compiler-gcc.h
- [6] ECC compiler-intel.h
- [7] Intel(R) C++ Compiler Intrinsics Reference: "Creates a barrier across which the compiler will not schedule any data access instruction. The compiler may allocate local data in registers across a memory barrier, but not global data."
- [8] Visual C++ Language Reference _ReadWriteBarrier
- [9] SFENCE: Store Fence
- [10] MFENCE: Memory Fence
- [11] Atomic Builtins
- [12] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36793
- [13] MemoryBarrier macro
- [14] Handling Memory Ordering in Multithreaded Applications with Oracle Solaris Studio 12 Update 2: Part 2, Memory Barriers and Memory Fence
Further reading
- Computer Architecture — A quantitative approach. 4th edition. J Hennessy, D Patterson, 2007. Chapter 4.6
- Sarita V. Adve, Kourosh Gharachorloo, Shared Memory Consistency Models: A Tutorial
- Intel 64 Architecture Memory Ordering White Paper
- Memory ordering in Modern Microprocessors part 1
- Memory ordering in Modern Microprocessors part 2
- IA (Intel Architecture) Memory Ordering on YouTube - Google Tech Talk