Specifying Weak Memory Consistency with Temporal Logic

Maximilian Senftleben and Klaus Schneider
Department of Computer Science
University of Kaiserslautern
http://es.cs.uni-kl.de

Abstract. Modern multiprocessors only provide weak memory consistency that deviates from the traditional sequential consistency in many unintuitive ways: in essence, the read and write operations issued by one processor might be observed by other processors in a different order than executed by the issuing processor. It is therefore strongly recommended to formally verify multithreaded systems. To this end, it is necessary to have unambiguous and complete descriptions of the memory consistency models used. Different formal methods have been used so far to specify the possible executions under particular weak memory consistency models. However, some of them cannot be directly used for verification by model checking. In this paper, we present a new way to specify weak memory consistency models that is based on temporal logic. Using Linear Temporal Logic (LTL) to define weak memory systems, we can easily check properties of one and the same multithreaded program against several consistency models. This way, we can determine the weakest consistency guarantees required to ensure given specifications.

Keywords: temporal logic, model checking, weak memory consistency, specification.

1 Introduction

Communication between several threads over a shared memory is one of the major bottlenecks in modern multiprocessor systems. As processors have to compete for access to the shared memory and have to synchronize their memory operations, the processor performance would significantly decrease with an increasing number of processors if no countermeasures were taken: all modern multicore and multiprocessor systems therefore employ optimizations in their memory architecture to further increase their performance.

While some of these optimizations, like the use of local caches with a snooping-based cache coherence protocol [27], do not modify the results of programs, other optimizations like store buffers might sometimes give unexpected results [33]. To be more precise, snooping-based cache coherence protocols maintain the sequentially consistent behavior [20] as known from the execution of processes on uniprocessor systems, while other optimizations weaken the consistency and allow further behaviors that are not possible by simply interleaving the behaviors of the threads. A memory system is thereby called weak if different processors may disagree about the order of observed memory operations.

As single-threaded programs still account for a large part of the workload of general purpose processors, weak memory goes unnoticed most of the time. Multithreaded programs that rely on inter-thread communication, however, may produce different results when executed on a single core and on multiple cores.

bool b;
int v;

thread P() {
  v = value;
  b = true;
}

thread Q() {
  while(not(b)) {};
  r = v;
}

Fig. 1. Two threads P and Q demonstrating weak memory consistency.

For example, consider the two processes P and Q shown in Figure 1 and assume that variable b initially has the value false. Process P writes a value to variable v and then signals its completion by setting another variable b to true. Process Q observes variable b, and as soon as it finds the value true in b, it proceeds with reading the value of variable v. Obviously, this program is written with the assumption that write operations of one process (P) are observed in the same order by all other processes (Q). If we relax this restriction in a weak memory model, then it may happen that Q first observes the assignment to b and proceeds with reading from v even though the assignment to v is not yet visible to Q, resulting in an unexpected behavior.
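As a concrete illustration (our own sketch, written in the notation used by the later figures), one execution that is possible under such a weak memory model but impossible under sequential consistency looks as follows, where ⊥ denotes the initial value of v:

  P: Write(v = value)   Write(b = true)
  Q: Read(b = true)     Read(v = ⊥)

Under sequential consistency, Read(v = ⊥) cannot happen once Read(b = true) has succeeded; under weaker models, it can.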
Each possible optimization of a memory system may result in different possible behaviors, and therefore defines its own so-called memory consistency model. For example, current multicore processors maintain for each core a load/store buffer where they buffer the load/store instructions to be issued to the memory system. While these instructions may not yet have been accepted by the memory system, they can already be seen by the processor core that owns the buffer. If one core even uses several buffers, e.g., for different memory banks, this leads to a weak memory consistency model where assignments executed later by a processor may arrive first in memory.

Fig. 2. Hierarchy of weak memory models (see [32]), relating SC, PC-G, PC-D, GAO, TSO, GPO+GDO, CAUSAL, PSO, PRAM-M, CC, PRAM, GWO, SLOW, and LOCAL.

Many different optimizations of memory architectures have been implemented so far that lead to different weak memory consistency models. Figure 2 gives an overview of some of these models (see also [24, 1, 32, 13] for surveys). A memory model is thereby called weaker than another one if it allows more possible executions, as denoted by the arrows in the figure. For example, all executions that are sequentially consistent (SC) are total store order (TSO) consistent as well, but there may exist TSO-consistent executions that are not sequentially consistent.

Since the presence of weak memory models often leads to unintuitive behaviors, it is strongly recommended to formally verify multithreaded programs taking the underlying weak memory model of the hardware platform into account. To this end, one obviously needs precise formal descriptions of the possible behaviors of each model that can be used for the formal verification of the multithreaded programs. However, there is still no commonly accepted formalism to define weak memory models, which also makes it very difficult to formally reason about them (see the next section). The original descriptions of the weak memory models were unfortunately written only in natural language and are therefore often ambiguous. Recent efforts use formal methods to specify weak memory models, e.g., higher-order logic [26] and partial orders [32], both of which are not well-suited for model checking.

In this paper, we propose the use of temporal logic as a new way to formally define weak memory models. We show that the weak memory models mainly used so far can all be conveniently defined in temporal logic. This directly allows us to formally verify temporal logic specifications of a given system with respect to different weak memory models in a flexible way. In particular, we can determine the minimal requirements a memory system has to meet to correctly implement a multithreaded system.

The outline of the paper is as follows: Section 2 surveys the state of the art in specifying weak memory consistency models. Section 3 is the core of the paper and presents the use of linear temporal logic for specifying different weak memory models. In Section 4, we present first experimental results, and in the final section, we draw first conclusions.
2 Related Work

[23] as well as [1] provide recommendable introductions to memory consistency in general. [24] gives a good overview of many of the models known at that time and compares them with each other. [32] introduced a unified framework that derives weak memory models as the composition of four independent basic properties and thereby defined a lattice of memory models. In our previous work [12, 13], we analyzed the complexity of testing whether given execution traces comply with a certain memory model. In most cases, this problem is NP-complete even if some parameters like the number of variables are kept constant. Other unified frameworks were introduced by [2], [18], and [4].

The first descriptions of weak memory models were only given in an informal manner and sometimes led to misinterpretations. For example, [24] and [3] interpret PRAM consistency, which was informally introduced by [21], differently: in contrast to the definition of [3], the definition of [24] assumes that a processor first updates its local memory before it broadcasts a write operation to the other processors.

When manufacturers recognized the impact of weakly consistent systems on programming, they provided distinguishing test cases, so-called litmus tests [15, 5]. Litmus tests are either examples of possible weak behaviors or examples of behaviors that may not occur at all. It quickly became clear that the inherent incompleteness of sample programs as well as the ambiguity of informal definitions are inadequate for any kind of formal reasoning about multithreaded programs. [25] discusses the problems that arise from ambiguously defined memory models in modern architectures and high-level languages, and [29] revealed that the Java memory model was flawed for many years.

It is therefore very important to come up with formally precise, but still comprehensible definitions of memory models. There are already many established ways of specifying the behavior of different memory systems. In the remainder of this section, we classify the approaches given so far into different categories.

2.1 Operational Definitions

A full system description, even at an abstract level, provides an operational semantics of a memory system, but due to the proprietary nature of processors, such descriptions are usually not publicly available. Nonetheless, in some cases like publicly available processor cores, the system implementation might be available, and in other cases, an operational semantics can be re-engineered from an available formal definition [7, 31]. However, operational models introduce a big burden for formal verification since, in addition to the multithreaded program, one also has to model the underlying hardware platform with its memory transactions for the verification.

2.2 Axiomatic Definitions

Another way to specify a weak memory model is to list axioms that have to be satisfied by the allowed behaviors. This was first done for the SPARC processors [33], but would have to be re-engineered for the other models. Axiomatic semantics makes use of quantifiers and often of higher-order logic [26], and is therefore not well-suited for verification by model checking.
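To give an impression of this style, axiomatic definitions typically contain quantified ordering axioms. A schematic example in our own notation (not taken from [33]) is a per-location coherence axiom such as

  ∀w₁, w₂ ∈ W. (Loc(w₁) = Loc(w₂) ∧ w₁ ≠ w₂) → (w₁ ≺ w₂ ∨ w₂ ≺ w₁)

stating that a memory order ≺ totally orders all writes to the same location. An execution is then allowed if and only if there exists an order ≺ satisfying all axioms; this existential quantification over orders, together with the universal quantification over events, is what makes such definitions inconvenient for model checkers.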
2.3 View-based Definitions

View-based definitions describe the ordering of operations as seen by the individual processors or the memory system. This type of definition is superior in that it allows one to abstract from the actual implementation and to focus on the processors' views, which simplifies its use for the programmer. Some view-based definitions use the notion of an execution, which consists of write and read operations that might occur in that way; more specifically, the read operations are already annotated with the values they will read. A benefit of view-based definitions is that they allow one to introduce unified ways of defining different memory models, as shown by [32]. This concept can be helpful for analyzing specific cases, but may be hard to follow, since it is more closely related to litmus tests than to actual programs: in a program, we usually do not know the read values in advance.

3 Specifying Weak Consistency by Temporal Logic

As early as 1981, it was suggested by [28] that temporal logics would be a suitable formalism to describe the behavior of concurrent systems. Since then, specifying and verifying the concurrent behavior of reactive and other systems became one of the success stories of computer science [16] and led to many Turing awards. Surprisingly, to the best of our knowledge, no one has described the behavior of different weak memory models using temporal logic until today. Therefore, this is the first paper that systematically introduces linear temporal logic (LTL) as an alternative, uniform way to specify weak memory models. In the following, we first define the general setting for specifying weak memory models with LTL in Section 3.1, and then apply it to specific memory models in Section 3.2.

3.1 The General Setting

Linear Temporal Logic (LTL). LTL is a variant of temporal logic whose models are single execution paths of a system. As LTL is based on a discrete notion of time, each point of time can be denoted by an integer value. The semantics is defined over a labelled transition system (a Kripke structure) K = (S, I, R, L) that consists of a set of states S, initial states I ⊆ S, a transition relation R ⊆ S × S, and a label function L that maps each state to the variables that hold there. Each path consists of a sequence of states determined by the transition relation R. An LTL formula is satisfied by a given structure K if it is satisfied by all infinite paths starting in any of the initial states I. The temporal operators used in this paper are:

– Gϕ (Globally): ϕ holds in the current state and all future states.
– Fϕ (Finally): ϕ eventually holds at least once (now or in the future).
– Xϕ (Next): ϕ holds in the next state of the path.
– [ϕ U ψ] (Until): ϕ holds until the first time ψ holds, and ψ has to hold eventually.
– [ϕ Ū ψ] (Weak Until): ϕ holds until the first time ψ holds, but ψ may never hold (in which case ϕ holds ad infinitum).

For more information about temporal logics, see e.g., [11, 22, 6, 10, 30].
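Since such formulas will later be checked with NuSMV and nuXmv (Section 4), it may help to see how these operators are written in the SMV input language. The following fragment is only a syntax illustration with variables of our own choosing, not part of the actual models:

MODULE main
VAR
  req : boolean;  -- hypothetical request event
  ack : boolean;  -- hypothetical acknowledge event
-- G, F, X, and U are built-in LTL operators of the SMV input language:
LTLSPEC G (req -> F ack)           -- every request is eventually acknowledged
LTLSPEC X !ack                     -- no acknowledgement in the second state
LTLSPEC (!ack) U req               -- strong until: no ack strictly before the first request
LTLSPEC ((!ack) U req) | G !ack    -- weak until, expressed as a disjunction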
Read/Write Events. In general, a multiprocessor system can be modeled as a set of processors that execute a set of programs and are connected to a central memory system via a well-defined interface. In the following, we provide LTL specifications for various memory consistency models. The specifications describe the behavior of the memory system in a specific environment. The environment consists of multiple processors which interact with the memory system via events.

For each processor p, we distinguish write events W^p and read events R^p. A globally unique identifier (an integer value), denoted by Id(W^p), is assigned to each write event W^p. We write W_i^p as a shorthand for W^p ∧ Id(W^p) = i. A write instruction that is issued multiple times, e.g., in a loop, results in different write events with different identifiers. The expression Loc(W^p) denotes the memory location to write to, and Val(W^p) the value to be written. Similarly, Loc(R^p) denotes the memory location to read from, and Val(R^p) the value returned by the memory system. Note that only Val(R^p) is an output of the memory system; everything else is considered an input.

In addition to the events W^p and R^p, the specifications use an additional event ^qO which models that processor q has observed a write event. Prc(^qO) denotes the originating processor and Id(^qO) the identifier of the observed write. For conciseness, we write ^qO_i^p for ^qO ∧ Prc(^qO) = p ∧ Id(^qO) = i. A processor is assumed to observe at most one write at a time. Similarly to the previous definitions, Loc(^qO) denotes the memory location and Val(^qO) the value of the observed write.

Minimal Requirements for all Models. To provide a well-defined context for the specifications, the environment has to fulfill certain assumptions. First, quite obviously, the same event should not occur more than once (1); next, the identifiers of the writes should be globally unique (2) and, for each processor, strictly increasing (3). Furthermore, it is assumed that a processor only issues either a write or a read event at a time (4).

  G(W_i^p → XG ¬W_i^p)                                 (1)
  G(W_i^p → ¬⋁_{q≠p} W_i^q)                            (2)
  G(W_i^p → XG ⋀_{j≤i} ¬W_j^p)                         (3)
  G ¬(W_i^p ∧ R^p)                                     (4)

A system has to satisfy at least some basic properties to be considered a reasonable memory system. First, observation events should be causally related to write events, i.e., an observation event ^qO_i^p may only occur if there was a corresponding write event W_i^p before (5). Next, a processor should observe each write event only once (6). Naturally, we would like Loc(^qO) and Val(^qO) to return the same values as the corresponding write event (7).

  [(¬ ^qO_i^p) Ū W_i^p]                                (5)
  G(^qO_i^p → XG ¬ ^qO_i^p)                            (6)
  F(W_i^p ∧ Loc(W^p) = l ∧ Val(W^p) = v) → G(^qO_i^p → Loc(^qO_i^p) = l ∧ Val(^qO_i^p) = v)   (7)

Read operations should either return the default value (denoted as ⊥ in this paper) as long as there was no observed write to that location (8), or the value of the latest observed write event to that location (9).

  [(R^q ∧ Loc(R^q) = l → Val(R^q) = ⊥) Ū (^qO ∧ Loc(^qO) = l)]     (8)
  G[(^qO_i ∧ Val(^qO) = v ∧ Loc(^qO) = l) → [(R^q ∧ Loc(R^q) = l → Val(R^q) = v) Ū (^qO ∧ Loc(^qO) = l ∧ Id(^qO) ≠ i)]]   (9)

3.2 Specifying Particular Memory Models

In the following, we introduce the additional properties that are needed to specify some of the known memory models.

Local Consistency. Local consistency was defined by [17] as the weakest memory model. It requires each process to observe its own writes in the order they were issued, but allows the writes of other processes to be observed in any order. This corresponds to the read-my-writes consistency property known in the database community. The property can be expressed in LTL as follows:

  G(W_i^p → ^pO_i^p)

In other words, a write event requires the issuing processor to immediately observe its own write.
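To give an impression of how such an event alphabet can be handled by a model checker, the following is a minimal NuSMV sketch under our own, hypothetical encoding (it is not the authors' actual model): events become Boolean state variables, and their attributes become bounded integers.

MODULE main
VAR
  w    : boolean;   -- W^p: processor p issues a write in this step
  wid  : 0..3;      -- Id(W^p)
  wloc : 0..1;      -- Loc(W^p)
  wval : 0..1;      -- Val(W^p)
  r    : boolean;   -- R^p: processor p issues a read in this step
  o    : boolean;   -- ^qO: processor q observes a write in this step
  oid  : 0..3;      -- Id(^qO)
-- Requirement (4): a processor issues at most one of W^p, R^p at a time.
LTLSPEC G !(w & r)
-- Requirement (6), instantiated for identifier 0:
-- q observes the write with identifier 0 at most once.
LTLSPEC G ((o & oid = 0) -> X G !(o & oid = 0))

In the actual verification, such requirements appear on the assumption side of the checked implication (cf. Section 4), so that only runs of well-behaved environments and memory systems are considered.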
Figure 3 shows an example execution for local consistency. It shows that local consistency allows reordering of the writes of another processor, i.e., only the issuing processor is required to observe its own writes in order.

  P1: Write(X = 1)   Write(X = 2)
  P2: Read(X = 2)    Read(X = 1)

Fig. 3. TestLocal: Locally consistent execution which is not slow consistent.

Slow Consistency. Slow consistency [19] was introduced to increase memory performance by reducing consistency maintenance. Slow consistency extends local consistency by requiring each processor to observe the writes of another processor to the same location in the order they were issued. The corresponding LTL representation of that property is as follows:

  G(^qO_i^p → XG ⋀_{j≤i} ¬(^qO_j^p ∧ Loc(^qO_j^p) = Loc(^qO_i^p)))

This means that if a write is observed, then no earlier write of that process to the same location may be observed in the future any more. As shown in Figure 4, slow consistency allows the reordering of writes if they do not target the same location. In this case, the write to Y is visible before the second write to X has been observed.

  P1: Write(X = 1)   Write(X = 2)   Write(Y = 1)
  P2: Read(Y = 1)    Read(X = 1)    Read(X = 2)

Fig. 4. TestSlow: Slow consistent execution which is not PRAM consistent.

PRAM Consistency. Pipelined random access memory (PRAM) consistency was first introduced in 1988 by [21]. PRAM requires each process to respect the order of the writes of the other processes, but not of their read operations. This means that two writes of the same process will always be observed in the same order by all other processes. Therefore, the PRAM specification extends slow consistency by an additional property:

  F(^qO_i^p) → ⋀_{j≤i} ([F W_j^p] → [(¬ ^qO_i^p) U ^qO_j^p])

That is, if a processor observes a write operation, then it has to observe all earlier writes of that processor beforehand. While the execution in Figure 5 is PRAM consistent, it is not cache consistent (see the next paragraph), as cache consistency requires writes to the same location to be observed in the same order by all processors.

  P1: Write(X = 1)   Read(X = 2)
  P2: Write(X = 2)   Read(X = 1)

Fig. 5. TestPRAM: PRAM consistent execution which is not CC consistent.

Cache Consistency. In 1989, [14] introduced weak consistency and claimed that it would be the weakest form of consistency. Later on, after it was shown not to be the weakest model (see local or slow consistency), it became known as cache consistency (CC). Cache consistency is stronger than slow consistency and extends it by the following property:

  F(^qO_i^p ∧ F(^qO_j^p ∧ Loc(^qO_i^p) = Loc(^qO_j^p))) → ⋀_r F(^rO_i^p ∧ F(^rO_j^p))

This implies that if a processor observes two writes to the same location, then all other processors observe them in the same order. In the case of Figure 6, the second processor does not observe all writes of the first processor in the order they were issued, and the execution therefore cannot be PRAM consistent; but as the observations are not in conflict with the location-specific ordering of the first processor, the execution is still cache consistent.

  P1: Write(X = 1)   Write(X = 2)   Write(Y = 1)
  P2: Read(Y = 1)    Read(X = 1)

Fig. 6. TestCC: CC consistent execution which is not PRAM consistent.
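As a worked instance of the event notation, consider again the execution TestSlow of Figure 4 (our own transcription; processor 1 issues the writes, processor 2 observes them). Processor 1 issues the writes W_1^1, W_2^1, and W_3^1 with locations and values (X, 1), (X, 2), and (Y, 1), respectively, and, by local consistency, observes each of them immediately. A possible sequence of events at processor 2 is

  ^2O_3^1, R^2(Y = 1), ^2O_1^1, R^2(X = 1), ^2O_2^1, R^2(X = 2).

The two writes to X are observed in issue order (identifier 1 before identifier 2), as slow consistency demands, but the later write to Y is observed first. PRAM consistency is violated, since it would additionally require ^2O_1^1 and ^2O_2^1 to occur before ^2O_3^1.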
Sequential Consistency. Sequential consistency was defined by [20] and describes the behavior that may occur if the programs are executed on a single processor (core). It requires all processors to agree upon a single sequential total ordering of all write operations they observe. The first required property (totality) can be expressed as:

  ⋀_q G(W_i^p → F ^qO_i^p)

That means, whenever a write event occurs, each processor has to observe that write operation some time in the future. The other property, ensuring a unique sequential representation, is as follows:

  ⋀_{q'} [F(^qO_i^p ∧ F ^qO_j^{p'}) → F(^{q'}O_i^p ∧ F ^{q'}O_j^{p'})]

This implies that if one processor observes two writes in a specific order, then all other processors have to observe these two writes in the same order.

4 Experimental Results

The environment and specifications described in the previous section have been implemented in the SMV input language. This allows us to use NuSMV [9] and nuXmv [8] for LTL verification, either with BDD-based or with SAT-based bounded model checking (BMC). For the processor implementation, the environment follows an assembler representation of the programs or test cases to be analyzed.

4.1 Property Verification

To verify a safety property, we abstract from the actual memory system implementation and only provide a memory specification to check the property against all possible behaviors. Multithreaded programs often contain parts of the code that require mutually exclusive access to some variables to achieve the correct behavior, i.e., they contain a critical section. For example, an instance of Peterson's algorithm should never allow two processes to access the critical section at the same time. When implementing such programs as parallel modules in SMV and expressing the properties in LTL, we can verify them using LTL model checking. This can be written as follows:

  (Process modules) |= (Memory Spec) → (Property)

Using tools like NuSMV or nuXmv, properties can be verified using BDD-based approaches or by searching for counterexamples using SAT-based bounded model checking.

4.2 Examples and Results

In the following, several algorithms are described and analyzed with respect to their minimally required memory models. We start with the well-known mutual exclusion algorithm due to Peterson, and then consider a simple producer-consumer algorithm. First, Peterson's mutual exclusion algorithm (see Figure 7) will be shown to work as expected with sequential consistency, but to fail with weaker consistency models.

bool flagP, flagQ;
int turn, data;

thread P() {
  flagP = true;
  turn = 0;
  while(flagQ & turn==0) {};
  // Begin of Critical Section
  data = data + 1;
  data = data - 1;
  // End of Critical Section
  flagP = false;
}

thread Q() {
  flagQ = true;
  turn = 1;
  while(flagP & turn==1) {};
  // Begin of Critical Section
  data = data + 1;
  data = data - 1;
  // End of Critical Section
  flagQ = false;
}

Fig. 7. Peterson mutual exclusion protocol.

Peterson's mutex algorithm works as follows: whenever a process wants to enter the critical section, it sets its own flag. Then, it sets the turn variable to the id of the other process. Afterwards, the algorithm checks whether the other process has indicated a critical section request with its flag, too. If so, depending on the state of the turn variable, the process either idles as long as the other process's flag holds, or it proceeds to the critical section. After a process has finished its critical operations, it resets its flag to signal to the other process that it is safe to proceed. In the example provided, the critical section consists of an increment and a decrement of data. Assuming mutually exclusive execution of the critical sections, the value of data should always be either 0 or 1.
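For NuSMV, the check (Process modules) |= (Memory Spec) → (Property) of Section 4.1 becomes a single LTLSPEC of implication shape. The following is only a minimal syntactic sketch with hypothetical names; in the real models, the antecedent is the conjunction of the LTL properties of the chosen memory model from Section 3:

MODULE main
VAR
  data : 0..2;      -- abstract shared location (hypothetical)
  w, o : boolean;   -- abstract write and observation events (hypothetical)
-- Shape of the check: the memory-model specification implies the property;
-- the simple antecedent below is only a stand-in for the real specification.
LTLSPEC (G (w -> F o)) -> G (data = 0 | data = 1)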
To examine the memory behavior of that algorithm, it has to be translated into a (pseudo) assembler representation which reveals the individual load/store instructions, as shown in Figure 8. The representation uses the memory locations already present in the high-level implementation (see Figure 7): flag[0], flag[1], turn, and data.

Consider the non-atomic instructions data = data + 1 and data = data − 1. Assuming the initial value of data is 0, data should only alternate between 0 and 1, and after both processes are finished, we expect it to be 0. But if the reading and writing parts of the instructions are interleaved, data may take the intermediate values −1, 0, 1, 2 and end up as −1, 0, or 1. Therefore, these instructions can be used to model a critical section, as they add unexpected behavior if mutual exclusion is not ensured; a small self-contained model of exactly this interleaving is sketched below. In this case, we would like to verify that the location data holds 0 or 1 at all times.

1  write(flag[id],1)
2  write(turn,1-id)
3  reg = read(flag[1-id])
4  if (reg=0) then goto 7
5  reg = read(turn)
6  if (reg=(1-id)) then goto 3
7  reg = read(data)
8  write(data,reg+1)
9  reg = read(data)
10 write(data,reg-1)
11 write(flag[id],0)
12 goto 12

Fig. 8. Peterson's algorithm for two processes in an 'assembler' representation. Constant id is 0 for the first process and 1 for the other.

To achieve this verification, we model the instructions depicted in Figure 8 as two processor modules in the SMV input language. Then, we add an LTL specification which reads (ModelSpecification) → G(data = 0 ∨ data = 1). This means that whenever a path of the state transition system satisfies the specification of the memory model, it is never the case that data has a value other than 0 or 1. Using NuSMV, we proved this safety property to be valid for sequential consistency, and to be invalid for the other models described in this paper (Local, Slow, CC, and PRAM consistency), for which NuSMV provides counterexamples.

The second algorithm we analyze is a simple producer-consumer algorithm, shown in Figure 9. Producer P writes one data value and then waits until it has been read by consumer C. P signals the availability of data by setting variable ready to true, and C signals that it has read the value by setting ready to false again.

bool ready = false;
int data = 0;

thread P() {
  for(i=0..N-1) {
    data = i;
    ready = true;
    while(ready) {};
  }
}

thread C() {
  int local[N];
  bool done;
  for(i=0..N-1) {
    while(!ready) {};
    local[i] = data;
    ready = false;
  }
  done = true;
}

Fig. 9. Producer-consumer algorithm.

Clearly, the depicted producer-consumer algorithm works as expected under sequential consistency: whenever C observes ready to be true, the corresponding write to data is visible to C, too. Therefore, there is only a single possible outcome for the values of the local registers in a sequentially consistent environment. Analogously to the first algorithm, using a suitable low-level representation and defining a correctness property like (ModelSpecification) → G(done → ⋀_{i=0}^{N−1} local[i] = i), we were able to prove that the algorithm works as expected under sequential consistency. While we found counterexamples for Local, Slow, and CC consistency, we could prove that PRAM consistency is already sufficient to ensure the specified correctness property. This shows that a more efficient, but weakly consistent memory system can be used for this algorithm without compromising the expected behavior.
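The following small, self-contained NuSMV sketch (our own illustration, not the model used in the experiments) makes the interleaving argument about the increment/decrement pair concrete: two processes execute the four load/store steps of lines 7 to 10 of Figure 8 in an arbitrary interleaved order, without any mutual exclusion. The final LTLSPEC fails, and NuSMV's counterexample is exactly an interleaving that drives data out of {0, 1}.

MODULE main
VAR
  turn : 0..1;     -- which process performs its next step (free scheduler)
  pc0  : 0..4;     -- program counters; steps: 0 load, 1 store +1, 2 load, 3 store -1, 4 done
  pc1  : 0..4;
  r0   : -2..3;    -- local registers of the two processes
  r1   : -2..3;
  data : -2..3;    -- shared location
ASSIGN
  init(pc0) := 0; init(pc1) := 0;
  init(r0)  := 0; init(r1)  := 0;
  init(data) := 0;
  next(pc0) := case turn = 0 & pc0 < 4 : pc0 + 1; TRUE : pc0; esac;
  next(pc1) := case turn = 1 & pc1 < 4 : pc1 + 1; TRUE : pc1; esac;
  next(r0)  := case turn = 0 & (pc0 = 0 | pc0 = 2) : data; TRUE : r0; esac;
  next(r1)  := case turn = 1 & (pc1 = 0 | pc1 = 2) : data; TRUE : r1; esac;
  next(data) := case
      turn = 0 & pc0 = 1 : r0 + 1;   -- store of data = data + 1 by process 0
      turn = 0 & pc0 = 3 : r0 - 1;   -- store of data = data - 1 by process 0
      turn = 1 & pc1 = 1 : r1 + 1;
      turn = 1 & pc1 = 3 : r1 - 1;
      TRUE : data;
    esac;
-- Fails: without mutual exclusion, data can reach 2 and -1.
LTLSPEC G (data = 0 | data = 1)

Replacing the free scheduler by processor modules that follow Figure 8, and prefixing the property with a memory-model specification, yields the checks described above.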
Note that in these examples it is quite easy to determine a suitable depth for bounded model checking, as both processes terminate after a predetermined number of steps. The approach can be used for repetitive programs as well, but suitable bounds then have to be determined based on the number of steps required to cover all relevant behaviors.

Using the described technique, we were able to verify other small litmus-test-like examples for different memory models, and to disprove them for weaker models. This way, we are able to determine the minimally required consistency models/guarantees that ensure that a given property holds. Verifying against multiple models is as easy as replacing the LTL specification: neither the processor representation nor the property description has to be changed.

5 Conclusion and Future Work

In this paper, we introduced a novel approach to specifying weak memory systems using temporal logic. Using temporal logic, we were able to describe the behavior of different memory consistency models, i.e., to restrict the allowed read results in correspondence with the history of issued memory write operations. This is already a useful result in itself, as it offers a new perspective and makes the topic more accessible to programmers already familiar with property specifications in LTL. Model checking can directly use our LTL specifications, so that we can use established tools to verify multithreaded programs. Moreover, the approach allows us to easily determine the weakest consistency requirements a program needs to satisfy a given property.

However, the approach suffers from the state explosion problem, as weak consistency considers all possible write events and therefore has to quantify over time, processes, variables, and all possible values. Verification of non-trivial examples like the Peterson mutual exclusion algorithm of Figure 7 with NuSMV's BDD-based model checking already requires several GB of memory. Using NuSMV's BMC with reasonable bounds allows us to inspect more examples, but will inevitably run out of memory for more complex examples, too.

In the future, we would like to provide specifications for more models, including an analysis of how models like TSO differ from the presented models. Furthermore, we are looking for better representations, especially for a possibility to get rid of the global write identifiers. In this context, it may be possible to reuse already computed information, like the reachable states, for model checking when only the memory model is changed. To strengthen the confidence in the presented LTL specifications, we are interested in proving the equivalence between the LTL specifications and reference implementations, and furthermore in proving the relationships between different models using our representations (e.g., that sequential consistency implies PRAM consistency).

References

1. S. Adve and K. Gharachorloo. Shared memory consistency models: A tutorial. IEEE Computer, 29(12):66–76, December 1996.
2. S. Adve and M. Hill. A unified formalization of four shared-memory models. IEEE Transactions on Parallel and Distributed Systems (TPDS), 4(6):613–624, June 1993.
3. M. Ahamad, R. Bazzi, R. John, P. Kohli, and G. Neiger. The power of processor consistency. In L. Snyder, editor, Symposium on Parallel Algorithms and Architectures (SPAA), pages 251–260, Velen, Germany, 1993. ACM.
4. J. Alglave. A formal hierarchy of weak memory models. Formal Methods in System Design (FMSD), 41(2):178–210, October 2012.
5. J. Alglave, L. Maranget, S. Sarkar, and P. Sewell. Litmus: Running tests against hardware. In P. Abdulla and K. Leino, editors, Tools and Algorithms for the Construction and Analysis of Systems (TACAS), volume 6605 of LNCS, pages 41–44, Saarbrücken, Germany, 2011. Springer.
6. C. Baier and J.-P. Katoen. Principles of Model Checking. MIT Press, Cambridge, Massachusetts, USA, 2008.
7. G. Boudol and G. Petri. Relaxed memory models: an operational approach. In Z. Shao and B. Pierce, editors, Principles of Programming Languages (POPL), pages 392–403, Savannah, Georgia, USA, 2009. ACM.
8. R. Cavada, A. Cimatti, M. Dorigatti, A. Griggio, A. Mariotti, A. Micheli, S. Mover, M. Roveri, and S. Tonetta. The nuXmv symbolic model checker. In A. Biere and R. Bloem, editors, Computer Aided Verification (CAV), volume 8559 of LNCS, pages 334–342, Vienna, Austria, 2014. Springer.
9. A. Cimatti, E. Clarke, F. Giunchiglia, and M. Roveri. NUSMV: A new symbolic model verifier. In N. Halbwachs and D. Peled, editors, Computer Aided Verification (CAV), volume 1633 of LNCS, pages 495–499, Trento, Italy, 1999. Springer.
10. E. Clarke, O. Grumberg, and D. Peled. Model Checking. MIT Press, 1999.
11. E. Emerson. Temporal and modal logic. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, volume B: Formal Models and Semantics, chapter 16, pages 995–1072. Elsevier, 1990.
12. F. Furbach, R. Meyer, K. Schneider, and M. Senftleben. Memory model-aware testing – a unified complexity analysis. In Application of Concurrency to System Design (ACSD), pages 92–101, Tunis La Marsa, Tunisia, 2014. IEEE Computer Society.
13. F. Furbach, R. Meyer, K. Schneider, and M. Senftleben. Memory-model-aware testing – a unified complexity analysis. Transactions on Embedded Computing Systems (TECS), 2015.
14. M. Goodrich. Constructing arrangements optimally in parallel (preliminary version). In F. Leighton, editor, Symposium on Parallel Algorithms and Architectures (SPAA), pages 169–179, Hilton Head, South Carolina, USA, 1991. ACM.
15. R. Grisenthwaite. Barrier litmus tests and cookbook. Technical Report PRD03-GENC-007826 1.0, ARM Limited, November 2009.
16. O. Grumberg and H. Veith, editors. 25 Years of Model Checking – History, Achievements, Perspectives, volume 5000 of LNCS. Springer, 2008.
17. A. Heddaya and H. Sinha. Coherence, non-coherence and local consistency in distributed shared memory for parallel computing. Technical Report BU-CS-92-004, Department of Computer Science, Boston University, 1992.
18. L. Higham, J. Kawash, and N. Verwaal. Weak memory consistency models – part I: Definitions and comparisons. Technical Report 98/612/03, Department of Computer Science, University of Calgary, 1998.
19. P. Hutto and M. Ahamad. Slow memory: Weakening consistency to enhance concurrency in distributed shared memories. In International Conference on Distributed Computing Systems (ICDCS), pages 302–309, Paris, France, 1990. IEEE Computer Society.
20. L. Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Transactions on Computers (T-C), 28(9):690–691, September 1979.
21. R. Lipton and J. Sandberg. PRAM: A scalable shared memory.
Technical Report CS-TR-180-88, Princeton University, 1988.
22. Z. Manna and A. Pnueli. The Temporal Logic of Reactive and Concurrent Systems. Springer, 1992.
23. P. McKenney. Memory barriers: A hardware view for software hackers. http://www.rdrop.com/users/paulmck, June 2010.
24. D. Mosberger. Memory consistency models. ACM SIGOPS Operating Systems Review, 27(1):18–26, January 1993.
25. F. Nardelli, P. Sewell, J. Ševčík, S. Sarkar, S. Owens, L. Maranget, M. Batty, and J. Alglave. Relaxed memory models must be rigorous. In Exploiting Concurrency Efficiently and Correctly (EC2), Snowbird, Utah, USA, 2011.
26. S. Owens, S. Sarkar, and P. Sewell. A better x86 memory model: x86-TSO. In S. Berghofer, T. Nipkow, C. Urban, and M. Wenzel, editors, Theorem Proving in Higher Order Logics (TPHOLs), volume 5674 of LNCS, pages 391–407, Munich, Germany, 2009. Springer.
27. M. Papamarcos and J. Patel. A low-overhead coherence solution for multiprocessors with private cache memories. In 25 Years of the International Symposia on Computer Architecture (ISCA), pages 284–290, Barcelona, Spain, 1998. ACM.
28. A. Pnueli. The temporal semantics of concurrent programs. Theoretical Computer Science (TCS), 13(1):45–60, 1981.
29. W. Pugh. The Java memory model is fatally flawed. Concurrency: Practice and Experience, 12(6):445–455, May 2000.
30. K. Schneider. Verification of Reactive Systems – Formal Methods and Algorithms. Texts in Theoretical Computer Science (EATCS Series). Springer, 2003.
31. M. Senftleben. Operational characterization of weak memory consistency models. Master's thesis, Department of Computer Science, University of Kaiserslautern, Germany, March 2013.
32. R. Steinke and G. Nutt. A unified theory of shared memory consistency. Journal of the ACM (JACM), 51(5):800–849, September 2004.
33. D. Weaver and T. Germond, editors. The SPARC Architecture Manual – Version 9. Prentice-Hall, 1994.