Fully Abstract Trace Semantics for Low-level Isolation Mechanisms – Extended Version

Marco Patrignani  Dave Clarke

Report CW651, November 2013
Fully Abstract Trace Semantics for Low-level Isolation Mechanisms – Extended Version

Marco Patrignani        Dave Clarke

Report CW 651, November 2013

Department of Computer Science, K.U.Leuven

Abstract

Many software systems adopt isolation mechanisms of modern processors as software security building blocks. Reasoning about these building blocks means reasoning about elaborate assembly code, which can be very complex due to the loose structure of the code. One way to overcome this complexity is providing the code with a more structured semantics. This paper presents one such semantics, namely a fully abstract trace semantics, for an assembly language enhanced with protection mechanisms of modern processors. The trace semantics represents the behaviour of protected assembly code with simple abstractions, unburdened by low-level details, at the maximum degree of precision. Furthermore, it captures the capabilities of attackers to protected software and simplifies providing a secure compiler targeting the assembly language.
**Abstract**

Many software systems adopt isolation mechanisms of modern processors as software security building blocks. Reasoning about these building blocks means reasoning about elaborate assembly code, which can be very complex due to the loose structure of the code. One way to overcome this complexity is providing the code with a more structured semantics. This paper presents one such semantics, namely a *fully abstract* trace semantics, for an assembly language enhanced with protection mechanisms of modern processors. The trace semantics represents the behaviour of protected assembly code with simple abstractions, unburdened by low-level details, at the maximum degree of precision. Furthermore, it captures the capabilities of attackers to protected software and simplifies providing a secure compiler targeting the assembly language.

1. **Introduction**

Many processors provide isolation mechanisms as software security building blocks. These are used to withstand low-level attackers who, through injected assembly code, can read the whole memory space and access secrets in memory, violate integrity constraints and so on. With isolation mechanisms, attackers cannot directly violate security properties of isolated software since the isolated memory is not accessible to them. Examples of these protection mechanisms are fine grained, program counter-based memory access control mechanism [4, 10, 11, 14, 18–20], which enforce security properties at process or lower levels (Ring-1). With this mechanism, the software to be secured is placed in a protected memory partition that shields it from the surrounding, potentially malicious code. The malicious code cannot read nor write the protected memory; it can only jump to specific addresses in protected memory in order to call functions of the protected code. This protection mechanism makes software more resilient against low-level attackers. However, this does not prevent an attacker from interacting with the protected code and violating security properties.

To enable reasoning about the behaviour of isolated software, this paper firstly introduces an assembly language enhanced with a fine grained, program counter-based memory access control. This protection mechanism was chosen due to its application in secure compilation [2, 15] and to its recent industrial implementation: the Intel SGX [11]. Then, this paper explores the design space of trace semantics and presents two different *fully abstract* [16] trace semantics for the protected assembly code. The paper provides a general proof strategy where full abstraction of the trace semantics can be proven simply. To the best of the authors’ knowledge, this is the first such result for an assembly language extended with the said protection mechanism. Moreover, the results seem applicable to other isolation mechanisms, modulo technical changes.

The fully abstract trace semantics provides a number of benefits. Trace semantics uses simple abstractions to represent the behaviour of protected assembly code, unburdened by low-level details, at the maximum degree of precision. So, dually, it models the behaviour of an attacker to the protected code, since it captures precisely the capabilities of that attacker. Secondly, without the trace semantics, the behaviour of protected assembly code (and thus of an attacker) is given in terms of low-level contexts. While being precise, low-level contexts are notoriously difficult to reason about, since they offer no inductive nor coinductive structure. The fully abstract trace semantics allows low-level contexts to be disregarded, since low-level contexts and traces are proven to be equally precise. Finally, the field of secure compilation benefits from the trace semantics. Recently, Aten et al. [2] and Patrignani et al. [15] presented secure compilers to assembly code enhanced with the aforementioned protection mechanism. Both results assume that the assembly language has a fully abstract trace semantics, thus the results of this paper strengthen those secure compilation claims.

In summary, the contributions of this paper are:

- the formalisation of an assembly language modelling an architecture enhanced with a fine grained, program counter-based memory access control;
- two different trace semantics for that low-level model;
- the proof that both trace semantics are fully abstract in a general proof strategy.

Limitations of this work are twofold. Firstly, the trace semantics cannot express side-channel attacks. Secondly, the formalisation does not consider details of the architecture such as caches; yet this is a often assumed [2, 13, 15, 17].

The paper is organised as follows. Section 2 introduces background notions for the work. Section 3 formalises syntax and operational semantics of the assembly language. Section 4 describes two trace semantics for the language. Section 5 contains the proof of full abstraction of both trace semantics. Section 6 describes related work. Section 7 discusses future works and concludes. The appendices present the proofs of the main theorems.
2. Protected Programs and Trace Semantics

This section describes the memory access control mechanism. Then, it discusses how to turn a trace semantics for protected assembly code into a fully abstract one, thus turning an imprecise representation of the behaviour of the code into a precise one.

2.1 The Protection Mechanism, Informally

The assembly language is enhanced with an isolation mechanism: a fine-grained, program-counter-based memory access control mechanism [4, 10, 14, 18–20]. We review this mechanism from the work of Strackx and Piessens [19]. However, the techniques presented in this paper can be easily adapted to reasoning about other isolation mechanisms [4, 10, 18]. The protection mechanism provides a secure environment for running code that must be protected from the code it interacts with. This mechanism assumes that the memory is logically divided into a protected and an unprotected partition. The protected partition is further divided into a code and a data section. The code section contains a variable number of entry points: the only addresses to which instructions in unprotected memory can jump and execute. The data section is accessible only from the protected partition. For the sake of simplicity, code in protected memory cannot read the unprotected one; as discussed in Section 2.2 below. Based on the location of the program counter, instructions that violate the access control policy are ruled out. The table below summarises the access control model enforced by the protection mechanism.

<table>
<thead>
<tr>
<th>From \ To</th>
<th>Protected</th>
<th>Code</th>
<th>Data</th>
<th>Unprotected</th>
</tr>
</thead>
<tbody>
<tr>
<td>Protected</td>
<td>r x</td>
<td>r w</td>
<td>w x</td>
<td></td>
</tr>
<tr>
<td>Unprotected</td>
<td>r x</td>
<td></td>
<td></td>
<td>r w x</td>
</tr>
</tbody>
</table>

2.2 From Naïve to Fully Abstract Trace Semantics

Following is the notation used in all code examples throughout this paper. Every instruction in the code is preceded by the address where it is located. Call the left program \( P_L \) and the right one \( P_R \). When \( P_L \) and \( P_R \) are executed, they are placed in a protected memory partition spanning from address 100 to 200, with a single entry point at address 100.

**Example 1.** Both \( P_L \) and \( P_R \) assign to \( r_0 \) the result of \( r_0 - r_1 \) (line 1). If the result of the operation is not less than 0 (line 3), they respectively write the contents of \( r_2 \) and \( r_2 \) at the unprotected address 10 (lines 4,5) and call a function whose address is stored in \( r_2 \) (line 6). Otherwise, they assign different values to \( r_1 \) (line 7) and return 0 (lines 8,9).

In order to describe the behaviour of the protected code of Example 1, this paper defines a traces semantics for it. The trace semantics gives a simple characterisation of the behaviour of that code as a set of sequences of labels. In this paper, trace semantics are devised to capture the execution of a protected program, which is a program allocated in the protected memory partition. The execution of a program is the evaluation of a sequence of instructions; in order to abstract over instructions, not all of them generate a label in a trace. Labels are generated only by call and ret instructions. If they are executed by unprotected code, they are named calls and returnbacks, if they are executed by protected code they are named callbacks and returns [2, 8]. Following is the syntax of labels.

\[
L ::= a \cdot \tau \cdot a ::= g ? | g ! \quad g ::= \text{call } p(\overline{\tau}) \mid \text{ret } v
\]

A label \( L \) can be an observable action \( a \) or a non-observable action \( \tau \). Decorations ? and ! indicate the direction of the observable action: from unprotected to protected code (?) or vice-versa (!). Address \( p \) is an address in memory, \( \overline{\tau} \) is a list of the contents of all registers and \( v \) indicates the contents of register \( r_0 \).

Informally, a trace semantics is fully abstract when its labels capture all that is being communicated between the protected and the unprotected code. The trace semantics hinted at above is not fully abstract due to a number of subtleties, as highlighted by the following traces for the code of Example 1. Omitted details are indicated using . . .: use \( \cdot \) to separate actions of the same trace.

\[
\overline{\tau} = \text{call } 100(1,2,\ldots)\cdot \text{ret } 0!
\]

\[
\overline{\tau} = \text{call } 100(2,1,40,\ldots)\cdot \text{call } 40(\ldots)!
\]

Traces \( \overline{\tau} \) and \( \overline{\tau} \) are generated by both \( P_L \) and \( P_R \). Trace \( \overline{\tau} \) does not capture the different value stored in \( r_1 \) (line 7), which, even if it is not the returned value of the function, still constitutes an observable difference between \( P_L \) and \( P_R \). Trace \( \overline{\tau} \) does not capture the different value written at address 10 (line 5), which also constitutes an observable difference between \( P_L \) and \( P_R \). Since \( \overline{\tau} \) and \( \overline{\tau} \) do not capture the observable differences between \( P_L \) and \( P_R \), the trace semantics fails to be fully abstract.

As Curien stated [3], two ways to achieve full abstraction for a trace semantics exist. The first is to modify the labels so that they capture more precisely what is communicated between protected and unprotected code. In this case, labels should capture the values of all registers and what protected code writes in unprotected memory. The second is to change the operational semantics, to restrict what is communicated to what is captured by the labels. This is achieved by restricting the ways in which communication is performed, e.g. by preventing writeouts. Both approaches are presented in Section 3; they rely on an assembly language and on its operational semantics which are formalised in Section 3.

Readouts. As mentioned in Section 2.1, protected memory cannot read the unprotected one. This is enforced by some protection mechanisms [10] while it is allowed yet discouraged by others [2, 14, 15, 19]. We briefly discuss the implications of read-outs, to keep the rest of the paper simple.

Consider two programs that read-out values at two different locations, disregard those values and return 0. They do not present an observable difference, but if read-outs generated a visible action, the two programs would have different traces. This would break full abstraction so read-outs generate a \( \tau \) action.

Now consider two programs that read-out values at two different locations and return those values. The read values must be known in order to determine the traces of the two programs, so the traces become parametrised on the read values. This can be achieved in different ways, to keep the presentation of this paper simple, a more thorough discussion of the subject is left for future work.

3. Assembly Language Formalisation

This section formalises the syntax and operational semantics of an assembly language, followed by the definition of contextual equivalence.

3.1 Syntax

The language is run on an architecture that models a Von Neumann machine consisting of a program counter \( p \), a register file \( r \), a flags register \( f \) and memory space \( m \). The program counter indicates
the address of the instruction that is executed. The register file contains 12 general purpose registers R0 to R11 and a stack pointer register SP, which contains the address of the top of the current call stack. The flags register contains a zero flag ZF and a sign flag SF, which are set or cleared by arithmetic instructions and are used by branching instructions. For the sake of simplicity, assume the architecture targeted by the language works with ℓ-bit-long words, where ℓ is a power of 2. This allows the formalisation presented to scale to architectures with words of different sizes.

Fig. 1. Elements of the assembly language formalisation.

Fig. 1 presents elements of the formalisation. Words w are either instructions i or values v. Values v are elements of the set \( \mathcal{V} \), they are sequences of bits 0 or 1 of length ℓ that are different from the encoding of instructions. The explanation of why values are encoded in this way is delayed to Example 3 in Section 4.1. Instructions i are elements of the set I and define the programming language executed on the architecture. (Fig. 2). The empty word 0

\[
\begin{align*}
\text{movi } r_4 & \ r_4, \quad \text{Load the word from the memory address in register } r_4 \ \\
\text{movs } r_4 & \ r_4, \quad \text{Store the contents of register } r_4 \ \\
\text{movi } r_4 & \ k, \quad \text{Load the constant value } k \ \\
\text{add } r_4 & \ r_4, \quad \text{Write } r_4 + r_4 \mod 2^\ell \ \\
\text{sub } r_4 & \ r_4, \quad \text{Write } r_4 \mod 2^\ell \ \\
\text{cmp } r_1 & \ r_2, \quad \text{Calculate } r_1 - r_2 \ \\
\text{jmp } r_1 & \ r_1, \quad \text{Jump to the address located in register } r_1 \ \\
\text{je } r_1 & \ r_1, \quad \text{If the ZF flag is set, jump to the address in register } r_1 \ \\
\text{call } r_1 & \ r_1, \quad \text{Push the value of the program counter +1 onto the stack and jump to the address in register } r_1 \ \\
\text{ret} & \ r_0, \quad \text{Pop a value from the stack and jump to the popped location} \ \\
\text{halt} & \ r_0, \quad \text{Stop the execution with the result in register } r_0.
\end{align*}
\]

Fig. 2. Instruction set I. is a sequence of 0’s whose length is based on the architecture being considered. Addresses a are natural numbers, ranging from 0 to \( 2^\ell - 1 \). Memories m are maps from addresses to words. Memory access, denoted as \( m(a) \), is defined as follows: \( m(a) = w \) if \( a \rightarrow w \in m \); it is undefined otherwise. Define the domain of a memory as \( \text{dom}(m) = \{ a \mid a \rightarrow w \in m \} \). If two memories \( m \) and \( m' \) have disjoint domains, they can be merged in another memory. Formally, if \( \text{dom}(m) \cap \text{dom}(m') = \emptyset \), then \( m + m' = \{ a \rightarrow w \mid a \rightarrow w \in m \text{ or } a \rightarrow w \in m' \} \). Memory descriptors \( s \) are quadruples: \( (a_h, n_c, n_d, n) \) that formalise the concepts of Section 2.1: \( a_h \) is the address where the protected memory partition starts, \( n_c \) and \( n_d \) are the sizes (in number of addresses) of the code and data section respectively and \( n \) is the number of entry points. Entry points are allocated starting from the base address \( a_b \). Each entry point is \( N_c \) words long. Assume the entry points do not overflow the protected code section, thus the constraint \( n - N_c < n_c \) holds for all memory descriptors. Programs \( P \) are pairs of a memory \( m \) and a memory descriptor \( s \).

3.2 Operational Semantics

Before introducing the semantics, a number of auxiliary notions are defined.

Fig. 3 defines the access control enforcement rules informally presented in Section 2.1. Read judgments \( s \vdash \text{predicate}(a, b, \cdots) \) as: “according to \( s \), predicate holds for addresses \( a, b, \ldots \).”

Fig. 3. Access control enforcement rules. Assume \( s \equiv (a_h, n_c, n_d, n) \), which return the protected and unprotected parts of a memory \( m \) according to the descriptor \( s \), respectively as: \( m_{\text{sec}}(m, s) = \{ a \rightarrow w \mid a \rightarrow w \in m, s \vdash \text{protected}(a) \} \) and \( m_{\text{ext}}(m, s) = \{ a \rightarrow w \mid a \rightarrow w \in m, s \vdash \text{unprotected}(a) \} \).

In the semantics there are two call stacks, one for the protected code, called the secure stack, and one for the unprotected code, called the insecure stack. Each stack is preceded by a word containing the location of the current head of the stack, let \( S_{\text{sec}} \) and \( S_{\text{ext}} \) indicate the address of the secure and insecure stack respectively. When the current stack is changed in the semantics, the stack pointer register SP is initialised to the right address using \( S_{\text{sec}} \) and \( S_{\text{ext}} \). Given a memory descriptor \( s = (a_h, n_c, n_d, n) \), the secure stack starts at the beginning of the protected data section and the insecure stack starts at the end of the protected memory partition. Thus \( S_{\text{sec}} = (a_h + n_c) \) and, initially, \( S_{\text{sec}} \mapsto (a_h + n_c + 1) \); analogously, \( S_{\text{ext}} = (a_h + n_c + n_d) \) and, initially, \( S_{\text{ext}} \mapsto \).
(a_i + n_c + n_d + 1). Call and return instructions are responsible of setting the SP register to the correct address when crossing boundaries between protected and unprotected memory. The value of the program counter is pushed onto the stack by a call instruction (the stack grows down), while a ret instruction pops one address from the top of the stack and jumps to that location. Updating the stack pointer SP is performed by the auxiliary function setStack (Fig. 4). In the rules, notation m[a → w] indicates that memory m is updated to a new one that is equal to m except that the value stored at address a is w. Notation r[R → w] indicates that the register file r is updated to a new one that is equal to r except that the value stored in register R is w. Notation r(R) indicates the value contained in register R in register file r. Given a jump between addresses p and p', the stack switch rules produce a new register file r' and a new memory m' based on old ones r and m. The memory is updated to store the top of the current stack, located in SP, in the address storing the top of the current stack: SPsec or SPext. When the stack is changed, the register file is updated to initialise SP to the top of the right stack: the address stored at SPsec or SPext.

The operational semantics is a small step semantics that describes how each instruction of the language transforms an execution state into a new one. Thus, the operational semantics handles programs in the whole memory: both the protected and unprotected partitions.

**Definition 1** (Execution state). An execution state, denoted as \( \Omega \), is a quintuple \( \Omega = (p, r, f, m, s) \), where p is a program counter, r is a register file, f is a flag register, m is a memory and s is a memory descriptor.

Given \( \Omega = (p, r, f, m, s) \), let \( \{\Omega\} \) be the state \( (p, r, f, m_{sec}(m, s), s) \) and \( \{\Omega\} \) be the state \( (p, r, f, m_{ext}(m, s), s) \). Relations \( \rightarrow \subseteq \{\Omega\} \times \{\Omega\} \) and \( \rightarrow' \subseteq \{\Omega\} \times \{\Omega\} \) describe the evaluation of instructions that only affect the protected and unprotected parts of memory respectively. Fig. 5 presents the rules for \( \rightarrow \), rules for \( \rightarrow' \) are obtained by replacing an intJump assumption with an extJump one. Let \( m(p) = \text{inst} \) denote that \( \text{inst} \) is the word allocated in \( m(p) \), where \( \text{inst} \in I \). Note that the program counter is set to \(-1\) whenever the halt instruction is encountered, in order to capture termination. This way, no progress can be made, as \( m(-1) \) does not return a valid instruction: the program is in a stuck state.

**Definition 2** (Stuck state). A state \( \Omega = (p, r, f, m, s) \) is stuck, denoted as \( \Omega^* \), when the program counter does not point to a valid instruction: \( m(p) \notin I \).

The operational semantics of the language is a binary relation over states \( \rightarrow \subseteq \{\Omega\} \times \{\Omega\} \) defined by the rules of Fig. 6. Rules

$$
\begin{align*}
\text{Eval-protected:} & \\
\{\Omega\} & \rightarrow \{\Omega'\} & \{\Omega\} & \rightarrow \{\Omega'\} \\
\text{Eval-unprotected:} & \\
\{\Omega\} & \rightarrow \{\Omega'\} & \{\Omega\} & \rightarrow \{\Omega'\} \\
\text{Eval-moves-out:} & \\
\{\Omega\} & \rightarrow \{\Omega'\} & \{\Omega\} & \rightarrow \{\Omega'\} \\
\text{Eval-call:} & \\
\{\Omega\} & \rightarrow \{\Omega'\} & \{\Omega\} & \rightarrow \{\Omega'\} \\
\text{Eval-returnback:} & \\
\{\Omega\} & \rightarrow \{\Omega'\} & \{\Omega\} & \rightarrow \{\Omega'\} \\
\text{Eval-return:} & \\
\{\Omega\} & \rightarrow \{\Omega'\} & \{\Omega\} & \rightarrow \{\Omega'\} \\
\end{align*}
$$

**Figure 6.** Operational semantics of whole programs. \( \Delta_{ib} \) is the address of the returnback entry point.

Eval-callback and Eval-returnback ensure that the address to be followed after a callback is stored in the secure stack and that the address returnback entry point \( \Delta_{ib} \) is pushed on the insecure stack. Thus the unprotected code always jumps to the returnback entry point when returning from a callback. Code located at the returnback entry point must contain a ret instruction in order to correctly resume the execution. Rule Eval-moves-out ensures that the values that protected programs write in unprotected memory are not instructions.

The transitive closure of \( \rightarrow \) is indicated with \( \rightarrow^* \). A state \( \Omega \) performing \( n \) reduction steps is indicated as \( \Omega \rightarrow^n \Omega' \). The evaluation of program \( P \) is a sequence of steps that takes the initial state of \( P \) to another state.

**Definition 3** (Initial state). The initial state of a program \((m, s)\), denoted as \( \Omega_0(m, s) \), is the state \((p_0, r_0, f_0, m, s)\), where \( s = (a_i, n_c, n_d, n) \), \( p_0 = (a_i + n_c + n_d + 2) \), \( r_0 = [SP \rightarrow m(\text{SPsec})] \), \( r_1 = i_{0} \rightarrow i_{0.11} \), and \( f_0 = [zF \rightarrow 0; SF \rightarrow 0] \).

The evaluation of \( P \) terminates if \( \Omega_0(P) \rightarrow \Omega^* \); the result of the computation is stored in \( r_0 \). If the evaluation of program \( P \) does not terminate, \( P \) diverges. A program \( P \) diverges, denoted as \( P^\dagger \), if it executes an unbounded number of reduction steps. Formally: \( P^\dagger \) if \( \forall n \in \mathbb{N}, \exists \Omega_n \) \( \Omega_0(P) \rightarrow (\Omega_0 \rightarrow^n \Omega_n) \).

### 3.3 Contextual Equivalence

Contextual equivalence relates two programs that cannot be distinguished by any third program interacting with them [16]. This notion relies on the concept of contexts, which is introduced before presenting the equivalence itself.
Since we consider programs $P$ that are placed in protected memory and interact with arbitrary unprotected code, contexts model that unprotected code. Thus for any descriptor $s$, contexts $M$ are partial memories with a hole: $M = m[\cdot]$, where all addresses of $M$ are unprotected. Formally, given $s$, $\forall a \in \text{dom}(M), s \vdash \text{unprotected}(a)$. The hole models the possibility to combine a program $P$ with the memory $M$ iff they are compatible, denoted as $P \equiv M$, thus if the memories of $P$ and $M$ have disjoint domains. Let $\text{dom}(M) = \text{dom}(m)$ if $M = m[\cdot]$; formally, $P \equiv M$ if $P = (m', s)$ and $\text{dom}(m') \cap \text{dom}(M) = \emptyset$. $P$ and $M$ are compatible, the hole of $M$ can be filled with $P$ in order to model interaction between $P$ and $M$. Formally, if $P \equiv M$ then $\text{dom}(m') = \text{dom}(m)$.

Programs $P_1$ and $P_2$ are contextually equivalent, denoted as $P_1 \simeq P_2$, when, for all contexts they interact with, $P_1$ diverges if and only if $P_2$ also diverges.

**Definition 4** (Contextual equivalence). $P_1 \simeq P_2$ if $\forall M, P_1 \equiv M \wedge M[P_1] \Rightarrow P_2 \equiv M \wedge M[P_2] \Rightarrow$. An implication of this definition is that for $P_1$ and $P_2$ to be contextually equivalent they must have the same memory descriptor. For the sake of simplicity we will always assume the compatibility of a program and the context it is plugged in, shortening the above definition to: $P_1 \simeq P_2$ if $\forall M, M[P_1] \Rightarrow M[P_2] \Rightarrow$.

**Example 2** (Contextually equivalent programs). The following $P_1$ and $P_2$ write the values of $r_1$ and $r_2$ respectively at the protected address 150 (line 2) and then return 0 (line 3). Recall that the protected memory partition spans from address 100 to 200, with one entry point at address 100.

\[
\begin{align*}
\text{movi } r_1 & \ 150 \\
\text{movi } r_2 & \ 150 \\
\text{ret} & \ 0
\end{align*}
\]

The only difference between $P_L$ and $P_R$ is in the value stored at address 150. However, an unprotected program cannot read that value. Since value does not affect the computation of $P_L$ or $P_R$ or the unprotected code, $P_L$ and $P_R$ are contextually equivalent.

Having defined the assembly language and its operational semantics, the paper introduces the two different trace semantics. Trace equivalence is also introduced, it will be proven the same as contextual equivalence in Section 5.

4. **Trace Semantics**

This section gives two different trace semantics for protected programs. The differences stem from the different ways to achieve full abstraction pointed out by Curien [3]. The first, $Tr^L$, possesses more expressive labels, while the second, $Tr^R$, relies on changes to the semantics of programs. Both are proven to be fully abstract w.r.t. the appropriate operational semantics in Section 5. Finally, this section defines when two programs are trace equivalent.

As for the operational semantics, a notion of execution states is defined to: $P_1 \simeq P_2$ if $\forall M, M[P_1] \Rightarrow M[P_2] \Rightarrow$. The only difference between $P_L$ and $P_R$ is in the value stored at address 150. However, an unprotected program cannot read that value. Since value does not affect the computation of $P_L$ or $P_R$ or the unprotected code, $P_L$ and $P_R$ are contextually equivalent.
DEFINITION 5 (Initial state for traces). The initial state for traces of a program \((m, s)\), denoted as \(\Theta_0(m, s)\), is the state (unknown, \(m, s\)).

4.1 Tr\(^1\): Expressive Labels

Below are the labels exhibited by the Tr\(^1\) traces semantics.

\(\Lambda ::= \alpha \mid \tau \mid \gamma \mid \delta \mid \alpha \equiv \sqrt{\gamma} \equiv \delta \)

A label \(\Lambda\) can be either an observable action \(\alpha\) or a non-observable action \(\tau\). Action \(\tau\) indicates the unobservable action occurred in protected memory. Observable actions include a tick \(\sqrt{\gamma}\) indicating that the evaluation has terminated. Additionally, observable actions are function calls or returns to a certain address \(a\), combined with the registers \(r\) and flags \(f\). Registers and flags are in the labels as they convey information on the behaviour of programs. Observable, \(\tau\)-decorated actions can be prefixed by a number of write-outs of a value \(v\) to address \(a\). They are only \(\tau\)-decorated since the protection mechanism forbids writes from untrusted to protected memory, as well as read-outs.

Fig. 7 presents the rules defining the relation \(\Theta \equiv_\Theta \Theta'\), which describe when a state \(\Theta\) generates trace \(\pi\) and results in state \(\Theta'\). The Tr\(^1\) traces of a program \(P\) is defined as follows: Tr\(^1\)(\(P\)) = \{\pi \mid \exists \Theta. \Theta_0(P) \equiv_\Theta \Theta'\}.

Example 3 below can now explain why protected code cannot write instructions in unprotected memory.

Example 3 (Write-outs are not instructions). In the following example, protected programs can write instructions in unprotected memory. Recall that the protected memory partition spans from address 100 to 200, which one entry point at address 100.

The following \(P_1\) and \(P_2\) set \(r_0\) to 20 and 10 respectively (line 1), then write the instruction \(\text{jmp } r_0\) at address 20 and 10 respectively (line 2). Finally, they jump to the instruction they just wrote (line 3).

```
1 100 movi r0 20
2 101 movs r0 "jmp r0"
3 102 call r0
```

When \(r_0\) is set to 20 (resp. 10), the instruction \(\text{jmp } r_0\) written at address 20 (resp. 10) will diverge. Thus, \(P_1\) and \(P_2\) are contextually equivalent, since no context can differentiate between them. However, \(P_1\) and \(P_2\) are trace inequivalent, since the following is a trace of \(P_1\) and not of \(P_2\).

```
(...)
call 100?
 WRITE(20, "jmp r0")
 (...)
call 20!
```

Since we are interested in programs that do not extend the functionality of code in unprotected memory (and since these programs are the majority of assembly programs), protected code cannot write instructions in unprotected memory. Thus the programs of this example are not valid.

4.2 Tr\(^5\): Changes to the Semantics

The Tr\(^5\) semantics presents different labels from Tr\(^1\), which are driven by changes to the operational semantics of protected programs. These changes are formalised in Fig. 8: smaller changes to other rules are omitted. When the program counter jumps between the protected and the unprotected memory partitions, or vice-versa, flags are set to 0; in case of a return, all registers but \(R_0\) are also set to 0. Additionally, write-outs are prohibited.

Following are the labels of Tr\(^5\); we drop the greek letter notation and use latin letters in order to immediately differentiate between Tr\(^1\) and Tr\(^5\). Flags do not appear in traces because they are always set to 0, as are all registers but \(R_0\) in case of a return. Write-outs are prohibited, so there are no labels that capture them.

\(L ::= \alpha \mid \tau \mid \gamma \mid \delta \mid \alpha \equiv \sqrt{\gamma} \equiv \delta \mid \alpha \equiv \sqrt{\gamma} \equiv \delta \)

Rules defining the relation \(\Theta \equiv_\Theta \Theta'\) for Tr\(^5\) are omitted. They are a subset of the ones defined in Fig. 7, with a slight modification to the syntax of generated labels. The Tr\(^5\) traces of a program \(P\) is defined as follows: Tr\(^5\)(\(P\)) = \{\pi \mid \exists \Theta. \Theta_0(P) \equiv_\Theta \Theta'\}.

4.3 Trace Equivalence

The notion of trace equivalence is presented generically for both trace semantics under consideration. Use Tr\((P)\) to indicate the traces of a program \(P\), be it expressed through Tr\(^1\) or Tr\(^5\). Two programs \(P_1\) and \(P_2\) are trace-equivalent, denoted as \(P_1 \equiv_T P_2\), if their traces are the same and they have the same memory descriptor.

DEFINITION 6 (Trace equivalence). \(P_1 \equiv_T P_2\) if Tr\((P_1)\) = Tr\((P_2)\) and \(P_1 = (m_1, s)\) and \(P_2 = (m_2, s)\).

Following are two examples of trace equivalent and inequivalent programs. For the sake of simplicity, we use the Tr\(^1\) semantics and
\(s \vdash \text{writeAllowed}(p,a)\) (Stack-out-to-in') \[\vdash \text{entryJump}(p,p') \quad m' = m\) \(\text{SP}_{\text{ext}} \mapsto r\) \(\text{SP}\) \(\text{SP}_{\text{sec}} \mapsto r\) \(\text{SP}\) \(\vdash \text{protected}(p)\) \(\vdash \text{protected}(a)\) \(\vdash \text{setStack} \leftarrow p, r', f', m', m'\) \(\text{Eval-return}\) \(m(p) = (\text{ret})\) \(p' = m(r(SP))\) \(\vdash \text{exitJump}(p,p')\) \(r' = r(SP) \mapsto r\) \(\text{P} - r'\) \(\text{P} - f\) \(\text{P} - m\) \(\text{P} - s\) \(\vdash \text{setStack} \leftarrow p, r', f', m', m'\) \(\vdash \text{writeAllowed}(p,a)\) \(\vdash \text{writeAllowed}(a)\) \(\vdash \text{writeAllowed}(p)\) \(\vdash \text{writeAllowed}(a)\) \(\vdash \text{writeAllowed}(p)\) \(\vdash \text{writeAllowed}(a)\) \(\vdash \text{entryJump}(p,p') \quad m' = m(S_P) \mapsto r(S_P)\) \(\vdash \text{protected}(r(S_P))\) \(\vdash \text{protected}(r(S_P))\) \(\vdash \text{setStack} \leftarrow p, r', f', m', m'\) \(\vdash \text{writeAllowed}(p,a)\) \(\vdash \text{writeAllowed}(a)\) \(\vdash \text{writeAllowed}(p)\) \(\vdash \text{writeAllowed}(a)\) \(\vdash \text{writeAllowed}(p)\) \(\vdash \text{writeAllowed}(a)\).

Figure 8. Changes to auxiliary functions and to the operational semantics for \(\text{Tr}_2\).

indicate arbitrary values for registers and flags with notation \((r, f)\) and an unprotected address with \(p\).

**Example 4** (Traces of previous examples). The code of Example 1 is not trace equivalent; the following trace is generated by \(P_1\), but not by \(P_2\):

\[(r; f) \text{call } 100'! \cdot \ldots ; 41; f \text{ ret } p! \cdot \sqrt{\}.

The code of Example 2 is trace equivalent since the trace semantics of both \(P_1\) and \(P_2\) is a set whose sequences are concatenations of the following trace, each element of the sequence being parametrised on \(r\) and \(f\):

\[(r; f) \text{ call } 100'! \cdot \left(r[R_0 \mapsto 0]; f \text{ ret } p! \right).

**5. Full Abstraction of the Trace Semantics**

This section presents the general proof strategy through which both \(\text{Tr}_1\) and \(\text{Tr}_2\) are proven to be fully abstract w.r.t. the corresponding operational semantics and discusses the benefits of full abstraction of the trace semantics.

A fully abstract trace semantics is both sound and complete with respect to the operational semantics. Soundness states that the trace semantics captures all details of the operational semantics. Thus, for all contexts, two trace equivalent programs cannot be told apart. The universal quantification over contexts makes this more difficult to prove. Completeness states that abstractions of the trace semantics are preserved in the operational semantics. This is achieved by proving the contrapositive of the completeness statement. For this, the proof consists of devising an algorithm that constructs a program, a “witness”, that tells two trace inequivalent programs apart [2, 7, 15]. Since the language is low-level and untyped, this is not particularly complicated.

Full abstraction of trace semantics is formally stated as: \(P_1 \simeq_T \implies P_2 \iff P_1 \simeq P_2\); its proof is split in two cases, one for each direction of the co-implication.

**Theorem 1** (Soundness). \(P_1 \simeq_T \implies P_2 \simeq P_2\) \(\iff P_1 \simeq P_2\).

Call the interface of a state it's registers, flags and unprotected memory. Two states \(\Omega_1\) and \(\Omega_2\) have the same interface, denoted as \(\Omega_1 \tilde{=} \Omega_2\), if they have the same registers, flags and unprotected memory. Formally, \(\Omega_1 \tilde{=} \Omega_2\) if \(\Omega_1 = (p_1, r_1, f_1, m_1, s_1)\) and \(\Omega_2 = (p_2, r_2, f_2, m_2, s_2)\) and \(m_1(m_1, s_1) = m_2(m_2, s_2)\). Given \(\Omega = (p, r, f, m, s)\), redefine \(\Theta\) to be state \(\Theta = (p, r, f, m_\text{ext}(m, s), s)\) if \(s \text{ protected}(p)\) and (unknown, \(m, s\)) otherwise.

The proof of Theorem 1 states that an unprotected program interacting with \(P_1\) cannot distinguish it from \(P_2\). This is because both programs offer the same interface to the unprotected program. So this proof depends on an interface-preservation lemma (Lemma 1) which must be proven for each trace semantics since it depends on the labels of each trace semantics. Lemma 1 enunciates that two states with the same interface still have the same interface after they perform the same observable action. Thus unprotected programs do not see differences, in terms of flags, registers and unprotected memory, between \(P_1\) and \(P_2\).

**Lemma 1** (Interface preservation after same observable action).

If \(\Theta_1 \implies (\alpha \rightarrow \Theta_2)\) and \(\Theta_1 = (\Omega_1)\) and \(\Omega_1 \rightarrow \Theta_2\) and \(\Theta_2 = (\Omega_2)\) and \(\Omega_2 \rightarrow \Theta_3\) and \(\Theta_3 = (\Omega_3)\) and \(\Omega_3 \rightarrow \Theta_4\) then \(\Theta_4 \tilde{=} \Omega_4\).

**Theorem 2** (Completeness). \(P_1 \simeq P_2 \implies P_1 \approx_T P_2\).

Completeness is equivalently stated as: \(P_1 \not\approx_T P_2 \implies P_1 \neq P_2\). This is proven by devising an algorithm that takes as input two different traces \(\pi_1\) and \(\pi_2\) and the two programs \(P_1\) and \(P_2\) generating them and outputs a program \(P\) that interacts with \(P_1\) and \(P_2\) and is able to differentiate between them. The two different traces are generated as follows. Since \(P_1 \neq P_2\), we have that \(\text{Tr}(P_1) \neq \text{Tr}(P_2)\), thus there exists a trace \(\pi\) that belongs to either only \(\text{Tr}(P_1)\) or only \(\text{Tr}(P_2)\). Assume wlog that \(\pi \in \text{Tr}(P_1)\). The trace \(\pi\) can be split in two parts \(\pi_1\) and \(\pi_2\) such that \(\pi = \pi_1 \pi_2\), and so that there exists a trace \(\pi' \in \text{Tr}(P_2)\) that can be split in two parts \(\pi_1'\) and \(\pi_2'\) such that \(\pi_1' = \pi_1 \pi_2'\) and \(\pi_2' \neq \pi_2\). Trace \(\pi_1\) always exists, it could be an empty trace, it could be composed by an empty \(\pi_1\) and, possibly, by an empty \(\pi_2\). The traces inputs for the algorithm are \(\pi_1 = \pi_1\pi_2\) and \(\pi_2 = \pi_1\pi_2\).

*Proof Sketch*. This sketch shows how to generate \(P\) so that it interacts with \(P_1\) or \(P_2\) and tells them apart. Let \(d\) be a simple program written at some address \(X: \text{movi } r_0 X; \text{ jmp } r_0\). Differentiation between \(P_1\) and \(P_2\) is done by halting in one case and executing \(d\) in the other.

Input traces \(\pi_1\) and \(\pi_2\) are sequences of actions where even-numbered ones are messages from \(P\) to either \(P_1\) or \(P_2\) and odd-numbered ones are messages from either \(P_1\) or \(P_2\) to \(P\). By hypothesis there exists a different action in the traces, that action is at an odd-numbered index because it is generated by \(P_1\) or \(P_2\): \(P\) does not change so it generates the same actions. Thus there exists \(j\) for which \(\pi_1(j) \neq \pi_2(j)\). Once executed, \(P\) replicates actions from the traces until the \(j\)th, then it performs the differentiation.

Even-numbered actions in \(\pi_1\) and \(\pi_2\) are the same until the \(j\)th one. They are function calls or returns implemented in \(P\). To replicate them, \(P\) sets flags and registers as in the action and then performs a call to the right address or a return. This code is placed at the current code location, which is either the initial address \(p_0\) or the address indicated by a callback from \(P_1\) or \(P_2\).

Odd-numbered actions are carried out in \(P_1\) or \(P_2\). If these actions are the same, then no code is added to \(P\). If they are different, the differentiating code stated above is placed at the current code location. If both actions are callbacks, the current code location is updated to the address mentioned in the callback.

This general proof strategy is presented for both Theorem 1 and Theorem 2. The generalised approach is tailored to each semantics only in the relatively simple proof of Lemma 1. Since Theorem 1 and 2 hold for both \(\text{Tr}_1\) and \(\text{Tr}_2\), both semantics are fully abstract w.r.t. the corresponding operational semantics.
5.1 Benefits of Fully Abstract Trace Semantics

Let us now highlight two of the benefits of fully abstract trace semantics.

Without the trace semantics, the capabilities of an attacker towards protected code are expressed as sequences of instructions, which can be difficult to reason about. With the trace semantics, the capabilities of that attacker are captured via the simple notion of traces. The full abstraction property ensures that traces express precisely all the capabilities of an attacker, without leaving any out.

The second benefit is in the field of secure compilation. Given two programs $C_1$ and $C_2$ in a language and indicate their compilation to an assembly language with $C_1^a$ and $C_2^a$ respectively. Proving the compilation scheme secure is formally stated as $C_1 \simeq C_2 \iff C_1^a \simeq C_2^a$ [1]. The more complex direction of this proof is $C_1 \simeq C_2 \Rightarrow C_1^a \simeq C_2^a$, but it can be simplified by adopting a fully abstract trace semantics for the assembly language. That proof is simpler in this way than by adopting the notion of contextual equivalence and performing induction on all possible low-level contexts. The complex direction becomes $C_1 \simeq C_2 \Rightarrow C_1^a \simeq C_2^a$, whose contrapositive is $C_1^a \not\simeq C_2^a \Rightarrow C_1 \not\simeq C_2$. This can be proven similarly to Theorem 2, as Agten et al. [2] and Patrignani et al. [15] did. Both implicitly use the second trace semantics where the compiler adds code that enforces the changes to the operational semantics.

6. Related Work

Full abstraction has been largely studied as a way to state the correctness of a denotational semantics with respect to an operational one [16]. This technique has been adopted for different programming languages paradigms, such as the $\lambda$-calculus [12] and the $\pi$-calculus [6]. For example, Ghica and Tzvelekos [5] provided a fully abstract trace semantics, with regards to a game operational semantics, of a C-like language that, unlike this work, does not present a protection mechanism.

Fine grained, program counter-based memory access control mechanisms have been implemented in several software [10, 18–20] and hardware forms [4, 14] and recently by Intel in the SGX processor [11]. From the theoretical point of view, assembly languages extended with these protection mechanisms have been recently studied as target languages for secure compilation schemes [2, 15]. The language and trace semantics of this paper are inspired by those works. Besides elaborating on details of their formalisation, this work provides the first proof of full abstraction of the trace semantics.

A different research area studies logics for assembly languages: Hoare logics [17] or separation logics [9]. Jensen et al. [9] present a summary of the most recent advances in the latter. That research area focusses on providing reasoning facilities for assembly code, while this paper focusses on reasoning on the security of assembly code.

A different protection mechanism that could be employed at the assembly level is a typed assembly language [13]. To the best of the authors’ knowledge, no fully abstract trace semantics has been provided for such languages.

7. Conclusion and Future Work

This paper studied the characterisation of the behaviour of isolated programs. To this extent, it formalised an assembly language extended with a fine grained, program counter-based memory access control mechanism. Then, it provided two different trace semantics for that language and a general proof strategy where both semantics are proven to be fully abstract. These semantics model the capabilities of attackers that inject malicious assembly code and they simplify proving secure compilation to the assembly language.

Providing a fully abstract trace semantics for a machine with multiple instances of an isolation mechanism or with multiple cores seem natural extensions to this work. The latter seems crucial in order to provide a secure compiler for concurrent programs to machines using the protection mechanism presented.

Acknowledgements. The authors would like to thank Tarmo Uustalu and the anonymous reviewers for useful feedback on an earlier draft.

References

A. Algorithm description

Let us now give a description of the algorithm mentioned for Theorem 2. We drop the differentiation via \( \text{div} \) and \( \text{halt} \) and introduce differentiation via different values \((1 \text{ and } 2\) respectively) stored in \( r_0 \), followed by \( \text{halt} \). The two techniques are equivalent, the second one leading to an more easily implementable algorithm.

Moreover we assume that there is always enough memory to write the algorithm.

In the output code \( P \), there must be four functions in order to set the flags to the right combinations. These function are of the form:

- \( \text{store } r_1; \text{store } r_2; \)
- \( \text{set } r_1; \text{set } r_2 \) to the right values that set the flag in the wanted combination (e.g. for \( SF = 0;ZF = 1 \), set \( r_1 = 1 \) and \( r_2 = 1 \));
- \( \text{exec } \text{cmp } r_1; r_2; \)
- \( \text{restore } r_1; \text{restore } r_2 \) to the corresponding previous values;
- \( \text{ret} \).

The algorithm keeps track of where to write instructions in \( P \) with a stack: the "current address stack" \( c \). Initially, the top of \( c \) is set to \( p_0 \).

The algorithm scans the traces \( T_1 \) and \( T_2 \). By construction, each even-numbered label is \( = \)-decorated; each odd-numbered label is \( \neq \)-decorated. The algorithm is split in two subroutines based on what kind of actions it is examining. Each subroutine analyses one action from each trace and then calls the other subroutine on the following actions until the differentiation is achieved; in that case the algorithm terminates.

\( = \)-decorated actions. These actions are generated by the unprotected code. The algorithm must output \( P \) such that \( P \) generates those traces. Thus, at location \( c \), the algorithm writes code depending on what action is being considered.

\((r; f)\text{call } p?\) Firstly, the algorithm writes a \text{call} to the function that sets the flags to \( f \). Then the top of \( c \) is incremented by \( 1 \). Then, all twelve registers are set to the values of \( r \), thus for all \( i = 0..11 \), the following instruction is written: \( \text{movi } r_i, r_i \). Eventually, if the value to be written in a register is larger than the constant allowed by \( \text{movi} \), an add instruction is used. Then the top of \( c \) is increased by \( 12 \). Then based on which register contains the value \( p \) that is where the call is directed, instruction \text{call } r_i \) is written. Then the top of \( c \) is incremented by \( 1 \).

\((r; f)\text{ret}?\) Firstly, the algorithm writes a \text{call} to the function that sets the flags to \( f \). Then the top of \( c \) is incremented by \( 1 \). Then, all twelve registers are set to the values of \( r \), thus for all \( i = 0..11 \), the following instruction is written: \( \text{movi } r_i, r_i \). Eventually, if the value to be written in a register is larger than the constant allowed by \( \text{movi} \), an add instruction is used. Then the top of \( c \) is increased by \( 12 \). Then based on which register contains the value \( p \) that is where the call is directed, instruction \text{call } r_i \) is written. Then the top of \( c \) is incremented by \( 1 \).

\( \neq \)-decorated actions. These actions are generated by protected code. If the labels are the same, then no code is added to \( P \).

\text{callbacks}. If both actions are of the form \((r; f)\text{call } p?\), then \( p \) is pushed on top of the stack \( c \).

\text{returns}. If both actions are of the form \((r; f)\text{ret}?\), then the top of the stack \( c \) is popped.

\text{writeouts}. The algorithm adds no code to \( P \).

\( \neq \)-decorated actions. These actions are generated by the differentiating code at address \( c \) in \( P \). Differences in the labels can be of the following type:

different length. Thus one label is \( \neq \) and the other one is \( \alpha \neq \neq \). In this case, given that \( \alpha \) is generated by program \( P_1 \), the algorithm writes \( \text{movi } r_0; i \) and \( \text{halt} \) at the address indicated by \( c \).

different actions. Assume wlog that \( \alpha_1 = (r; f)\text{ret}! \) and \( \alpha_2 = (r; f)\text{call } 10! \). Then the algorithm writes instructions \( \text{movi } r_0; 1 \); \( \text{halt} \) at \( c \) and instructions \( \text{movi } r_0; 2 \); \( \text{halt} \) at address \( 10 \).

Assume wlog that \( \alpha_1 = \text{write}(a; v)\.\delta \) and \( \alpha_2 = \delta \). Then, before the code written in \( P \) for \( \delta \), the following code is written to \( P \): \( \text{movi } r_0; a \); \( \text{movi } r_1; u \); \( \text{movs } r_0; r_1 \), where \( u \) is a different value from \( v \). At address \( 20 \), the algorithm writes the instructions for the following pseudo-code: read the value of \( a \), subtract \( v \), if the result is 0, then \( \text{movi } r_0; 1 \); \( \text{halt} \), otherwise \( \text{movi } r_0; 2 \); \( \text{halt} \).

different values in the same action. Assume wlog that \( \alpha_1 = (r; 0, 1)\text{ret}! \) and \( \alpha_2 = (r; 0, 0)\text{ret}! \). Then the differentiating code is the following: perform a jump (via \( j1 \) in this case) to an address \( \alpha \) in case the flag is 1. At address \( \alpha \), instructions \( \text{movi } r_0; 1 \); \( \text{halt} \) are written. After the jump, instructions \( \text{movi } r_0; 2 \); \( \text{halt} \) are written.

Assume wlog that \( \alpha_1 = (1; \ldots; f)\text{ret}! \) and \( \alpha_2 = (2; \ldots; f)\text{ret}! \). Then the differentiating code is the following: \( \text{movi } r_1; 1; \text{sub } r_2; r_1 \). Now the problem is reduced to different values in flags, so the previous approach can be used.

Assume wlog that \( \alpha_1 = (r; f)\text{call } 10! \) and \( \alpha_2 = (r; f)\text{call } 20! \). Then the algorithm writes instructions \( \text{movi } r_1; 1 \); \( \text{call } 10 \) and instructions \( \text{movi } r_0; 2 \); \( \text{halt} \) at address \( 20 \).

Assume wlog that \( \alpha_1 = \text{write}(a_1, v_1)\.\delta \) and \( \alpha_2 = \text{write}(a_2, v_2)\.\delta \). The same procedure stated in the last paragraph for the previous point is applied.

B. Proofs

We overload the hole-filling notation and allow a hole to be filled by a state \( \Omega = (p, r, f, m, s) \) as follows: \( M[\Omega] = (p, r, f, m + m', s) \), if \( (m, s) \subseteq M \).

Proof. Proof of Theorem 1.

By Definition 4 the thesis \( P_1 \simeq P_2 \) becomes \( \forall M. M[P_1] \iff M[P_2] \).

The proof is split in two cases, one for each side of the co-implication.

1. \( \Rightarrow \)
   - The thesis is \( \forall M. M[P_1] \iff M[P_2] \).
   - As seen in Section 3.3 the thesis is \( \forall M. M[\Omega_0(P_1)] \iff M[\Omega_0(P_2)] \).
   - Let \( \Omega_1 = M[\Omega_0(P_1)] \) and \( \Omega_2 = M[\Omega_0(P_2)] \).
   - The thesis is \( \forall M. \forall n \in N. \Omega_1 \rightarrow^n \Omega_1' \Rightarrow \forall m \in N. \Omega_2 \rightarrow^m \Omega_2' \).
   - The proof proceeds by induction on \( m \).

Base case: \( m = 0 \). Straightforward: \( \Omega_1 \rightarrow^0 \Omega_2 \).

Inductive case: \( m = h + 1 \). The thesis is: \( \Omega_2 \rightarrow^h \Omega_1' \rightarrow^h \Omega_2' \).

The inductive hypothesis (IH) is: \( \forall n \in N. M[\Omega_0(P_1)] \rightarrow^n \Omega_1' \rightarrow M[\Omega_0(P_2)] \rightarrow^n \Omega_2' \).

By IH we have that: \( \exists \Omega_1, \forall n \in N. \Omega_1 \rightarrow^n \Omega_1' \rightarrow^n \Omega_2' \).

Let \( \Omega_1' = (p_1, \ldots) \) and \( \Omega_2' = (p_2, \ldots) \).

There are two cases based on \( p_1 \) and \( p_2 \) and on the last executed action.

The last executed action is an action \( \alpha \) such that: \( M[\Omega_0(P_1)] \rightarrow^n \Omega_1' \) and \( \exists k \in N. \Omega_2' \rightarrow^k \Omega_2' \).
By hypothesis $P_1 \simeq_T P_2$, $M[\Omega_0(P_2)] \overset{\alpha}{\Rightarrow} \Omega_2$ and
\[\exists k \in \mathbb{N}. \Omega_2 \overset{h}{\Rightarrow} \Omega_2.\]
(a) $s_1 \vdash \text{protected}(p_1)$ and $s_2 \vdash \text{protected}(p_2)$ and $\alpha = \gamma'$. 

There are two cases: $\exists \alpha. \left[\Omega_1' \overset{\alpha}{\Rightarrow} \Omega_1'' \right]$ or $\exists \alpha. \left[\Omega_1' \overset{\alpha}{\Rightarrow} \Omega_1'' \right] \alpha \alpha \Rightarrow \left[\Omega_1'' \right]$. 

i. $\exists \alpha. \left[\Omega_1' \overset{\alpha}{\Rightarrow} \Omega_1'' \right]$. 

By hypothesis $P_1 \simeq_T P_2$, $\left[\Omega_1' \overset{\alpha}{\Rightarrow} \Omega_1'' \right]$. 

This, in conjunction with $\text{IH}$, implies the thesis $\Omega_2 \overset{h+1}{\Rightarrow} \Omega_2$. 

ii. $\exists \alpha. \left[\Omega_1' \overset{\alpha}{\Rightarrow} \Omega_1'' \right]$. 

There are two cases: either $\left[\Omega_1' \overset{\alpha}{\Rightarrow} \Omega_1'' \right]$ or $\left[\Omega_1' \overset{\alpha}{\Rightarrow} \Omega_1'' \right]$. 

A. $\left[\Omega_1' \overset{\alpha}{\Rightarrow} \Omega_1'' \right]$. 

This case cannot happen as it contradicts the assumption $\forall \alpha \in \mathbb{N}. M[\Omega_0(P_2)] \overset{\alpha}{\Rightarrow} \Omega_2'$. 

B. $\left[\Omega_1' \overset{\alpha}{\Rightarrow} \Omega_1'' \right]$. 

Let $\Omega_1' = (p_1, r_1, f_1, m_1 + m_1, s_1)$ and $\Omega_2' = (p_2, r_2, f_2, m_2 + m_2, s_2)$.

By hypothesis $P_1 \simeq_T P_2$, $\left[\Omega_1' \overset{\alpha}{\Rightarrow} \Omega_1'' \right]$. 

By inspecting rules for generating $\gamma'$ in traces (the only possible rule that applies in this case), we have that $m_1(p_1) = t_1 \in \mathbb{N}$ and $m_2(p_2) = t_2 \in \mathbb{N}$. 

The thesis holds because $\Omega_2 \overset{h}{\Rightarrow} \Omega_2'$ and $\Omega_2' \overset{\alpha}{\Rightarrow} \Omega_2''$. 

(b) $s_1 \vdash \text{unprotected}(p_1)$ and $s_2 \vdash \text{unprotected}(p_2)$ and $\alpha = \gamma'$ or there was no $\alpha$. 

By $\text{IH}$ $\forall \alpha \leq h, M[\Omega_0(P_2)] \overset{\alpha}{\Rightarrow} \Omega_2'$. 

By hypothesis $P_1 \simeq_T P_2$, $M[\Omega_0(P_2)] \overset{\alpha}{\Rightarrow} \Omega_2'$. 

Additionally, $M[\Omega_0(P_2)] \overset{\alpha}{\Rightarrow} \Omega_2'$. 

By Lemma 1, $\Omega_1' = (p', r', f', M' + m_1, s_1)$ and $\Omega_2' = (p', r', f', M' + m_2, s_2)$.

Additionally, $\Omega_1' \overset{h-1}{\Rightarrow} \Omega_1''$ and $\Omega_2' \overset{h-1}{\Rightarrow} \Omega_2''$. 

Since $M'$ is the same for both $P_1$ and $P_2$, the $(h-1)$-steps they perform in unprotected memory are the same for $\Omega_1'$ and $\Omega_2'$. 

Thus $\Omega_1' = (p', r', f', M' + m_1, s_1)$ and $\Omega_2' = (p', r', f', M' + m_2, s_2)$.

As stated in Section 3.3 $p \in \text{dom}(M')$ implies that $s_1 \vdash \text{unprotected}(p')$ and $s_2 \vdash \text{unprotected}(p')$.

By hypothesis $\forall \alpha \in \mathbb{N}. M[\Omega_0(P_1)] \overset{\alpha}{\Rightarrow} \Omega_2'$. 

We have that $\Omega_1' \overset{\alpha}{\Rightarrow} \Omega_1''$ and that $M''(p) \approx i \in \mathbb{N}$. 

This implies the thesis: $\Omega_2 \overset{h+1}{\Rightarrow} \Omega_2'$ since $\Omega_2 \overset{h}{\Rightarrow} \Omega_2''$. 

2. $\Leftarrow$ As in case 1, mutatis mutandis. 

Proof of Lemma 1 for $\text{Tr}^h$ (Implies the proof of the same lemma for $\text{Tr}^r$). 

Straightforward induction on $\pi$ that leads to a case analysis on $\alpha$. 

$\sqrt{\circ}$. Straightforward: the thesis is $\Omega_1 \cong \Omega_2$, which is among the hypotheses. 

$\pi$-decorated actions. These actions are done in the same unprotected memory. Moreover they are generated by the same instructions (eventually if they can be at different addresses): call for a label of the form $(r, f) \text{call } p$? and ret for a label of the form $(r, f) \text{ret}$. Neither of those instructions touch the memory, and they only modify registers and flags in the same way. 

Thus, also in this case, the thesis $\Omega_1 \cong \Omega_2$ holds. 

$\pi$-decorated actions. These are the subcases for this one, one for each kind of label that can be $\pi$-decorated. 

$(r, f) \text{call } p$! This label is generated by a call instruction which, in a single step, does not change registers (besides SP) nor flags, nor unprotected memory. As the value of SP is captured in $r$, and $r$ is the same for both $\Omega_1$ and $\Omega_2$, the interface does not change. Thus, the thesis $\Omega_1 \cong \Omega_2$ holds. 

$(r, f) \text{ret } p$! This label is generated by a ret instruction which, in a single step, does not change registers (besides SP) nor flags, nor unprotected memory. The same considerations made for SP before hold here, thus the thesis $\Omega_1 \cong \Omega_2$ holds. 

Additionally those labels can be preceded by an arbitrary number of write($a, v$)! This prefix is generated by a move instruction which, in a single step, does not change registers nor flags. 

Address $a$ is unprotected, so the unprotected memory is changed but in the same way. Thus, these prefixes do not affect the thesis, which holds. 

$\tau$ actions. Unobservable actions can be generated in protected memory: $\tau_i$. 

Changes to registers and flags made by those actions are captured in the following, observable action $\alpha$. Since $\alpha$ is the same for both $\Theta_1$ and $\Theta_2$, registers and flag are changed in the same way. 

Changes to unprotected memory that are done by the protected code are captured by $\alpha$, thus the protected memory is changed in the same way. 

The proof of Lemma 1 for $\text{Tr}^h$ is a subset of the proof for $\text{Tr}^r$ since the labels of $\text{Tr}^h$ are a subset of the labels of $\text{Tr}^r$. 

\[\square\]