Executive Summary
This technical report presents detailed semantic optimization work in the RISC-V semantics framework (riscv-semantics) aimed at improving symbolic execution efficiency. The core objective of these optimizations is to enable effective symbolic execution of REVM's EVM opcodes (including ADD and other opcodes) after compilation to RISC-V.
Application Context:
REVM (Rust Ethereum Virtual Machine) is a high-performance EVM implementation written in Rust, containing implementations of all EVM opcodes. When these opcode implementations are compiled to RISC-V architecture, traditional symbolic execution faces serious efficiency challenges, particularly:
- EVM's 256-bit integer operations require complex decomposition on 32-bit RISC-V
- Memory access patterns have type mismatches with RISC-V's byte-level access
- Expression explosion during symbolic execution prevents verification completion
Three Core Challenges and Solutions:
- Type System Inconsistency Challenge: the execution state (<regs>) and storage state (<mem>) use different types, causing frequent type conversions in EVM opcode implementations. Solution: conversion loop elimination, operator pushing, and Word wrapper elimination let EVM integer operations execute symbolically on RISC-V efficiently.
- SparseBytes Memory Model Challenge: traditional single-byte interfaces cannot efficiently support the multi-byte memory access patterns of EVM opcodes. Solution: multi-byte interfaces and the #WB buffering mechanism keep EVM memory operations efficient under symbolic execution.
- Complex Data Type Decomposition Challenge: decomposing EVM's 256-bit operations onto RISC-V causes expression explosion during symbolic execution. Solution: integer operation chain optimization, byte sequence reorganization, and equality judgment optimization keep complex EVM operations manageable under symbolic execution.
Technical Achievements:
Through these semantic optimizations, we successfully implemented symbolic execution support for REVM's EVM opcodes on RISC-V, laying important groundwork for zkEVM-related formal verification efforts.
Detailed Analysis of Core Challenges
Challenge 1: Type System Inconsistency
1.1 Root Cause Analysis
This challenge stems from two reasonable but conflicting design decisions in RISC-V semantics design, which become particularly pronounced when executing EVM opcodes:
Reasons for Using Word/Int in Execution State:
- Instruction Implementation Simplification: RISC-V instructions operate on fixed-width registers; using Word/Int types makes instruction semantic implementation more straightforward and natural
- Computational Efficiency: Integer operations have native support in the K framework, offering better performance
Reasons for Using Bytes in Storage State:
- Flexible Data Access: Supports arbitrary-sized data access (1 byte, 2 bytes, 4 bytes, etc.), conforming to RISC-V memory access requirements
- Actual Storage State: Accurately reflects RISC-V's byte-level storage state in physical memory
1.2 Problem Description
This design trade-off leads to type system inconsistency, causing serious performance issues when executing EVM opcodes:
configuration
<riscv>
<regs> $REGS:Map </regs> // Int |-> Word
<mem> $MEM:SparseBytes </mem> // SparseBytes (byte-level memory model)
</riscv>
Specific Impact on EVM Opcode Symbolic Execution:
- Nested Conversion Explosion: EVM's ADD opcode requires 256-bit integer arithmetic; frequent Bytes ↔ Int conversions produce ever-growing nested Int2Bytes(Bytes2Int(...)) structures, making symbolic execution of ADD non-terminating
- Operator Incompatibility: bitwise operations (AND, OR, XOR) in EVM opcodes cannot execute directly on Bytes2Int-wrapped expressions, causing uninterpreted-expression bloat that severely degrades smart contract verification efficiency
- Word Syntax Complications: the Word syntax wrapping in EVM opcode implementations further increases expression complexity, hindering symbolic execution
1.3 Solution
For the three technical issues above, we adopted the following layered solution strategy to ensure EVM opcodes can perform symbolic execution efficiently on RISC-V:
Solving Problem 1: Nested Conversion Explosion
The core approach is through conversion loop elimination rules to directly break Int2Bytes and Bytes2Int nesting loops, allowing EVM's ADD opcode and other arithmetic operations to maintain manageable expression complexity in symbolic execution:
// Eliminate Int2Bytes(Bytes2Int(...)) pattern
rule [int2bytes-bytes2int]: Int2Bytes(LEN:Int, Bytes2Int(B:Bytes, LE, Unsigned), LE) => substrBytes(B, 0, LEN)
requires 0 <=Int LEN andBool LEN <=Int lengthBytes(B)
[simplification, preserves-definedness]
// Eliminate Bytes2Int(Int2Bytes(...)) pattern
rule [bytes2int-int2bytes]: Bytes2Int(Int2Bytes(LEN, V, LE), LE, Unsigned) => V &Int (2 ^Int (LEN *Int 8) -Int 1)
requires 0 <=Int LEN [simplification, preserves-definedness]
Solution Strategy: These two rules directly simplify "round-trip conversions" to simpler operations—the first rule simplifies conversion chains to byte extraction operations, the second simplifies conversion chains to bit mask operations. This prevents unlimited Term growth and ensures normal symbolic execution of EVM opcodes.
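The effect of these two rules can be spot-checked concretely, with Python ints and bytes standing in for K's Int and Bytes sorts (little-endian, unsigned, as in the rules). This is an illustrative model, not the K implementation:

```python
# Illustrative model of the two round-trip elimination rules.

def int2bytes(length: int, value: int) -> bytes:
    # Int2Bytes(LEN, V, LE): the low LEN bytes of V, little-endian
    return (value & ((1 << (length * 8)) - 1)).to_bytes(length, "little")

def bytes2int(b: bytes) -> int:
    # Bytes2Int(B, LE, Unsigned)
    return int.from_bytes(b, "little")

# [int2bytes-bytes2int]: Int2Bytes(LEN, Bytes2Int(B)) == substrBytes(B, 0, LEN)
b = b"\x01\x02\x03\x04"
assert int2bytes(2, bytes2int(b)) == b[0:2]

# [bytes2int-int2bytes]: Bytes2Int(Int2Bytes(LEN, V)) == V & (2^(8*LEN) - 1)
v = 0x1234_5678_9ABC
assert bytes2int(int2bytes(4, v)) == v & (2 ** (4 * 8) - 1)
```

Both rewrites replace an unbounded conversion chain with a single slice or mask, which is what keeps ADD's symbolic terms from growing.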
Solving Problem 2: Operator Incompatibility
The core approach is enabling Bytes2Int-wrapped expressions to participate directly in Int operations, allowing bitwise operations (AND, OR, XOR) in EVM opcodes to be handled directly in symbolic execution:
// Support Bytes2Int(...) &Int operations
rule [bytes2int-and-255-noop]: Bytes2Int(X, LE, Unsigned) &Int 255 => Bytes2Int(X, LE, Unsigned)
requires lengthBytes(X) <=Int 1 [simplification]
// Support Bytes2Int(...) <<Int operations
rule [int-lsh-bytes2int]: Bytes2Int(B, LE, Unsigned) <<Int Y => Bytes2Int(padLeftBytes(B, Y /Int 8, 0), LE, Unsigned)
requires 0 <=Int Y andBool Y modInt 8 ==Int 0 [simplification, preserves-definedness]
// Support Bytes2Int(...) >>Int operations
rule [int-rsh-substrbytes]: Bytes2Int(substrBytes(B, I, J), LE, Unsigned) >>Int Y => Bytes2Int(substrBytes(B, I +Int Y /Int 8, J), LE, Unsigned)
requires 0 <=Int Y andBool Y modInt 8 ==Int 0 andBool I +Int Y /Int 8 <=Int J [simplification, preserves-definedness]
// Support Bytes2Int(...) |Int operations
rule [int-or-bytes-1]: Bytes2Int(b"\x00" +Bytes X, LE, Unsigned) |Int Bytes2Int(Y, _, Unsigned) => Bytes2Int(Y +Bytes X, LE, Unsigned)
requires lengthBytes(Y) ==Int 1 [simplification]
Solution Strategy: Instead of converting Bytes2Int to Int first then operating, we directly "push" operations into Bytes2Int internally, performing equivalent operations at the byte level. This avoids intermediate conversion steps, ensuring efficient symbolic execution of EVM opcode bitwise operations.
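The "push operations inward" idea rests on byte-level identities that can be verified with ordinary little-endian byte strings; the helper below is an illustrative stand-in for Bytes2Int:

```python
def bytes2int(b: bytes) -> int:
    # Bytes2Int(B, LE, Unsigned)
    return int.from_bytes(b, "little")

# [int-lsh-bytes2int]: shifting left by whole bytes == prepending zero
# bytes on the little-endian (least significant) side
b, k = b"\xab\xcd", 3
assert bytes2int(b) << (8 * k) == bytes2int(b"\x00" * k + b)

# [int-rsh-substrbytes]: shifting right by whole bytes drops low bytes
b = b"\x11\x22\x33\x44"
assert bytes2int(b) >> 16 == bytes2int(b[2:])

# [int-or-bytes-1]: OR-ing a one-byte value into a value whose low byte
# is zero replaces that low byte
x, y = b"\x00\x99", b"\x7f"
assert bytes2int(x) | bytes2int(y) == bytes2int(y + x[1:])
```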
Solving Problem 3: Word Syntax Complications
The core approach is eliminating Word wrapper layers to simplify directly to Int operations, thus simplifying expression structure in EVM opcode implementations:
// Eliminate unnecessary sign extensions
rule signExtend(Bytes2Int(B, LE, Unsigned), NumBits) => Bytes2Int(B, LE, Unsigned)
requires NumBits ==Int lengthBytes(B) *Int 8 [simplification(45), preserves-definedness]
// The Word wrapper whose layers are removed by Word-to-Int simplifications
syntax Word ::= "W" "(" Int ")"
Solution Strategy: When sign extension bits match the actual bits of byte data, directly eliminate signExtend wrapping to avoid adding extra syntactic layers. This rule, combined with other Word-to-Int simplifications, effectively reduces expression complexity.
Challenge 2: SparseBytes Memory Model Complexity
2.1 Root Cause Analysis
This challenge stems from the difficulty of adapting the SparseBytes memory model to EVM opcode memory operations, particularly EVM's complex memory access patterns:
SparseBytes Design Advantages:
- Compared to simple Bytes representation: SparseBytes can save significant storage space for uninitialized and sparse memory regions, substantially improving both concrete and symbolic execution efficiency
- Compared to Map representation: Map may have memory overlap issues during insertion, requiring complex overlap checking during storage, while SparseBytes avoids this through structural design
- Compared to Array representation: Arrays are inconvenient for dynamic, sparse memory access, while SparseBytes is optimized for symbolic execution memory access patterns
Early Design Limitations:
In version 16b2d11, SparseBytes only provided simple single-byte interfaces:
// Early design (commit 16b2d11 version)
syntax MaybeByte ::= Int | ".Byte"
syntax MaybeByte ::= readByte(SparseBytes, Int) [function, total]
syntax SparseBytes ::= writeByte(SparseBytes, Int, Int) [function, total]
Core Challenge: Single-byte interfaces cannot effectively support EVM opcode memory operations:
- EVM memory access complexity: EVM opcodes (like MLOAD, MSTORE) need to handle 256-bit data memory access; single-byte interfaces make implementation extremely complex
- Low symbolic execution efficiency: EVM opcode memory operations generate numerous intermediate expressions in symbolic execution, preventing smart contract verification completion
- Frequent type conversions: Frequent conversions between EVM's 256-bit data and RISC-V's 32-bit architecture severely impact symbolic execution efficiency
2.2 Problem Description
SparseBytes Alternating Structure Foundation:
syntax SparseBytes ::= SparseBytesEF | SparseBytesBF
// SparseBytesEF: Sparse byte structure starting with #empty (Empty First)
syntax SparseBytesEF ::=
".SparseBytes" // Completely empty memory
| EmptySBItem SparseBytesBF // #empty followed by SparseBytesBF
// SparseBytesBF: Sparse byte structure starting with #bytes (Bytes First)
syntax SparseBytesBF ::=
BytesSBItem SparseBytesEF // #bytes followed by SparseBytesEF
syntax BytesSBItem ::= #bytes(Bytes) // Initialized data segment
syntax EmptySBItem ::= #empty(Int) // Uninitialized data segment
Specific Issues with Single-byte Interface:
- Complex load/store instruction implementation:
// Implementing a 4-byte load with the single-byte interface
rule <k> LW RD, OFF(RS1) => ... </k>
// Requires 4 readByte calls, then concatenation into an integer
// The implementation is complex and inefficient
- Expression explosion in symbolic execution: each multi-byte operation generates numerous single-byte operation sequences, causing exponential expression growth in symbolic execution
- Fundamental difficulty with symbolic index writes: single-byte interfaces cannot effectively handle writes at symbolic indices; when the write address is symbolic, the write location and affected range cannot be determined, blocking further symbolic execution
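The per-byte overhead is easy to see in a toy model of the alternating SparseBytes structure: a list of segments alternating #empty(N) runs and #bytes(B) runs. The names below are illustrative, not the K definitions:

```python
# Toy model: segments alternate ("empty", N) and ("bytes", B).

def read_byte(sbs, addr):
    """Single-byte interface: walk the segment list for every byte read."""
    for kind, seg in sbs:
        size = seg if kind == "empty" else len(seg)
        if addr < size:
            return 0 if kind == "empty" else seg[addr]
        addr -= size
    return 0  # reads past the end see uninitialized (zero) memory

# #empty(4) followed by #bytes(01 02 03 04)
mem = [("empty", 4), ("bytes", b"\x01\x02\x03\x04")]

# A 4-byte little-endian load needs four full traversals plus reassembly,
# exactly the per-byte overhead the multi-byte interface avoids:
word = sum(read_byte(mem, 4 + i) << (8 * i) for i in range(4))
assert word == 0x04030201
```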
2.3 Solution
For single-byte interface limitations, we designed a completely new multi-byte interface and intelligent buffering mechanism:
Solution Strategy 1: Introducing Multi-byte Interfaces
To support RISC-V's multi-byte load/store operations, we introduced new multi-byte interfaces:
// New multi-byte interface design
syntax Int ::= readBytes(Int, Int, SparseBytes) [function, total]
syntax SparseBytes ::= writeBytes(Int, Int, Int, SparseBytes) [function, total]
Multi-byte Interface Advantages:
- Atomic Operations: Multi-byte load/store can be handled as single atomic operations, avoiding complex decomposition
- Simplified Instruction Semantics: Load/store instruction semantic implementation is greatly simplified
- Symbolic Execution Efficiency: Reduces generation of numerous intermediate expressions
Solution Strategy 2: Efficient Implementation through Three-layer Function Composition
readBytes is implemented through three-layer function composition: Bytes2Int(pickFront(NUM, dropFront(I, SBS)), LE, Unsigned)
To avoid expression explosion in symbolic execution, we designed composite operation pattern recognition optimization:
// Most common pattern: direct extraction from #bytes segment
rule pickFront(I, #bytes(B +Bytes _) _) => substrBytes(B, 0, I)
requires I >Int 0 andBool I <=Int lengthBytes(B) [simplification(45), preserves-definedness]
// Intelligent simplification for cross-segment reads: avoid full recursive expansion
rule pickFront(I, #bytes(B +Bytes BS) EF) => B +Bytes pickFront(I -Int lengthBytes(B), #bytes(BS) EF)
requires I >Int lengthBytes(B) [simplification(45), preserves-definedness]
// dropFront intra-segment optimization: avoid recursion to next layer
rule dropFront(I, #bytes(B +Bytes BS) EF) => dropFront(0, #bytes(substrBytes(B, I, lengthBytes(B)) +Bytes BS) EF)
requires I >Int 0 andBool I <Int lengthBytes(B) [simplification(45), preserves-definedness]
Solution Strategy: Identify the most common access patterns and provide "shortcut" rules for these patterns to avoid full three-layer function expansion. For example, when reads fall exactly within a #bytes segment, convert directly to substrBytes operations.
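The shortcut behavior can be sketched over the same segment-list model: an in-segment read collapses to a plain slice, while a cross-segment read keeps the prefix and recurses only on the remainder. This is an illustrative sketch, not the K functions:

```python
def pick_front(n, sbs):
    """First n bytes of the sparse structure (missing bytes read as 0)."""
    if n <= 0 or not sbs:
        return b""
    kind, seg = sbs[0]
    if kind == "bytes" and n <= len(seg):
        return seg[:n]                      # shortcut: substrBytes(B, 0, I)
    if kind == "bytes":
        return seg + pick_front(n - len(seg), sbs[1:])   # cross-segment
    return b"\x00" * min(n, seg) + pick_front(n - seg, sbs[1:])

mem = [("bytes", b"\xde\xad"), ("empty", 2), ("bytes", b"\xbe\xef")]
assert pick_front(1, mem) == b"\xde"                      # in-segment
assert pick_front(6, mem) == b"\xde\xad\x00\x00\xbe\xef"  # cross-segment
```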
Solution Strategy 3: Write Buffer (#WB) Buffering Mechanism
To handle byte writes at symbolic indices, whose target location cannot be determined in advance, we introduced the Write Buffer (#WB) buffering mechanism:
// Unified entry point for symbolic index writes
syntax SparseBytes ::= #WB(Bool, Int, Int, Int, SparseBytes) [function, total]
rule writeBytes(I, V, NUM, B:SparseBytes) => #WB(false, I, V, NUM, B) [simplification]
// Direct dispatch for concrete addresses
rule writeBytes(I, V, NUM, BF:SparseBytesBF) => writeBytesBF(I, V, NUM, BF) [simplification(45), concrete(I)]
rule writeBytes(I, V, NUM, EF:SparseBytesEF) => writeBytesEF(I, V, NUM, EF) [simplification(45), concrete(I)]
Core Design Philosophy of #WB Mechanism:
- Uninterpreted delayed writes: When index is symbolic, write location cannot be directly determined; #WB provides delayed write mechanism, maintaining symbolic write operations in uninterpreted form
- Avoid term bloat: Through a series of simplification rules, avoid term bloat caused by repeated writes
- Symbolic index handling: Buffering solution specifically designed for symbolic index write scenarios
#WB simplification rules preventing term bloat:
// Write overwrite optimization: when new write completely covers old write, eliminate old write to avoid redundant #WB layer accumulation
rule #WB(false, I0, V0, NUM0, #WB(_, I1, _, NUM1, B:SparseBytes)) => #WB(false, I0, V0, NUM0, B)
requires I0 <=Int I1 andBool I1 +Int NUM1 <=Int I0 +Int NUM0 [simplification(45)]
// Write reordering: order optimization for non-intersecting writes, maintaining compact term structure
rule #WB(false, I0, V0, NUM0, #WB(true, I1, V1, NUM1, B:SparseBytes)) => #WB(true, I1, V1, NUM1, #WB(false, I0, V0, NUM0, B))
requires I0 +Int NUM0 <=Int I1 orBool I1 +Int NUM1 <=Int I0 [simplification]
Core Strategy for Preventing Term Bloat:
- Write elimination: When new write completely covers old write, directly eliminate old write to avoid meaningless #WB layer accumulation
- Write merging: Through reordering rules, optimize combination of mutually independent write operations
- Delayed concretization: Only concretize symbolic writes to actual memory modifications when necessary
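The overwrite-elimination rule can be sketched with a toy write buffer: pending writes as (index, value, num_bytes) entries, newest first, mirroring nested #WB(...) layers. A sketch under those assumptions:

```python
# Toy write buffer mirroring nested #WB layers (newest write outermost).

def push_write(buffer, i, v, num):
    # [write overwrite]: drop any older write fully covered by the new one,
    # as in the rule requiring I0 <= I1 and I1 + NUM1 <= I0 + NUM0
    kept = [(i1, v1, n1) for (i1, v1, n1) in buffer
            if not (i <= i1 and i1 + n1 <= i + num)]
    return [(i, v, num)] + kept

buf = []
buf = push_write(buf, 0, 0xAABB, 2)       # write 2 bytes at offset 0
buf = push_write(buf, 8, 0x11, 1)         # unrelated write at offset 8
buf = push_write(buf, 0, 0xCCDDEEFF, 4)   # covers the write at 0 entirely
assert buf == [(0, 0xCCDDEEFF, 4), (8, 0x11, 1)]  # old layer eliminated
```

Without the elimination step, every store would add a permanent #WB layer and terms would grow without bound.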
Solution Strategy 4: #WB-aware Read System
We designed #WB-aware read rules and >>SparseBytes operator to handle symbolic read-write interactions:
// Dedicated right-shift operator for offset calculation in #WB
syntax SparseBytes ::= SparseBytes ">>SparseBytes" Int [function, total]
rule SBS >>SparseBytes SHIFT => dropFront(SHIFT, SBS) [concrete]
rule #WB(FLAG, I, V, NUM, B:SparseBytes) >>SparseBytes SHIFT => #WB(FLAG, maxInt(0, I -Int SHIFT), V >>Int (SHIFT *Int 8), NUM, B >>SparseBytes SHIFT)
requires SHIFT >=Int 0 [simplification(45), preserves-definedness]
// #WB-aware reads: directly extract data from Write Buffer
rule pickFront(PICK, #WB(_, I, V, NUM, B:SparseBytes)) => Int2Bytes(minInt(PICK, NUM), V, LE) +Bytes pickFront(maxInt(0, PICK -Int NUM), B >>SparseBytes NUM)
requires 0 ==Int I [simplification(40)]
// Read position before write: bypass Write Buffer
rule pickFront(PICK, #WB(_, I, _, _, B:SparseBytes)) => pickFront(PICK, B)
requires PICK <=Int I [simplification(45)]
Solution Strategy: Enable read operations to "see through" #WB structure, directly extract required content from buffered write data, or intelligently bypass unrelated write operations. This avoids fully expanding every readBytes into complex three-layer nested expressions.
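The read-through behavior for a buffered write at offset 0 can be modeled minimally, with the memory below the buffer as a plain bytes value rather than a SparseBytes term (an illustrative simplification):

```python
# Reading "through" a buffered write at offset 0: the first min(PICK, NUM)
# bytes come from the buffered value, the rest from the memory below it.

def pick_front_wb(pick, wb_i, wb_v, wb_num, below: bytes):
    assert wb_i == 0  # the K rule fires when the write starts at offset 0
    from_buffer = wb_v.to_bytes(wb_num, "little")[:min(pick, wb_num)]
    rest = below[wb_num:][:max(0, pick - wb_num)]  # models B >>SparseBytes NUM
    return from_buffer + rest

below = b"\x01\x02\x03\x04\x05\x06"
# 4-byte write of 0xDDCCBBAA at offset 0, then a 6-byte read:
assert pick_front_wb(6, 0, 0xDDCCBBAA, 4, below) == b"\xaa\xbb\xcc\xdd\x05\x06"
```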
Challenge 3: Complex Data Type Decomposition
3.1 Root Cause Analysis
This challenge stems from the enormous abstraction level gap between EVM's 256-bit data types and RISC-V instruction set architecture:
Root Cause Analysis:
- Architecture bit-width limitations: RISC-V is 32-bit architecture; single instructions can only process 32-bit data, while EVM opcodes need to handle 256-bit data types
- Abstraction level differences: The gap between EVM's high-level data operations and RISC-V ISA is enormous, involving not just bit length but also data abstraction, type systems, memory management, and other aspects
- Compilation transformation complexity: From EVM opcode Rust implementations to RISC-V assembly code involves many compilation steps, each potentially introducing additional complexity
Specific Impact on EVM Opcodes:
- Data decomposition: EVM's 256-bit data types must decompose into 8 32-bit RISC-V operations
- Operation decomposition: Single EVM operations (like ADD opcode's 256-bit addition) need expansion into multiple 32-bit operations with carry
- Control flow complications: EVM opcodes' complex logic needs implementation using basic branch instructions
3.2 Problem Description
When executing Rust programs on RISC-V, this abstraction gap causes complex instruction expansion:
// uint256 variable in Rust code
let a: u256 = 0x123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0;
On RISC-V, this decomposes into:
- Storage decomposition: Multiple 4-byte, 2-byte, 1-byte load/store instructions
- Operation decomposition: Complex arithmetic instruction sequences including carry handling
- State management: Numerous intermediate states and conversion operations
- Expression explosion: In symbolic execution, these decomposition operations cause exponential symbolic expression growth
Issues not resolved by Challenges 1 and 2:
- While Challenge 1 solved core type conversion issues, complex integer operations (like shift operations, bit mask operations) still need further optimization
- While Challenge 2 solved core memory access issues, byte sequence reorganization, concatenation, indexing operations remain complex
- Other type operations like boolean-to-word conversion, complex byte equality judgments still need handling
3.3 Solution
This challenge spawned all semantic optimization work beyond Challenges 1 and 2, mainly including:
Solution Strategy 1: Integer Operation Chain Optimization
For complex integer operation decomposition, we designed operation chain simplification rules:
// Shift operation chain optimization
rule [int-lsh-lsh]: (W0 <<Int N0) <<Int N1 => W0 <<Int (N0 +Int N1)
requires 0 <=Int N0 andBool 0 <=Int N1 [simplification, preserves-definedness]
// Bit mask optimization
rule [chop-32bits]: X &Int 4294967295 => X
requires 0 <=Int X andBool X <Int 4294967296 [simplification]
rule [chop-16bits]: X &Int 65535 => X
requires 0 <=Int X andBool X <Int 65536 [simplification]
// Bitwise operation associativity optimization
rule [int-and-assoc]: (X &Int Y) &Int Z => X &Int (Y &Int Z)
[simplification, symbolic(X), concrete(Y,Z)]
// Addition and bit mask combination optimization
rule [int-and-add-assoc-32]: ((X &Int 4294967295) +Int Y) &Int 4294967295 => (X +Int Y) &Int 4294967295
[simplification]
Solution Strategy: By identifying common integer operation patterns, simplify complex operation chains to more direct forms, avoiding lengthy intermediate expressions in symbolic execution.
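These algebraic identities hold for ordinary Python integers, which makes them easy to spot-check outside the K framework:

```python
x = 0x1234_5678_9ABC

# [int-lsh-lsh]: consecutive left shifts collapse into one
assert (x << 3) << 5 == x << 8

# [chop-32bits]: masking a value already in 32-bit range is a no-op
y = 0xDEAD_BEEF            # 0 <= y < 2**32
assert y & 0xFFFFFFFF == y

# [int-and-add-assoc-32]: an inner 32-bit mask before an addition can be
# dropped when the sum is masked again
a, b = 0x1_0000_0003, 7
assert ((a & 0xFFFFFFFF) + b) & 0xFFFFFFFF == (a + b) & 0xFFFFFFFF
```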
Solution Strategy 2: Byte Sequence Reorganization Optimization
For complex byte sequence operations, we designed byte reorganization rules:
// Consecutive byte concatenation optimization
rule [bytes-concat-substr]: substrBytes(A, I0, J0) +Bytes substrBytes(A, I1, J1) => substrBytes(A, I0, J1)
requires I0 <=Int J0 andBool I1 <=Int J1 andBool J0 ==Int I1 andBool J1 <=Int lengthBytes(A)
[simplification, preserves-definedness]
// Byte length calculation optimization
rule [bytes-length-concat]: lengthBytes(A +Bytes B) => lengthBytes(A) +Int lengthBytes(B)
[simplification]
rule [bytes-length-substr]: lengthBytes(substrBytes(B, I, J)) => J -Int I
requires 0 <=Int I andBool I <=Int J andBool J <=Int lengthBytes(B) [simplification, preserves-definedness]
// Nested substr optimization
rule [substr-substr]: substrBytes(substrBytes(B, I, J), I0, J0) => substrBytes(B, I +Int I0, I +Int J0)
requires 0 <=Int I andBool I <=Int J andBool J <=Int lengthBytes(B)
andBool 0 <=Int I0 andBool I0 <=Int J0 andBool J0 <=Int J -Int I [simplification, preserves-definedness]
Solution Strategy: By identifying repetitive patterns in byte sequences, merge multi-step byte operations into single-step operations, reducing intermediate states in symbolic execution.
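Python slicing mirrors substrBytes, so the reorganization rules can be checked directly on concrete bytes:

```python
a = bytes(range(16))

# [bytes-concat-substr]: adjacent slices of the same bytes re-join
assert a[2:5] + a[5:9] == a[2:9]

# [bytes-length-substr]: the length of an in-bounds slice is J - I
assert len(a[3:11]) == 11 - 3

# [substr-substr]: a slice of a slice is one slice with shifted bounds
i, j, i0, j0 = 2, 12, 1, 6
assert a[i:j][i0:j0] == a[i + i0 : i + j0]
```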
Solution Strategy 3: Data Equality Judgment Optimization
For complex data equality judgments, we designed equality simplification rules:
// Byte equality optimization by length
rule [bytes-not-equal-length]: BA1:Bytes ==K BA2:Bytes => false
requires lengthBytes(BA1) =/=Int lengthBytes(BA2) [simplification]
// Byte concatenation equality decomposition
rule [bytes-equal-concat-split-k]: A:Bytes +Bytes B:Bytes ==K C:Bytes +Bytes D:Bytes => A ==K C andBool B ==K D
requires lengthBytes(A) ==Int lengthBytes(C) orBool lengthBytes(B) ==Int lengthBytes(D) [simplification]
// Integer equality relationship with byte concatenation
rule [int-eq-bytes-concat-split]: X ==Int Bytes2Int(B0 +Bytes B1, LE, Unsigned) =>
(X &Int ((1 <<Int (lengthBytes(B0) *Int 8)) -Int 1)) ==Int Bytes2Int(B0, LE, Unsigned) andBool
(X >>Int (lengthBytes(B0) *Int 8)) ==Int Bytes2Int(B1, LE, Unsigned)
[simplification, concrete(B0), preserves-definedness]
Solution Strategy: By early judgment of obvious inequality cases, avoid complex equality calculations, and decompose complex equality judgments into simpler sub-problems.
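The concatenation-split rule for integer equality can be validated on concrete little-endian values, for both equal and unequal cases:

```python
# [int-eq-bytes-concat-split]: X == Bytes2Int(B0 + B1) splits into a
# low-part check on B0 and a high-part check on B1 (little-endian).

def b2i(b: bytes) -> int:
    return int.from_bytes(b, "little")

b0, b1 = b"\x78\x56", b"\x34\x12"
bits = len(b0) * 8

for x in (0x12345678, 0x12345679, 0):
    whole = x == b2i(b0 + b1)
    split = (x & ((1 << bits) - 1)) == b2i(b0) and (x >> bits) == b2i(b1)
    assert whole == split  # the split is equivalent to the whole check
```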
Solution Strategy 4: Bool2Word Conversion Optimization
For boolean-to-word conversion, we designed Bool2Word simplification rules:
// Bool2Word basic properties
rule [bool2word-non-neg]: 0 <=Int Bool2Word(_) => true [simplification]
rule [bool2word-eq-0]: Bool2Word(B) ==Int 0 => notBool B [simplification]
rule [bool2word-eq-1]: Bool2Word(B) ==Int 1 => B [simplification]
// Bool2Word combination with bitwise operations
rule [int-bool2word-or-ineq]: 0 <Int (0 -Int Bool2Word(4294967295 <Int X)) &Int 4294967295 |Int X &Int 4294967295 => true
requires 0 <Int X [simplification(45)]
Solution Strategy: By identifying common usage patterns of Bool2Word, directly convert boolean conditions to simpler expression forms.
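The basic Bool2Word properties follow from a one-line model (Python ints standing in for K's Int sort):

```python
def bool2word(b: bool) -> int:
    # Bool2Word: true |-> 1, false |-> 0
    return 1 if b else 0

for b in (True, False):
    assert 0 <= bool2word(b)                  # [bool2word-non-neg]
    assert (bool2word(b) == 0) == (not b)     # [bool2word-eq-0]
    assert (bool2word(b) == 1) == b           # [bool2word-eq-1]
```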
Solution Strategy 5: Inequality Relationship Optimization
For complex inequality judgments, we designed inequality simplification rules:
// Non-negativity of bitwise operation results
rule [int-and-ineq]: 0 <=Int A &Int B => true requires 0 <=Int A andBool 0 <=Int B [simplification]
rule [int-rhs-ineq]: 0 <=Int A >>Int B => true requires 0 <=Int A andBool 0 <=Int B [simplification]
// Monotonicity of addition
rule [int-add-ineq]: A <=Int A +Int B => true requires 0 <=Int B [simplification]
rule [int-add-ineq-0]: 0 <=Int A +Int B => true requires 0 <=Int A andBool 0 <=Int B [simplification]
// Range judgment for byte integers
rule [bytes2int-upperbound]: Bytes2Int(B, _, Unsigned) <Int X => true
requires 2 ^Int (lengthBytes(B) *Int 8) <=Int X [simplification]
rule [bytes2int-lowerbound]: 0 <=Int Bytes2Int(_, _, Unsigned) => true [simplification]
Solution Strategy: By identifying natural range constraints of data types, determine inequality relationships early to avoid complex numerical calculations.
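The range lemmas reflect a basic fact about unsigned decoding that holds for any byte string, checkable directly:

```python
# [bytes2int-lowerbound] / [bytes2int-upperbound]: an unsigned decode of
# B always satisfies 0 <= value < 2^(8*len(B)), so comparisons against a
# bound at or beyond that range resolve without inspecting B's contents.
for b in (b"", b"\x00", b"\xff", b"\xff\xff\xff\xff", b"\x01\x02\x03"):
    v = int.from_bytes(b, "little")
    assert 0 <= v                       # lower bound
    assert v < 2 ** (8 * len(b))        # upper bound
```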
System Architecture
Optimization rules are organized through layered modular structure, with each module targeting specific EVM opcode requirements:
module LEMMAS
imports SPARSE-BYTES-SIMPLIFICATIONS // Memory operation optimizations supporting EVM opcodes
imports BYTES-SIMPLIFICATIONS // Byte sequence processing optimizations supporting EVM opcodes
imports INT-SIMPLIFICATIONS // Integer operation optimizations supporting EVM opcodes
imports WORD-SIMPLIFICATIONS // Type conversion optimizations supporting EVM opcodes
endmodule
Detailed Module Functionality:
- SPARSE-BYTES-SIMPLIFICATIONS: Specifically optimizes EVM opcodes' (like MLOAD, MSTORE) memory access patterns, supporting symbolic execution of 256-bit data
- BYTES-SIMPLIFICATIONS: Optimizes byte sequence operations in EVM opcodes, ensuring complex data processing can proceed efficiently in symbolic execution
- INT-SIMPLIFICATIONS: Optimizes integer operations in EVM opcodes (like ADD, MUL, AND, OR, etc.), supporting symbolic execution of 256-bit arithmetic and logical operations
- WORD-SIMPLIFICATIONS: Optimizes type conversions in EVM opcode implementations, simplifying expression structures to improve symbolic execution efficiency
Optimization Effects
Through implementing these semantic optimizations:
- EVM Arithmetic Operation Support: Successfully implemented symbolic execution of EVM's ADD, MUL, SUB and other arithmetic opcodes on RISC-V, solving type conversion and expression explosion issues in 256-bit integer operations
- EVM Memory Operation Support: Successfully implemented symbolic execution of EVM's MLOAD, MSTORE and other memory opcodes on RISC-V, effectively handling symbolic index access through #WB mechanism
- EVM Logical Operation Support: Successfully implemented symbolic execution of EVM's AND, OR, XOR and other logical opcodes on RISC-V, optimizing complex bitwise operations and byte sequence operations
Real-world Application Verification:
- REVM Compatibility: All optimizations were verified against REVM's actual implementations, ensuring semantic correctness of EVM opcodes
- Symbolic Execution Efficiency: Dramatically improved smart contract symbolic execution efficiency, making formal verification of complex contracts feasible
- Expression Management: Effectively controlled expression complexity in symbolic execution, avoiding symbolic execution timeouts or memory overflow issues
Technical Significance
- Filling Technical Gaps: First complete implementation of EVM opcode symbolic execution support on RISC-V architecture, laying foundation for blockchain formal verification
- Modular Design: Implemented extensible optimization solutions through layered architecture, facilitating future support for more EVM opcodes and new optimization requirements
- Practical Application Value: Provides crucial technical support for Ethereum smart contract security analysis and formal verification, with significant practical application value
Conclusion
The semantic optimization work presented in this report systematically addresses three core challenges in RISC-V symbolic execution, successfully implementing symbolic execution support for REVM's EVM opcodes (including ADD opcode and others) after compilation to RISC-V. These optimizations not only solve technical challenges but also provide important technical foundation for Ethereum smart contract formal verification and blockchain security analysis.