A complete Python-to-EVM compiler implemented entirely in Solidity smart contracts. The compiler accepts Python source code as a string, processes it through a five-phase pipeline (Lexer → Parser → Semantic Analyzer → Code Generator → VM), and executes the resulting bytecode on a custom stack-based virtual machine.

| Suite | Tests | Status |
|---|---|---|
| Lexer | 54 | All passing |
| Parser | 33 | All passing |
| SemanticAnalyzer | 24 | All passing |
| CodeGenerator | 13 | All passing |
| VM | 12 | All passing |
| Integration | 17 | All passing |
| ForLoop | 13 | All passing |
| Demo | 11 | All passing |
| Total | 177 | All passing |

- `src/PythonCompiler.sol`: Top-level orchestrator
- `src/phases/Lexer.sol`: Tokenizer
- `src/phases/Parser.sol`: Recursive descent parser / AST builder
- `src/phases/SemanticAnalyzer.sol`: Symbol table, scope resolution, type inference
- `src/phases/CodeGenerator.sol`: AST to bytecode emitter
- `src/phases/VM.sol`: Stack machine bytecode interpreter
- `src/types/Token.sol`: Token types and structs
- `src/types/ASTNode.sol`: AST node types and structs
- `src/types/TypeInfo.sol`: Semantic analysis type system
- `src/libraries/StringLib.sol`: String/bytes utilities
- `test/Lexer.t.sol`: 54 tests
- `test/Parser.t.sol`: 33 tests
- `test/SemanticAnalyzer.t.sol`: 24 tests
- `test/CodeGenerator.t.sol`: 13 tests
- `test/VM.t.sol`: 12 tests
- `test/Integration.t.sol`: 17 tests (including bubble sort and print string)
- `test/ForLoop.t.sol`: 13 tests (range, list iteration, nested loops, break, continue)
- `test/Demo.t.sol`: 11 tests
- `CLAUDE.md`: Project guidance
- `GOAL.md`: Project state tracker
- `PROGRESS.md`: Development log
- `ARCHITECTURE.md`: System design
- `ISA.md`: Instruction set definition

- Integer, float, string, boolean, and `None` literals
- Arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`
- Comparison operators: `==`, `!=`, `<`, `>`, `<=`, `>=`
- Boolean operators: `and`, `or`, `not`
- Variable assignment and augmented assignment (`+=`, `-=`, `*=`, `/=`)
- `if`/`elif`/`else` control flow
- `while` loops
- `for` loops with `range()` (1 to 3 arguments) and list iteration
- `break` and `continue` statements
- Function definitions with parameters and `return`
- Recursive function calls
- List literals and index access
- Built-in functions: `print`, `len`
- Class definitions (basic; the body executes inline)
- String output via `print("...")` using the PRINT_STR opcode
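
As an illustration, the short program below stays within the constructs in the list above (function definitions, recursion, `while`, `for` with `range()`, `break`, augmented assignment, a list literal, index access, `print`, `len`). It is an ordinary Python sketch written for this README, not a program from the test suite, and it has not been confirmed to compile with this project.

```python
def factorial(n):
    # Recursive function call with a simple base case.
    if n <= 1:
        return 1
    return n * factorial(n - 1)


def sum_list(values):
    # Index-based while loop over a list, using len() and augmented assignment.
    total = 0
    i = 0
    while i < len(values):
        total += values[i]
        i += 1
    return total


squares = [1, 4, 9, 16]

# for loop with a three-argument range() and an early break.
evens = 0
for k in range(0, 10, 2):
    if k == 8:
        break
    evens += k

print(factorial(5))
print(sum_list(squares))
print(evens)
```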

- `import` statements
- Exception handling (`try`/`except`)
- Dict / set types
- String methods
- Generator / iterator protocol
- Nested classes
- Multiple assignment (`a, b = 1, 2`)

- Separate storage arrays for tokens and AST nodes: Solidity 0.8.20 cannot copy `struct[] memory` containing strings to storage.
- Three-tier aux architecture in Parser: `aux[]` for statements, `exprAux[]` for function args/list elements/params, `bodyStack` for nested blocks merged after parsing.
- Getter-based AST access: SemanticAnalyzer and CodeGenerator read AST nodes from Parser via public getters to avoid struct copy limitations.
- Backpatching for forward jumps: placeholder values are emitted during code generation and patched once the target offsets are known.
- Mapping-based VM frames: `mapping(uint256 => mapping(uint256 => uint256))` for frame-local variable storage, avoiding dynamic memory allocation.
- Custom bytecode format: header with magic bytes "PY", version, and code length, followed by the code section and string table.
- Body nesting stack: Parser uses a separate `_bodyNesting` stack to track the current nesting level, preventing statements from being pushed to the wrong body level after nested blocks return.
- For loop desugaring: `for x in range(n)` is desugared into an index-based `while` loop with temp variables `__fi`, `__fs`, `__fz`.
- Unchecked arithmetic: the VM uses `unchecked` blocks for ADD, SUB, MUL, NEG to support two's complement negative numbers.
- Continue backpatching: continue targets are backpatched after loop body generation, since the increment code offset is unknown during body generation.
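
The backpatching idea mentioned above can be sketched in a few lines of Python. This is a toy model, not the project's CodeGenerator: the opcode values, the `PLACEHOLDER` constant, and the `emit_if` and `run` helpers are all hypothetical and exist only for this example. It emits a forward jump with a dummy operand, then overwrites that operand once the end of the body is known.

```python
# Hypothetical opcode values, chosen only for this sketch.
PUSH, JUMP_IF_FALSE, PRINT, HALT = 0x01, 0x02, 0x03, 0xFF
PLACEHOLDER = 0xFFFF  # stand-in operand until the real target is known


def emit_if(cond_value, body_value):
    """Emit code for `if cond: print(body)` using a backpatched forward jump."""
    code = [PUSH, cond_value]
    code += [JUMP_IF_FALSE, PLACEHOLDER]  # jump target is not known yet
    patch_at = len(code) - 1              # remember the operand slot
    code += [PUSH, body_value, PRINT]     # body of the if
    code[patch_at] = len(code)            # patch: land just past the body
    code.append(HALT)
    return code


def run(code):
    """Tiny stack interpreter, just enough to exercise the patched jump."""
    stack, out, pc = [], [], 0
    while code[pc] != HALT:
        op = code[pc]
        if op == PUSH:
            stack.append(code[pc + 1])
            pc += 2
        elif op == JUMP_IF_FALSE:
            pc = pc + 2 if stack.pop() else code[pc + 1]
        elif op == PRINT:
            out.append(stack.pop())
            pc += 1
        else:
            raise ValueError(f"unknown opcode {op} at {pc}")
    return out
```

Calling `run(emit_if(1, 42))` executes the body, while `run(emit_if(0, 42))` jumps over it. The same record-then-patch pattern would apply to `continue`, where the increment code's offset is only known after the loop body has been emitted.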
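
To make the for-loop desugaring concrete, here is a source-level picture in Python of what an index-based rewrite of `for x in range(n)` could look like. The roles given to `__fi`, `__fs`, and `__fz` below (index, stop bound, step) are assumptions made for this sketch; the source names the temporaries but does not say what each one holds.

```python
def sum_with_for(n):
    total = 0
    for x in range(n):
        total += x
    return total


def sum_desugared(n):
    total = 0
    # Assumed roles for the temporaries (not confirmed by the project):
    __fi = 0  # running index
    __fs = n  # stop bound
    __fz = 1  # step
    while __fi < __fs:
        x = __fi         # bind the loop variable for this iteration
        total += x       # original loop body
        __fi += __fz     # increment; a `continue` would need to jump here
    return total
```

Both functions compute the same result for any non-negative `n`. The comment on the increment line shows why `continue` cannot simply jump back to the loop condition in the desugared form.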
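
The effect of unchecked arithmetic on 256-bit words can be modeled in Python by masking results to 256 bits. This is a sketch of the general two's complement behavior, assuming VM values are `uint256` words as the frame mapping suggests; the helper names are hypothetical and are not the project's functions.

```python
BITS = 256
MASK = (1 << BITS) - 1  # all ones across a 256-bit word


def wrap_add(a, b):
    return (a + b) & MASK  # wraps on overflow instead of reverting


def wrap_sub(a, b):
    return (a - b) & MASK  # wraps below zero instead of reverting


def wrap_mul(a, b):
    return (a * b) & MASK


def wrap_neg(a):
    return (-a) & MASK     # two's complement negation


def to_signed(word):
    # Interpret a 256-bit word as a signed two's complement integer.
    return word - (1 << BITS) if word >> (BITS - 1) else word
```

For example, `wrap_sub(3, 5)` yields a very large unsigned word whose signed interpretation is `-2`, which is the result a Python program expects from `3 - 5`. With checked arithmetic the same subtraction on unsigned words would revert.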