Erasmus X is a modular self-improving AI engineer that researches, writes software, tests itself, stores knowledge, and expands with installable skill packs.
Erasmus X is a modular autonomous agent framework designed to combine:
- LLM reasoning
- Deterministic code generation pipelines
- Project scaffolding packs
- Memory + retrieval
- Benchmarking + self-evaluation
- Tool use + web/search workflows
- Multi-language software generation
- NVIDIA NIM & Local LLM optimization
- V12/V14 Build Pipeline with Capability Contracts
It is built to operate as a practical software-building and research agent rather than only a chat model.
Supports:
- Simple factual Q&A
- Multi-step reasoning
- Context-aware dialogue
- Session continuity
- Fallback answers when models fail
- Dynamic routing between fast and deep reasoning modes
Examples:
- “What is HTTP?”
- “Explain Gödel’s incompleteness theorem simply.”
- “Compare capitalism vs socialism.”
For complex prompts, Erasmus X can escalate into deeper reasoning workflows:
- Multi-query search expansion
- Iterative crawling of sources
- Summarisation across sources
- Contradiction detection
- Long-form synthesis
- Source ingestion into memory/vector store
Examples:
- “Research the future of fusion energy.”
- “Compare all major open-source vector databases.”
- “Investigate UK housing market trends.”
Erasmus X can generate complete software projects across stacks.
- Python CLI tools
- Flask / FastAPI apps
- Node / Express APIs
- Next.js full-stack apps
- React frontends
- SQLite / Prisma apps
- TypeScript projects
- Utility scripts
- Multi-file structured repos
- File generation
- Directory planning
- Dependency manifests
- Tests
- Verification scripts
- Syntax validation
- Repair loops
- Capability Contracts (V12)
- Build Critic (V13)
- Autonomous Synthesis (V14)
- Project packaging
Deterministic domain packs dramatically improve reliability.
- todo
- prisma_todo
- auth
- dashboard
- db
- api-routes
- validation
- booking-system
- express-api
- Known-good file structures
- Real implementations
- Tests
- Verify commands
- Faster builds
- Less hallucination
Supports generating tasks/projects in:
- Python
- JavaScript
- TypeScript
- C
- C++
- Java
- Go
- Rust
- Bash
- SQL
- HTML/CSS
Examples:
- “Write a C bubble sort.”
- “Create a Rust REST API.”
- “Generate a Go worker pool.”
Every serious coding task can include:
- Unit tests
- CLI execution checks
- Syntax validation
- Build verification
- Manifest checks
- Contract fidelity checks
Examples:
- pytest
- npm test
- cargo test
- go test
- tsc --noEmit
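One way to connect the checks above to a generated project is to pick the verify command from whichever manifest file is present. The following is a minimal sketch, not the Erasmus X implementation: the `VERIFY_COMMANDS` mapping, the manifest names, and both function names are assumptions made for illustration.

```python
import subprocess
from pathlib import Path

# Hypothetical mapping from a manifest file to a verify command.
# The commands come from the examples above; the manifest names are assumed.
VERIFY_COMMANDS = {
    "pyproject.toml": ["pytest"],
    "package.json": ["npm", "test"],
    "Cargo.toml": ["cargo", "test"],
    "go.mod": ["go", "test"],
    "tsconfig.json": ["tsc", "--noEmit"],
}


def select_verify_commands(project_dir):
    """Return the verify commands whose manifest exists in project_dir."""
    root = Path(project_dir)
    return [
        cmd
        for manifest, cmd in VERIFY_COMMANDS.items()
        if (root / manifest).is_file()
    ]


def run_verification(project_dir):
    """Run each selected command and record whether it exited with code 0."""
    results = {}
    for cmd in select_verify_commands(project_dir):
        label = " ".join(cmd)
        try:
            proc = subprocess.run(
                cmd, cwd=project_dir, capture_output=True, text=True
            )
            results[label] = proc.returncode == 0
        except FileNotFoundError:
            # The tool is not installed on this machine; count it as a failure.
            results[label] = False
    return results
```

A repair loop could call `run_verification` after each generation pass and feed any failing command's output back to the model.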
Persistent memory systems may include:
- Semantic vector memory
- Extracted facts
- Benchmark history
- Failure learning
- Retrieved web knowledge
- Context grounding when windows fill
This allows the agent to improve over time.
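To make the idea of semantic vector memory concrete, here is a small self-contained sketch. It assumes a toy bag-of-words vector in place of a real embedding model, and the `VectorMemory` class and its methods are hypothetical names, not the Erasmus X API.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a real system would use a learned embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class VectorMemory:
    """Stores texts with their vectors and retrieves the closest matches."""

    def __init__(self):
        self._items = []

    def add(self, text):
        self._items.append((text, embed(text)))

    def search(self, query, k=3):
        q = embed(query)
        ranked = sorted(
            self._items, key=lambda item: cosine(q, item[1]), reverse=True
        )
        return [text for text, _ in ranked[:k]]


memory = VectorMemory()
memory.add("DNS maps hostnames to IP addresses")
memory.add("Fusion energy research uses tokamaks")
top = memory.search("what is dns", k=1)
```

Retrieved entries can then be placed back into the prompt when the context window fills.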
Built-in benchmark suite supports:
- Simple Q&A
- Deep reasoning
- Search tasks
- Coding tasks
- Multi-file projects
- Long-context memory tests
- Python
- JS/TS
- C
- SQL
- Shell
- Latency
- Success rate
- Syntax pass rate
- Code execution pass rate
- Project completion
- Memory continuity
- Search quality
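The metrics above can be aggregated per category from individual run records. This is a hedged sketch only: the `BenchmarkResult` fields and the `summarize` function are assumed for illustration and do not describe the actual benchmark suite's data model.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class BenchmarkResult:
    """One benchmark run (hypothetical record shape)."""
    task: str
    category: str
    latency_s: float
    succeeded: bool


def summarize(results):
    """Group runs by category and compute success rate and mean latency."""
    by_category = {}
    for result in results:
        by_category.setdefault(result.category, []).append(result)
    return {
        category: {
            "runs": len(runs),
            "success_rate": sum(r.succeeded for r in runs) / len(runs),
            "mean_latency_s": mean(r.latency_s for r in runs),
        }
        for category, runs in by_category.items()
    }


summary = summarize([
    BenchmarkResult("bubble sort", "coding", 2.0, True),
    BenchmarkResult("worker pool", "coding", 4.0, False),
    BenchmarkResult("what is http", "qa", 0.5, True),
])
```

Storing these summaries between runs would give the benchmark history mentioned in the memory section.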
The system can route prompts to:
- Low-latency answers.
- Longer reasoning / difficult tasks.
- Multi-file software builds.
- Multi-source crawling + synthesis.
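A simple way to picture this routing is a keyword heuristic over the prompt. The sketch below is an assumption, not the real router: the mode labels `research` and `build`, the keyword sets, and the length threshold are invented for illustration, and a production router would more likely use a classifier or the LLM itself.

```python
import re

# Hypothetical trigger words per mode.
RESEARCH_WORDS = {"research", "investigate"}
BUILD_WORDS = {"build", "create", "generate", "scaffold"}
DEEP_WORDS = {"explain", "design", "compare", "prove"}


def route_mode(prompt):
    """Pick a mode for the prompt: research, build, deep, or fast."""
    tokens = set(re.findall(r"[a-z]+", prompt.lower()))
    if tokens & RESEARCH_WORDS:
        return "research"  # multi-source crawling + synthesis
    if tokens & BUILD_WORDS:
        return "build"     # multi-file software builds
    if tokens & DEEP_WORDS or len(prompt.split()) > 25:
        return "deep"      # longer reasoning / difficult tasks
    return "fast"          # low-latency answers
```

The order of the checks matters: a prompt such as "Research all major browser engines and compare futures." matches both research and deep keywords, and the research route wins because it is tested first.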
Supports replacing older local models (e.g. GPT-2) with stronger local options such as:
- Qwen
- Phi
- Mistral
- Gemma
- Llama-family models
Can run hybrid local + remote pipelines.
The runtime supports the following providers: local, openai, anthropic, deepseek, and kimi.
The main builder model and the internal agent model are configured separately, so you can mix providers:
# Local/OpenAI-compatible main model
LOCAL_MODEL_PROVIDER=local
LOCAL_MODEL_TYPE=local-model
API_BASE_URL=http://localhost:12345/v1
API_KEY=local
# Optional local/internal agent model
LOCAL_AGENT_MODEL_PROVIDER=local
LOCAL_AGENT_MODEL_TYPE=tinyllama
AGENT_API_BASE_URL=http://localhost:12345/v1
AGENT_API_KEY=local
# Local helper LLM execution mode
# false = load Python/Transformers model directly
# true = call a running LM Studio or Ollama server
USE_LOCAL_LLM_SERVER=false
LOCAL_LLM_SERVER_TYPE=lmstudio
LMSTUDIO_API_BASE_URL=http://localhost:1234/v1
OLLAMA_API_BASE_URL=http://localhost:11434
LOCAL_LLM_SERVER_API_BASE_URL=
LOCAL_LLM_SERVER_API_KEY=local
# Remote main model overrides LOCAL_MODEL_TYPE when set
REMOTE_MODEL_PROVIDER=openai
REMOTE_MODEL_TYPE=gpt-4.1-mini
OPENAI_API_KEY=...
# Remote agent model overrides LOCAL_AGENT_MODEL_TYPE when set
REMOTE_AGENT_MODEL_PROVIDER=anthropic
REMOTE_AGENT_MODEL_TYPE=claude-3-5-haiku-latest
ANTHROPIC_API_KEY=...
Base URLs can be overridden with MODEL_API_BASE_URL, AGENT_MODEL_API_BASE_URL, or provider-specific variables such as DEEPSEEK_API_BASE_URL.
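The "remote overrides local when set" rule from the configuration comments can be sketched as a small resolver. This is an illustration under assumptions: `resolve_model` is a hypothetical helper, and the real precedence logic in Erasmus X may consider more variables (base URLs, API keys) than shown here.

```python
import os


def resolve_model(env=None, role="main"):
    """Return the effective provider and model for the main or agent role.

    A REMOTE_*_MODEL_TYPE value, when set, takes precedence over the
    corresponding LOCAL_*_MODEL_TYPE value.
    """
    env = os.environ if env is None else env
    infix = "AGENT_" if role == "agent" else ""
    remote_type = env.get(f"REMOTE_{infix}MODEL_TYPE")
    if remote_type:
        return {
            "provider": env.get(f"REMOTE_{infix}MODEL_PROVIDER"),
            "model": remote_type,
            "remote": True,
        }
    return {
        "provider": env.get(f"LOCAL_{infix}MODEL_PROVIDER", "local"),
        "model": env.get(f"LOCAL_{infix}MODEL_TYPE"),
        "remote": False,
    }


example_env = {
    "LOCAL_MODEL_PROVIDER": "local",
    "LOCAL_MODEL_TYPE": "local-model",
    "LOCAL_AGENT_MODEL_PROVIDER": "local",
    "LOCAL_AGENT_MODEL_TYPE": "tinyllama",
    "REMOTE_AGENT_MODEL_PROVIDER": "anthropic",
    "REMOTE_AGENT_MODEL_TYPE": "claude-3-5-haiku-latest",
}
main_model = resolve_model(example_env, role="main")
agent_model = resolve_model(example_env, role="agent")
```

With this example environment the main model stays local while the agent model resolves to the remote Anthropic entry, which is the kind of provider mix the configuration is meant to allow.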
Erasmus X is optimized for high-throughput reasoning using NVIDIA NIM (NVIDIA Inference Microservices).
To use NVIDIA NIM:
- Get an API key from NVIDIA build.
- Configure your .env or shell:
REMOTE_MODEL_PROVIDER=openai
API_BASE_URL=https://integrate.api.nvidia.com/v1
OPENAI_API_KEY=nvapi-your-key-here
MODEL_NAME=meta/llama-3.1-405b-instruct # Or your preferred NIM model
For the internal agent model (Cognitive Shards):
REMOTE_AGENT_MODEL_PROVIDER=openai
AGENT_API_BASE_URL=https://integrate.api.nvidia.com/v1
AGENT_API_KEY=nvapi-your-key-here
AGENT_MODEL_NAME=meta/llama-3.1-8b-instruct
Erasmus Cell supports a Shard-Based Reasoning Architecture.
Shards are specialised internal reasoning modules that can be loaded, combined, or routed dynamically depending on the task. Instead of relying on one monolithic thinking style, the agent can distribute cognition across multiple focused personas.
A shard is a lightweight expert mode optimized for a specific domain or behavior.
Examples:
- Code Architect Shard: Designs software structure, modular systems, APIs, file trees, refactors.
- Debugger Shard: Finds bugs, traces errors, repairs failing builds, improves test pass rates.
- Research Analyst Shard: Performs deep research, compares sources, extracts facts, synthesizes findings.
- Critic Shard: Reviews generated outputs for quality, completeness, stack fidelity, missing features.
- Product Manager Shard: Converts vague prompts into structured requirements, milestones, deliverables.
- Security Shard: Reviews auth, secrets, validation, vulnerabilities, safe defaults.
- Performance Shard: Optimizes speed, memory use, query efficiency, build latency.
When a task arrives, the router can activate one or more shards.
Examples:
Build a booking platform with login and admin dashboard
Activated shards:
- Product Manager
- Code Architect
- Auth/Security
- Critic
Research the future of fusion energy
Activated shards:
- Research Analyst
- Critic
- Memory Synthesizer
My TypeScript app crashes on deploy
Activated shards:
- Debugger
- Performance
- Critic
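The activation examples above can be approximated with a keyword lookup. This sketch is an assumption for illustration only: the trigger words and the rule that the Critic always runs are invented, and the Memory Synthesizer from the research example is left out because it is not among the shards described earlier.

```python
import re

# Hypothetical trigger words per shard (insertion order sets output order).
SHARD_TRIGGERS = {
    "Product Manager": {"build", "platform"},
    "Code Architect": {"build", "design", "refactor"},
    "Security": {"login", "auth", "admin", "password"},
    "Research Analyst": {"research", "investigate", "compare"},
    "Debugger": {"crash", "crashes", "bug", "error", "fails", "failing"},
    "Performance": {"slow", "latency", "deploy", "memory"},
}


def activate_shards(prompt):
    """Return the shards whose trigger words appear in the prompt."""
    tokens = set(re.findall(r"[a-z]+", prompt.lower()))
    active = [
        name for name, triggers in SHARD_TRIGGERS.items() if tokens & triggers
    ]
    # Assumption: the Critic reviews every task, so it is always appended.
    active.append("Critic")
    return active
```

For example, `activate_shards("My TypeScript app crashes on deploy")` selects the Debugger and Performance shards plus the Critic under these assumed triggers.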
For harder tasks, shards can work in sequence or parallel.
Example software workflow:
- Product Manager defines requirements
- Architect designs structure
- Builder generates code
- Debugger repairs issues
- Critic scores output
- Security audits release
This creates a more human-like team workflow.
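The sequential workflow above can be modelled as a list of stages that each read and extend a shared state. The following is a rough sketch with placeholder logic: `make_stage`, `PIPELINE`, the state keys, and every lambda body are hypothetical stand-ins for real shard calls.

```python
def make_stage(name, key, produce):
    """Wrap a function so it writes its result under `key` and logs its name."""
    def stage(state):
        new_state = dict(state)
        new_state[key] = produce(state)
        new_state["trace"] = state.get("trace", []) + [name]
        return new_state
    return stage


# Placeholder stages; a real shard would call a model or a tool here.
PIPELINE = [
    make_stage("Product Manager", "requirements",
               lambda s: f"requirements for: {s['prompt']}"),
    make_stage("Architect", "structure", lambda s: ["app/", "tests/"]),
    make_stage("Builder", "files", lambda s: {p: "" for p in s["structure"]}),
    make_stage("Debugger", "repaired", lambda s: True),
    make_stage("Critic", "score", lambda s: 1.0 if s["repaired"] else 0.0),
    make_stage("Security", "audit_passed", lambda s: s["score"] >= 0.5),
]


def run_pipeline(prompt):
    """Run every stage in order, passing the accumulated state along."""
    state = {"prompt": prompt}
    for stage in PIPELINE:
        state = stage(state)
    return state


result = run_pipeline("Build a booking platform")
```

Because each stage returns a new dictionary rather than mutating the old one, intermediate states can be kept for debugging, and independent stages could be run in parallel and merged.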
- Better reasoning quality
- Domain specialization
- Less hallucination
- More reliable project generation
- Faster routing to best thinking mode
- Easier future expansion
- More human team-like cognition
Shards turn Erasmus X from a single assistant into a modular digital organization capable of solving complex real-world tasks through coordinated specialist intelligence.
For best performance:
- Qwen2.5-7B-Instruct
- Phi-4-mini
- Mistral 7B Instruct
- Gemma 2 9B
- Quantized GGUF variants with llama.cpp
- Research all major browser engines and compare futures.
- Build a FastAPI CRM with tests.
- Build a Prisma booking system with login/admin dashboard.
- What is DNS?
- Design a scalable ride-sharing backend.
For a high-fidelity interactive experience with real-time streaming, status panels, and token tracking:
python erasmus_cli.py
Alternatively, launch the main entry point:
python main.py
Startup menu options:
- Chat with agent
- Run seed ingestion
- Run HTTP API server
- Run test suite
- Run benchmark suite
- Show config
Direct commands:
python main.py --chat
python main.py --seed --seed-limit 10
python main.py --api --host 127.0.0.1 --port 8008
python main.py --tests
python main.py --benchmarks
Run benchmarks:
python test/automated_benchmarks.py
Erasmus X is not just a chatbot.
It is an evolving autonomous builder, researcher, debugger and reasoning system.
Current state:
V14: Autonomous Neurosymbolic Architect. Capable of high-fidelity project convergence through strict contract enforcement and multi-stage critic loops.
