Merged
13 changes: 0 additions & 13 deletions .env.example
@@ -8,15 +8,6 @@ POSTGRES_TABLE_NAME="documents"

# General Configuration
PORT="3001"
SIMILARITY_MEASURE="cosine"

# LLM Provider Configuration
DEFAULT_CHAT_PROVIDER="gemini"
DEFAULT_CHAT_MODEL="Gemini Flash 2.5"
DEFAULT_FAST_CHAT_PROVIDER="gemini"
DEFAULT_FAST_CHAT_MODEL="Gemini Flash 2.5"
DEFAULT_EMBEDDING_PROVIDER="openai"
DEFAULT_EMBEDDING_MODEL="Text embedding 3 large"

# LLM Provider API Keys
OPENAI_API_KEY=""
@@ -26,10 +17,6 @@ DEEPSEEK_API_KEY=""
GROQ_API_KEY=""
XAI_API_KEY=""

# Version Configuration
STARKNET_FOUNDRY_VERSION="0.47.0"
SCARB_VERSION="2.11.4"

# LangSmith Configuration (Optional)
LANGSMITH_TRACING="false"
LANGSMITH_ENDPOINT="https://api.smith.langchain.com"
12 changes: 0 additions & 12 deletions .github/workflows/generate-embeddings.yml
@@ -64,24 +64,12 @@ jobs:

# General Configuration
PORT="3001"
SIMILARITY_MEASURE="cosine"

# Provider Configuration
DEFAULT_CHAT_PROVIDER="gemini"
DEFAULT_CHAT_MODEL="Gemini Flash 2.5"
DEFAULT_FAST_CHAT_PROVIDER="gemini"
DEFAULT_FAST_CHAT_MODEL="Gemini Flash 2.5"
DEFAULT_EMBEDDING_PROVIDER="openai"
DEFAULT_EMBEDDING_MODEL="Text embedding 3 large"

# API Keys
OPENAI_API_KEY="${{ secrets.OPENAI }}"
ANTHROPIC_API_KEY="${{ secrets.ANTHROPIC }}"
GEMINI_API_KEY="${{ secrets.GEMINI }}"

# Version Configuration
STARKNET_FOUNDRY_VERSION="0.47.0"
SCARB_VERSION="2.11.4"
EOL

- name: Generate embeddings
62 changes: 45 additions & 17 deletions README.md
@@ -66,9 +66,10 @@ Using Docker is highly recommended for a streamlined setup.

Edit the `.env` file with your credentials:

- Database credentials (defaults provided for local development)
- LLM API keys (at least one required: OpenAI, Anthropic, or Gemini)
- Optional: LangSmith credentials for monitoring
- Database credentials (defaults provided above for local development)
- LLM API keys (at least one is required; `GEMINI_API_KEY` is recommended)
- Optional: `XAI_API_KEY` for Grok search functionality
- Optional: `LANGSMITH_*` variables for monitoring

3. **Run the Ingester (First Time Setup)**

@@ -120,12 +121,11 @@ Cairo Coder uses a modern architecture based on Retrieval-Augmented Generation (

### Project Structure

The project is organized as a monorepo with multiple packages:
The project is organized as a monorepo with multiple components:

- **python/**: The core RAG agent and API server implementation using DSPy and FastAPI.
- **packages/ingester/**: (TypeScript) Data ingestion tools for Cairo documentation sources.
- **packages/typescript-config/**: Shared TypeScript configuration.
- **(Legacy)** `packages/agents`: The original Langchain-based TypeScript implementation.
- **ingesters/**: (TypeScript/Bun) Data ingestion tools for Cairo documentation sources.
- **docker-compose.yml**: Orchestrates postgres, backend, and ingester services.

### RAG Pipeline (Python/DSPy)

@@ -141,21 +141,48 @@ The RAG pipeline is implemented in the `python/src/cairo_coder/core/` directory

### Python Service

For local development of the Python service, navigate to `python/` and run the following commands`
For local development of the Python service:

1. **Setup Environment**:

```bash
cd python

# Install uv package manager (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies
uv sync
```

2. **Start Database** (from root directory):

```bash
# Install uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh
# From root directory
docker compose up postgres
```
2. **Run Server**:
> Note: make sure the database is running, and the ingesters have been run.

3. **Run Ingester** (first time setup, from root directory):

```bash
uv run cairo-coder --dev
# From root directory
cd ingesters
bun install
bun run generate-embeddings:yes
cd ..
```
3. **Run Tests & Linting**:

4. **Run Server** (from python directory):

```bash
cd python
uv run cairo-coder --dev
```

5. **Run Tests** (from python directory):
```bash
uv run pytest
cd python
uv run pytest
```

### Starklings Evaluation
@@ -165,11 +192,12 @@ A script is included to evaluate the agent's performance on the Starklings exerc
> Note: we recommend pre-warming the compilation cache by running `cd fixtures/runner_crate && scarb build` before running the evaluation.

```bash
# Run a single evaluation round
# From python directory
cd python
uv run starklings_evaluate
```

Results are saved in the `starklings_results/` directory.
Results are saved in the `python/starklings_results/` directory.

## Contribution

23 changes: 18 additions & 5 deletions python/README.md
@@ -18,7 +18,9 @@ Cairo Coder is an AI-powered code generation service specifically designed for t
## Installation

```bash
# Install uv package manager
cd python

# Install uv package manager (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies
@@ -31,8 +33,8 @@ Cairo Coder uses environment variables for all sensitive configuration like API

```bash
# From the root directory
cp .env.example .env
# Edit .env with your credentials
# Create .env file with required environment variables
# See main README.md for required variables
```
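
Before starting the service, the presence of the required variables can be checked from Python. The sketch below is hypothetical and not part of the repository: the helper name `missing_env_vars` is invented for illustration, `POSTGRES_PASSWORD` is treated as required only because `load_config` gives it an empty default, and the provider key names are the ones listed in `.env.example`.

```python
import os
from collections.abc import Mapping

# Assumption: POSTGRES_PASSWORD has no usable default, so it is treated as required.
REQUIRED_VARS = ("POSTGRES_PASSWORD",)
# Assumption: any one of these provider keys satisfies "at least one LLM API key".
LLM_KEY_VARS = ("GEMINI_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Describe each requirement that is unset or empty (hypothetical helper)."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    # An empty string counts as unset, matching the blank entries in .env.example.
    if not any(env.get(name) for name in LLM_KEY_VARS):
        missing.append("one of " + ", ".join(LLM_KEY_VARS))
    return missing
```

Calling `missing_env_vars()` with no argument inspects the current process environment; an empty list means both requirements are met.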

## Running the Service
@@ -56,15 +58,26 @@ docker compose up postgres

```bash
# From root directory
pnpm generate-embeddings
cd ingesters
bun install
bun run generate-embeddings:yes
cd ..
```

4. Start the FastAPI server:
4. Start the FastAPI server (from python directory):

```bash
cd python
uv run cairo-coder
```

Or run it in development mode (auto-reload):

```bash
cd python
uv run cairo-coder --dev
```

### Dockerized

All configuration is handled automatically via Docker Compose. From the root directory:
Expand Down
1 change: 0 additions & 1 deletion python/src/cairo_coder/config/manager.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,6 @@ def load_config() -> Config:
user=os.getenv("POSTGRES_USER", "cairocoder"),
password=os.getenv("POSTGRES_PASSWORD", ""),
table_name=os.getenv("POSTGRES_TABLE_NAME", "documents"),
similarity_measure=os.getenv("SIMILARITY_MEASURE", "cosine"),
)

# Validate essential configuration
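
For readers following the hunk above, the environment-with-defaults pattern can be sketched in isolation. This is a minimal illustration, not the repository's `load_config`: the names `VectorStoreSettings` and `load_vector_store_settings` are hypothetical, and the rule that an empty password fails validation is an assumption about what "essential configuration" means here. The defaults are the ones visible in the diff.

```python
import os
from dataclasses import dataclass


@dataclass
class VectorStoreSettings:
    """Hypothetical stand-in for the vector store section of the config."""

    user: str
    password: str
    table_name: str


def load_vector_store_settings() -> VectorStoreSettings:
    """Read settings from the environment, falling back to the defaults in the diff."""
    settings = VectorStoreSettings(
        user=os.getenv("POSTGRES_USER", "cairocoder"),
        password=os.getenv("POSTGRES_PASSWORD", ""),
        table_name=os.getenv("POSTGRES_TABLE_NAME", "documents"),
    )
    # Assumption: an empty password counts as missing essential configuration.
    if not settings.password:
        raise ValueError("POSTGRES_PASSWORD must be set")
    return settings
```

With `SIMILARITY_MEASURE` removed, nothing in this path reads that variable any more, so setting it in `.env` has no effect.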
2 changes: 0 additions & 2 deletions python/src/cairo_coder/core/config.py
@@ -13,8 +13,6 @@ class VectorStoreConfig:
user: str
password: str
table_name: str
embedding_dimension: int = 2048 # text-embedding-3-large dimension
similarity_measure: str = "cosine" # cosine, dot_product, euclidean

@property
def dsn(self) -> str:
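
The hunk ends at the `dsn` property without showing its body. A minimal sketch of how such a property might assemble a PostgreSQL connection URL follows. It is not the repository's implementation: the class name `VectorStoreConfigSketch` and the `host`, `port`, and `database` fields are assumptions (only `user`, `password`, and `table_name` are visible in the diff), and percent-encoding the credentials is a choice made for this example.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass
class VectorStoreConfigSketch:
    # host, port, and database are assumed fields; the diff only shows
    # user, password, and table_name.
    host: str
    port: int
    database: str
    user: str
    password: str
    table_name: str

    @property
    def dsn(self) -> str:
        """Build a PostgreSQL connection URL, percent-encoding the credentials."""
        user = quote(self.user, safe="")
        password = quote(self.password, safe="")
        return f"postgresql://{user}:{password}@{self.host}:{self.port}/{self.database}"
```

Encoding the credentials keeps characters such as `@` or `/` in a password from being parsed as part of the host or path.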
4 changes: 2 additions & 2 deletions python/src/cairo_coder/dspy/pgvector_rm.py
@@ -44,8 +44,8 @@ class PgVectorRM(dspy.Retrieve):
llm = dspy.LM("gemini/gemini-flash-latest")
dspy.configure(lm=llm)

# DATABASE_URL should be in the format postgresql://user:password@host/database
db_url = os.getenv("DATABASE_URL")
# DATABASE_URL should be in the format:
db_url = "postgresql://user:password@host/database"

# embedding_func will default to dspy.settings.embedder
retriever_model = PgVectorRM(db_url, "paragraphs", fields=["text", "document_id"], k=20)
1 change: 0 additions & 1 deletion python/src/cairo_coder/server/app.py
@@ -755,7 +755,6 @@ def get_vector_store_config() -> VectorStoreConfig:
user=config.vector_store.user,
password=config.vector_store.password,
table_name=config.vector_store.table_name,
similarity_measure=config.vector_store.similarity_measure,
)


2 changes: 0 additions & 2 deletions python/tests/conftest.py
@@ -367,8 +367,6 @@ def clean_config_env_vars(monkeypatch):
"POSTGRES_USER",
"POSTGRES_PASSWORD",
"POSTGRES_TABLE_NAME",
"POSTGRES_ROOT_DB",
"SIMILARITY_MEASURE",
"HOST",
"PORT",
"DEBUG",
2 changes: 0 additions & 2 deletions python/tests/integration/test_config_integration.py
@@ -17,7 +17,6 @@ def test_load_configuration_from_env(self, monkeypatch: pytest.MonkeyPatch) -> N
monkeypatch.setenv("POSTGRES_USER", "test_user")
monkeypatch.setenv("POSTGRES_PASSWORD", "test_password")
monkeypatch.setenv("POSTGRES_TABLE_NAME", "test_documents")
monkeypatch.setenv("SIMILARITY_MEASURE", "cosine")
monkeypatch.setenv("HOST", "localhost")
monkeypatch.setenv("PORT", "8001")
monkeypatch.setenv("DEBUG", "true")
@@ -31,7 +30,6 @@ def test_load_configuration_from_env(self, monkeypatch: pytest.MonkeyPatch) -> N
assert config.vector_store.user == "test_user"
assert config.vector_store.password == "test_password"
assert config.vector_store.table_name == "test_documents"
assert config.vector_store.similarity_measure == "cosine"
assert config.host == "localhost"
assert config.port == 8001
assert config.debug is True
3 changes: 0 additions & 3 deletions python/tests/unit/test_config.py
@@ -30,7 +30,6 @@ def test_load_config_with_defaults(self, monkeypatch: pytest.MonkeyPatch) -> Non
assert config.vector_store.user == "cairocoder"
assert config.vector_store.password == "test-password"
assert config.vector_store.table_name == "documents"
assert config.vector_store.similarity_measure == "cosine"
assert config.host == "0.0.0.0"
assert config.port == 3001
assert config.debug is False
@@ -44,7 +43,6 @@ def test_load_config_from_environment(self, monkeypatch: pytest.MonkeyPatch) ->
monkeypatch.setenv("POSTGRES_USER", "env-user")
monkeypatch.setenv("POSTGRES_PASSWORD", "env-pass")
monkeypatch.setenv("POSTGRES_TABLE_NAME", "env-table")
monkeypatch.setenv("SIMILARITY_MEASURE", "dot_product")
monkeypatch.setenv("HOST", "127.0.0.1")
monkeypatch.setenv("PORT", "8080")
monkeypatch.setenv("DEBUG", "true")
@@ -58,7 +56,6 @@ def test_load_config_from_environment(self, monkeypatch: pytest.MonkeyPatch) ->
assert config.vector_store.user == "env-user"
assert config.vector_store.password == "env-pass"
assert config.vector_store.table_name == "env-table"
assert config.vector_store.similarity_measure == "dot_product"
assert config.host == "127.0.0.1"
assert config.port == 8080
assert config.debug is True