krafton-ai
Files changed:
.env.example: 37 additions, 0 deletions
.gitattributes: 2 additions, 0 deletions
.github/workflows/pages.yml: 33 additions, 0 deletions
.gitignore: 34 additions, 0 deletions
LICENSE: 21 additions, 0 deletions
README.md: 242 additions, 0 deletions
diff --git a/.env.example b/.env.example
@@ -0,0 +1,37 @@
+# RLoop — Environment Variables
+# Copy to .env and fill in your values:  cp .env.example .env
+
+# Optional: Anthropic API key for LLM reward generation (Claude provider)
+# ANTHROPIC_API_KEY=sk-ant-api03-YOUR_KEY_HERE
+
+# Required: Google Gemini API key — default LLM agent + VLM video judgment
+GEMINI_API_KEY=YOUR_GEMINI_KEY_HERE
+
+# Optional: OpenAI API key (for GPT/o-series models)
+# OPENAI_API_KEY=sk-YOUR_KEY_HERE
+
+# vLLM server settings (for native video judge)
+# VLLM_HOST=localhost
+# VLLM_MODEL=Qwen/Qwen3.5-27B
+# VLLM_PORT=8100
+
+# MuJoCo rendering backend
+# Linux (headless GPU): MUJOCO_GL=egl
+# macOS: do NOT set (uses native CGL)
+# MUJOCO_GL=egl
+
+# Optional: Frontend API base URL (default: http://localhost:8000)
+# Set this in frontend/.env.local if running frontend on a different host
+# NEXT_PUBLIC_API_URL=http://localhost:8000
+
+# Optional: Claude Code analyst skill — dashboard URL
+# P2P_LOCAL_URL=http://localhost:8000
+
+# CI release script — benchmark dashboard URLs
+# DASHBOARD_URL=http://localhost:3000/benchmark
+# SCHEDULER_URL=http://localhost:3000/scheduler
+
+# Optional: Video Judge Bench — human labeling server for eval video scoring
+# When set, a scoring UI appears on evaluation videos in the frontend.
+# LABELING_SERVER_URL=http://localhost:8765
+# LABELING_ANNOTATOR=your_username
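Before launching a session, it can help to sanity-check the variables above. The following is a minimal, stdlib-only sketch. It assumes plain `KEY=value` lines as in `.env.example`. `parse_env_text`, `check_config`, and the "contains `YOUR_`" placeholder heuristic are hypothetical helpers, not part of the repository, and the project itself may load `.env` differently. The check also flags optional keys that are present but still hold a template value, since those are easy to miss.

```python
"""Hypothetical pre-flight check for a .env file shaped like .env.example."""
from pathlib import Path

REQUIRED = ("GEMINI_API_KEY",)  # marked "Required" in .env.example
OPTIONAL = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")  # only needed for that LLM provider


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and '#' comments (no quote handling)."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def looks_like_placeholder(value: str) -> bool:
    # Assumption: template values contain "YOUR_", e.g. YOUR_GEMINI_KEY_HERE.
    return not value or "YOUR_" in value


def check_config(env: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    for key in REQUIRED:
        if looks_like_placeholder(env.get(key, "")):
            problems.append(f"{key} is required but missing or still a placeholder")
    for key in OPTIONAL:
        if key in env and looks_like_placeholder(env[key]):
            problems.append(f"{key} is set to a placeholder; fill it in or comment it out")
    return problems


if __name__ == "__main__":
    path = Path(".env")
    if path.exists():
        print("\n".join(check_config(parse_env_text(path.read_text()))) or ".env looks OK")
    else:
        print("No .env found; create one with: cp .env.example .env")
```

Run it from the repository root after `cp .env.example .env`. It only reports problems and does not export anything into `os.environ`.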
diff --git a/.gitattributes b/.gitattributes
@@ -0,0 +1,2 @@
+blog/**/*.mp4 filter=lfs diff=lfs merge=lfs -text
+blog/*.mp4 filter=lfs diff=lfs merge=lfs -text
diff --git a/.github/workflows/pages.yml b/.github/workflows/pages.yml
@@ -0,0 +1,33 @@
+name: Deploy Project Page
+
+on:
+  push:
+    branches: [main]
+    paths: [blog/**]
+  workflow_dispatch:
+
+permissions:
+  contents: read
+  pages: write
+  id-token: write
+
+concurrency:
+  group: pages
+  cancel-in-progress: true
+
+jobs:
+  deploy:
+    runs-on: ubuntu-latest
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          lfs: true
+      - uses: actions/configure-pages@v5
+      - uses: actions/upload-pages-artifact@v3
+        with:
+          path: blog
+      - id: deployment
+        uses: actions/deploy-pages@v4
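The workflow uploads the `blog/` directory as-is, so the page can be previewed locally before pushing. Below is a minimal stdlib sketch that assumes the site is plain static files under `blog/`. `make_preview_server` is a hypothetical helper, and port 8001 is an arbitrary choice that avoids the dashboard's port 8000. The workflow checks out with `lfs: true`, so a local clone probably needs `git lfs pull` first. Otherwise the `.mp4` files may be LFS pointer files rather than real videos.

```python
"""Hypothetical local preview of the static site that pages.yml deploys from blog/."""
import functools
import http.server


def make_preview_server(directory: str = "blog", port: int = 8001) -> http.server.ThreadingHTTPServer:
    """Create, but do not start, a static file server rooted at `directory`."""
    # SimpleHTTPRequestHandler accepts a `directory` keyword (Python 3.7+).
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    # Bind to loopback only; port=0 asks the OS for any free port.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

From the repository root, `make_preview_server().serve_forever()` serves the page at http://127.0.0.1:8001/ until interrupted. The shell one-liner `python -m http.server 8001 --directory blog` does the same.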
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,34 @@
+.env
+.claude/*
+!.claude/rules/
+.claude/rules/*
+!.claude/rules/contracts.md
+!.claude/rules/quality.md
+!.claude/hookify.*.md
+!.claude/hookify.*.local.md
+!.claude/settings.json
+!.claude/skills/
+.claude/skills/*
+!.claude/skills/rloop-analyst/
+__pycache__/
+*.pyc
+*.egg-info/
+dist/
+build/
+.venv/
+/runs/
+*.mp4
+!blog/**/*.mp4
+!blog/*.mp4
+.ruff_cache/
+node_modules/
+.next/
+frontend/.next/
+frontend/node_modules/
+logs/
+*.log
+docs/cpu_stress_test/
+humanoid_walk/
+.rsync-exclude
+exclude/
+IsaacLab/
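The `*.mp4` / `!blog/**/*.mp4` pair relies on two gitignore rules. First, the last matching pattern wins. Second, `**/` matches zero or more directories, so `blog/**/*.mp4` already covers `blog/demo.mp4`. The sketch below models only those two rules for file paths, and `is_ignored` is a hypothetical, simplified helper. It does not model directory-only patterns, escaping, or the rule that a file cannot be re-included once a parent directory is excluded. That last rule is why `.claude/rules/` is re-included before its individual files. `git check-ignore -v <path>` gives the authoritative answer.

```python
"""Simplified model of gitignore matching: '*', '?', '**/', negation, last match wins."""
import re


def _to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a git-style glob into a regex (subset: '*', '?', '**/', literals)."""
    anchored = "/" in pattern.rstrip("/")  # a non-trailing slash anchors to the repo root
    pattern = pattern.lstrip("/")
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:[^/]+/)*")  # zero or more whole directories
            i += 3
        elif pattern[i] == "*":
            out.append("[^/]*")  # a single '*' never crosses a '/'
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    # Unanchored patterns (no slash) may match the basename at any depth.
    prefix = "" if anchored else "(?:.*/)?"
    return re.compile(f"^{prefix}{''.join(out)}$")


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Return True if `path` is ignored; later patterns override earlier ones."""
    ignored = False
    for raw in patterns:
        negate = raw.startswith("!")
        pat = raw[1:] if negate else raw
        if _to_regex(pat).match(path):
            ignored = not negate
    return ignored


RULES = ["*.mp4", "!blog/**/*.mp4", "!blog/*.mp4"]  # excerpt of the .gitignore above
```

With only the first two rules, `blog/demo.mp4` is already re-included. The separate `blog/*.mp4` lines in `.gitignore` and `.gitattributes` are therefore redundant but harmless.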
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Prompt2Policy Contributors
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
@@ -0,0 +1,242 @@
+<p align="center">
+  <!-- TODO: logo image -->
+  <h1 align="center">Prompt2Policy</h1>
+</p>
+
+<p align="center">
+  <strong>Describe a behavior in a prompt. Get a trained policy.</strong><br/>
+  LLM-powered reward engineering that writes, trains, judges, and iterates — until your RL agent does what you asked.
+</p>
+
+<p align="center">
+  <a href="https://krafton-ai.github.io/Prompt2Policy"><img src="https://img.shields.io/badge/%F0%9F%8C%90%20Project-Page-4285F4?style=for-the-badge" alt="Project Page"/></a>
+</p>
+
+<p align="center">
+  <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.11+-blue.svg?style=flat-square" alt="Python 3.11+"/></a>
+  <a href="https://github.com/astral-sh/ruff"><img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json&style=flat-square" alt="Ruff"/></a>
+  <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue.svg?style=flat-square" alt="License"/></a>
+</p>
+
+<div align="center">
+  <img src="docs/demo_zoom_reveal.gif" alt="Prompt2Policy showcase: diverse learned behaviors from natural language intents" width="960"/>
+</div>
+
+## What It Does
+
+| | Feature | Description |
+|---|---|---|
+| 🎯 | **Intent to Reward** | Describe behavior in natural language — LLM writes the reward function |
+| 🏋️ | **Parallel Training** | PPO with multiple seeds and configs via Stable-Baselines3 |
+| 👁️ | **Dual Judgment** | Code-based judge + VLM video judge evaluate trained policies |
+| 🔄 | **Auto-Revision** | LLM diagnoses failures and rewrites reward + tunes hyperparameters |
+| 🤖 | **Multi-LLM** | Claude, Gemini, GPT — any model with tool use support |
+| 🦾 | **MuJoCo + IsaacLab** | 10 MuJoCo envs built-in, 90 IsaacLab envs optional |
+| 📊 | **Dashboard** | Real-time web UI for sessions, training curves, rollout videos |
+
+---
+
+## Quick Start
+
+### Install
+
+```bash
+git clone https://github.com/krafton-ai/Prompt2Policy.git
+cd Prompt2Policy
+uv sync --all-extras
+```
+
+<details>
+<summary>Don't have uv?</summary>
+
+```bash
+curl -LsSf https://astral.sh/uv/install.sh | sh
+```
+
+See [uv installation guide](https://docs.astral.sh/uv/getting-started/installation/) for other platforms.
+
+</details>
+
+### Configure
+
+```bash
+cp .env.example .env
+# Edit .env — set GEMINI_API_KEY (required), plus ANTHROPIC_API_KEY or OPENAI_API_KEY (optional)
+```
+
+### Run (Dashboard)
+
+```bash
+uv run uvicorn p2p.api.app:app --host 0.0.0.0 --port 8000 --reload --reload-dir src  # Terminal 1
+cd frontend && npm install && npm run dev                                                    # Terminal 2
+```
+
+Open **http://localhost:3000**, enter an intent like *"do a backflip"*, and hit run. See the [dashboard tutorial](https://krafton-ai.github.io/Prompt2Policy/) for a video walkthrough. For CLI usage, see [CLI Reference](#cli-reference).
+
+### Verify
+
+```bash
+uv run pytest tests/ -v
+```
+
+---
+
+## Pipeline
+
+<!-- TODO: pre-rendered SVG pipeline diagram -->
+
+```
+User Intent → Intent Elicitor → Reward Author + Judge Author
+                                        ↓
+                                   Code Review
+                                        ↓
+                              PPO Training (seeds × configs)
+                                        ↓
+                              Code Judge ∥ VLM Judge
+                                        ↓
+                                   Synthesizer
+                                    ↓         ↓
+                              [pass]  →  Done
+                              [fail]  →  Revise Agent → next iteration
+```
+
+---
+
+## Supported Environments
+
+<details>
+<summary><strong>MuJoCo (built-in)</strong>: 10 environments from the Gymnasium MuJoCo v5 suite (locomotion and control)</summary>
+
+| Environment | Action Dim | Example Intents |
+|-------------|------------|-----------------|
+| **HalfCheetah-v5** | 6 | *"run forward fast"*, *"do a backflip"* |
+| **Ant-v5** | 8 | *"walk in a circle"*, *"stand on rear legs"* |
+| **Hopper-v5** | 3 | *"hop forward"*, *"jump as high as possible"* |
+| **Walker2d-v5** | 6 | *"walk forward naturally"*, *"high knee sprinting"* |
+| **Humanoid-v5** | 17 | *"walk with natural gait"*, *"perform a deep squat"* |
+| **HumanoidStandup-v5** | 17 | *"stand up from the ground"* |
+| **Swimmer-v5** | 2 | *"swim forward"*, *"swim in a zigzag"* |
+| **Reacher-v5** | 2 | *"reach the target"* |
+| **InvertedPendulum-v5** | 1 | *"keep the pole balanced"* |
+| **InvertedDoublePendulum-v5** | 1 | *"balance both poles"* |
+
+</details>
+
+<details>
+<summary><strong>IsaacLab (optional)</strong> — 90 environments: locomotion, manipulation, dexterous</summary>
+
+[NVIDIA IsaacLab](https://github.com/isaac-sim/IsaacLab) environments are supported when Isaac Sim is installed.
+
+| Category | Count | Examples |
+|----------|-------|---------|
+| Manipulation (Lift/Stack) | 21 | Franka lift/stack, Galbot, UR10 |
+| Locomotion (Flat) | 12 | ANYmal B/C/D, Unitree Go1/Go2/A1, Cassie, Spot, H1, G1, Digit |
+| Locomotion (Rough) | 11 | Same robots, rough terrain |
+| Manipulation (Reach) | 8 | Franka, UR10, OpenArm |
+| Humanoid | 8 | Humanoid locomotion variants |
+| Assembly | 8 | AutoMate, Factory, Forge |
+| Dexterous | 7 | Shadow hand, Allegro |
+| Classic Control | 5 | Cartpole, Ant |
+| Pick & Place | 4 | Franka, UR10 |
+| Other | 6 | Quadcopter, Navigation |
+
+**Requirements**: NVIDIA GPU with CUDA 12+, driver 525+, Ubuntu 22.04+.
+
+</details>
+
+---
+
+## Configuration
+
+| Variable | Required | Default | Description |
+|----------|----------|---------|-------------|
+| `GEMINI_API_KEY` | **Yes** | — | Default LLM agent + VLM video judgment |
+| `ANTHROPIC_API_KEY` | No | — | Required when using Claude models as LLM |
+| `OPENAI_API_KEY` | No | — | Required when using GPT models as LLM |
+| `MUJOCO_GL` | No | *(unset)* | Set to `egl` on headless Linux |
+
+<details>
+<summary>Advanced settings</summary>
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `VLLM_HOST` | `localhost` | vLLM server host (local VLM inference) |
+| `VLLM_PORT` | `8100` | vLLM server port |
+| `VLLM_MODEL` | `Qwen/Qwen3.5-27B` | vLLM model name |
+
+</details>
+
+---
+
+## CLI Reference
+
+### E2E Loop
+
+```bash
+uv run python -m p2p.session.run_session \
+  --session-id my_session \
+  --prompt "do a backflip" \
+  --loop-config '{"train": {"env_id": "HalfCheetah-v5", "total_timesteps": 1000000}, "max_iterations": 5, "pass_threshold": 0.7, "hp_tuning": true}'
+```
+
+### Benchmark
+
+```bash
+uv run python -m p2p.benchmark.benchmark_cli \
+  --csv benchmark/test_cases_exotic_ant_halfcheetah_humanoid.csv \
+  --max-iterations 5 \
+  --total-timesteps 1000000 \
+  --max-parallel 4 \
+  --num-configs 3
+```
+
+See the [User Guide](docs/GUIDE.md) for the full flag reference and API examples.
+
+---
+
+## Hardware
+
+| | MuJoCo (default) | IsaacLab |
+|---|-------------------|----------|
+| **CPU** | 8+ cores (16+ recommended for parallel seeds) | 8+ cores |
+| **RAM** | 16 GB (32+ recommended) | 32+ GB |
+| **GPU** | Optional — CUDA GPU for EGL rendering | Required — 24+ GB VRAM (varies by task) |
+| **Disk** | 20 GB | 100+ GB |
+
+MuJoCo training is CPU-bound (PPO with MLP policy). A GPU accelerates headless rendering (EGL) and local VLM inference but is not required. IsaacLab environments are GPU-vectorized and need at least 24 GB VRAM.
+
+---
+
+## Development
+
+```bash
+uv run ruff check src/ tests/          # lint
+uv run ruff format --check src/ tests/  # format
+uv run pytest tests/ -v                 # test
+cd frontend && npm run lint             # frontend lint
+```
+
+## Tech Stack
+
+- **Training** — Gymnasium, MuJoCo, Stable-Baselines3, IsaacLab (optional)
+- **LLM/VLM** — Anthropic Claude, Google Gemini, OpenAI GPT, vLLM
+- **Backend** — FastAPI, uvicorn
+- **Frontend** — Next.js, React, Tailwind CSS, Recharts, KaTeX
+- **Dev** — uv, ruff, pytest
+
+## Documentation
+
+- [User Guide](docs/GUIDE.md) — detailed setup, usage, intent tips, LLM models, IsaacLab installation
+- [Architecture](docs/ARCHITECTURE.md) — code-level module map and execution flow
+- [Related Work](docs/RELATED_WORK.md) — comparison with Eureka, Text2Reward, AutoResearch, and others
+- [v1.0 Release Notes](docs/v1-release-notes.html) — known limitations and roadmap
+
+---
+
+## License
+
+This project is licensed under the [MIT License](LICENSE).
+
+<p align="center">
+  <em>Whether you're an RL researcher tired of hand-tuning rewards or a newcomer who just wants to describe a behavior and get a trained policy — this is for you.</em>
+</p>
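`--loop-config` takes JSON on the command line, and building it with `json.dumps` avoids shell-quoting mistakes. The following is a hypothetical sketch: `build_session_argv` is not part of the repository. It reproduces the README's E2E Loop command using only the flags and config keys shown there. Their exact semantics are whatever the User Guide documents, not something verified here.

```python
"""Hypothetical helper that builds the README's run_session command as an argv list."""
import json
import shlex


def build_session_argv(
    session_id: str,
    prompt: str,
    env_id: str = "HalfCheetah-v5",
    total_timesteps: int = 1_000_000,
    max_iterations: int = 5,
    pass_threshold: float = 0.7,
    hp_tuning: bool = True,
) -> list[str]:
    """Return argv for `uv run python -m p2p.session.run_session`; keys mirror the README."""
    loop_config = {
        "train": {"env_id": env_id, "total_timesteps": total_timesteps},
        "max_iterations": max_iterations,
        "pass_threshold": pass_threshold,
        "hp_tuning": hp_tuning,
    }
    return [
        "uv", "run", "python", "-m", "p2p.session.run_session",
        "--session-id", session_id,
        "--prompt", prompt,
        # json.dumps emits valid JSON (e.g. true, not True), so no hand-quoting is needed.
        "--loop-config", json.dumps(loop_config),
    ]


if __name__ == "__main__":
    # Print a correctly quoted command that can be pasted into a shell.
    print(shlex.join(build_session_argv("my_session", "do a backflip")))
```

You can also pass the list straight to `subprocess.run(argv, check=True)` without `shell=True`. That launches the session with no shell quoting at all.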
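The README's Pipeline diagram describes an iterate-until-pass loop, and its control flow can be sketched as below. Every stage is a stub passed in as a function. All the names here are hypothetical illustrations: `Verdict`, `run_loop`, `train_and_judge`, and `revise`. The sketch also assumes that a score at or above `pass_threshold` counts as a pass. None of this is the project's actual implementation, which docs/ARCHITECTURE.md describes.

```python
"""Illustrative control flow of the iterate-until-pass loop; all stages are stubs."""
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    score: float  # assumed: synthesized score from the code and VLM judges, in [0, 1]
    feedback: str  # assumed: diagnosis handed to the revise step


def run_loop(
    reward_code: str,
    train_and_judge: Callable[[str], Verdict],  # training + both judges + synthesizer
    revise: Callable[[str, Verdict], str],  # revise agent: returns new reward code
    max_iterations: int = 5,
    pass_threshold: float = 0.7,
) -> tuple[str, Verdict, int]:
    """Iterate until a verdict passes or the iteration budget runs out."""
    verdict = Verdict(score=0.0, feedback="not run")
    for iteration in range(1, max_iterations + 1):
        verdict = train_and_judge(reward_code)
        if verdict.score >= pass_threshold:  # [pass] -> Done
            return reward_code, verdict, iteration
        if iteration < max_iterations:  # [fail] -> Revise Agent -> next iteration
            reward_code = revise(reward_code, verdict)
    return reward_code, verdict, max_iterations
```

With stub scores of 0.3, 0.5, and 0.8 and the default threshold of 0.7, the loop revises the reward twice and stops at iteration 3.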