Commit 1961a4e

suyoung.lee committed
Prompt2Policy — open-source release
Clean export from GitLab dev (8272d78). Internal CI, agent configs, and dev tooling excluded per issue #417.
0 parents

433 files changed

Lines changed: 103455 additions & 0 deletions

.env.example

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
1+
# RLoop — Environment Variables
2+
# Copy to .env and fill in your values: cp .env.example .env
3+
4+
# Optional: Anthropic API key for LLM reward generation (Claude provider)
5+
ANTHROPIC_API_KEY=sk-ant-api03-YOUR_KEY_HERE
6+
7+
# Required: Google Gemini API key — default LLM agent + VLM video judgment
8+
GEMINI_API_KEY=YOUR_GEMINI_KEY_HERE
9+
10+
# Optional: OpenAI API key (for GPT/o-series models)
11+
# OPENAI_API_KEY=sk-YOUR_KEY_HERE
12+
13+
# vLLM server settings (for native video judge)
14+
# VLLM_HOST=localhost
15+
# VLLM_MODEL=Qwen/Qwen3.5-27B
16+
# VLLM_PORT=8100
17+
18+
# MuJoCo rendering backend
19+
# Linux (headless GPU): MUJOCO_GL=egl
20+
# macOS: do NOT set (uses native CGL)
21+
# MUJOCO_GL=egl
22+
23+
# Optional: Frontend API base URL (default: http://localhost:8000)
24+
# Set this in frontend/.env.local if running frontend on a different host
25+
# NEXT_PUBLIC_API_URL=http://localhost:8000
26+
27+
# Optional: Claude Code analyst skill — dashboard URL
28+
# P2P_LOCAL_URL=http://localhost:8000
29+
30+
# CI release script — benchmark dashboard URLs
31+
# DASHBOARD_URL=http://localhost:3000/benchmark
32+
# SCHEDULER_URL=http://localhost:3000/scheduler
33+
34+
# Optional: Video Judge Bench — human labeling server for eval video scoring
35+
# When set, a scoring UI appears on evaluation videos in the frontend.
36+
# LABELING_SERVER_URL=http://localhost:8765
37+
# LABELING_ANNOTATOR=your_username
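
The comments in `.env.example` mark `GEMINI_API_KEY` as required and the Anthropic and OpenAI keys as optional. A minimal sketch of a pre-flight check built on that reading follows. `check_env`, `is_placeholder`, and the placeholder heuristic are hypothetical helpers for illustration and are not part of the committed code.

```python
import os

# Key names come from .env.example; the required/optional split follows its comments.
REQUIRED_KEYS = ("GEMINI_API_KEY",)
OPTIONAL_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def is_placeholder(value: str) -> bool:
    """True for template values such as 'sk-ant-api03-YOUR_KEY_HERE' (assumed heuristic)."""
    return "YOUR_" in value and value.endswith("_HERE")


def check_env(env: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the keys look usable."""
    problems = []
    for name in REQUIRED_KEYS:
        value = env.get(name, "")
        if not value or is_placeholder(value):
            problems.append(f"{name} is required but unset or still a placeholder")
    for name in OPTIONAL_KEYS:
        value = env.get(name, "")
        # Optional keys may be absent, but a copied placeholder is almost certainly a mistake.
        if value and is_placeholder(value):
            problems.append(f"{name} still holds the .env.example placeholder")
    return problems


if __name__ == "__main__":
    for problem in check_env(dict(os.environ)):
        print(problem)
```

Passing the environment as a plain dict keeps the check easy to test without touching the real process environment.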

.gitattributes

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
1+
blog/**/*.mp4 filter=lfs diff=lfs merge=lfs -text
2+
blog/*.mp4 filter=lfs diff=lfs merge=lfs -text

.github/workflows/pages.yml

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
1+
name: Deploy Project Page
2+
3+
on:
4+
push:
5+
branches: [main]
6+
paths: [blog/**]
7+
workflow_dispatch:
8+
9+
permissions:
10+
contents: read
11+
pages: write
12+
id-token: write
13+
14+
concurrency:
15+
group: pages
16+
cancel-in-progress: true
17+
18+
jobs:
19+
deploy:
20+
runs-on: ubuntu-latest
21+
environment:
22+
name: github-pages
23+
url: ${{ steps.deployment.outputs.page_url }}
24+
steps:
25+
- uses: actions/checkout@v4
26+
with:
27+
lfs: true
28+
- uses: actions/configure-pages@v5
29+
- uses: actions/upload-pages-artifact@v3
30+
with:
31+
path: blog
32+
- id: deployment
33+
uses: actions/deploy-pages@v4

.gitignore

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
1+
.env
2+
.claude/*
3+
!.claude/rules/
4+
.claude/rules/*
5+
!.claude/rules/contracts.md
6+
!.claude/rules/quality.md
7+
!.claude/hookify.*.md
8+
!.claude/hookify.*.local.md
9+
!.claude/settings.json
10+
!.claude/skills/
11+
.claude/skills/*
12+
!.claude/skills/rloop-analyst/
13+
__pycache__/
14+
*.pyc
15+
*.egg-info/
16+
dist/
17+
build/
18+
.venv/
19+
/runs/
20+
*.mp4
21+
!blog/**/*.mp4
22+
!blog/*.mp4
23+
.ruff_cache/
24+
node_modules/
25+
.next/
26+
frontend/.next/
27+
frontend/node_modules/
28+
logs/
29+
*.log
30+
docs/cpu_stress_test/
31+
humanoid_walk/
32+
.rsync-exclude
33+
exclude/
34+
IsaacLab/

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
MIT License
2+
3+
Copyright (c) 2026 Prompt2Policy Contributors
4+
5+
Permission is hereby granted, free of charge, to any person obtaining a copy
6+
of this software and associated documentation files (the "Software"), to deal
7+
in the Software without restriction, including without limitation the rights
8+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9+
copies of the Software, and to permit persons to whom the Software is
10+
furnished to do so, subject to the following conditions:
11+
12+
The above copyright notice and this permission notice shall be included in all
13+
copies or substantial portions of the Software.
14+
15+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21+
SOFTWARE.

README.md

Lines changed: 242 additions & 0 deletions
@@ -0,0 +1,242 @@
1+
<p align="center">
2+
<!-- TODO: logo image -->
3+
<h1 align="center">Prompt2Policy</h1>
4+
</p>
5+
6+
<p align="center">
7+
<strong>Describe a behavior in a prompt. Get a trained policy.</strong><br/>
8+
LLM-powered reward engineering that writes, trains, judges, and iterates — until your RL agent does what you asked.
9+
</p>
10+
11+
<p align="center">
12+
<a href="https://krafton-ai.github.io/Prompt2Policy"><img src="https://img.shields.io/badge/%F0%9F%8C%90%20Project-Page-4285F4?style=for-the-badge" alt="Project Page"/></a>
13+
</p>
14+
15+
<p align="center">
16+
<a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.11+-blue.svg?style=flat-square" alt="Python 3.11+"/></a>
17+
<a href="https://github.com/astral-sh/ruff"><img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json&style=flat-square" alt="Ruff"/></a>
18+
<a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue.svg?style=flat-square" alt="License"/></a>
19+
</p>
20+
21+
<div align="center">
22+
<img src="docs/demo_zoom_reveal.gif" alt="Prompt2Policy showcase: diverse learned behaviors from natural language intents" width="960"/>
23+
</div>
24+
25+
## What It Does
26+
27+
| | Feature | Description |
28+
|---|---|---|
29+
| 🎯 | **Intent to Reward** | Describe behavior in natural language — LLM writes the reward function |
30+
| 🏋️ | **Parallel Training** | PPO with multiple seeds and configs via Stable-Baselines3 |
31+
| 👁️ | **Dual Judgment** | Code-based judge + VLM video judge evaluate trained policies |
32+
| 🔄 | **Auto-Revision** | LLM diagnoses failures and rewrites reward + tunes hyperparameters |
33+
| 🤖 | **Multi-LLM** | Claude, Gemini, GPT — any model with tool use support |
34+
| 🦾 | **MuJoCo + IsaacLab** | 10 MuJoCo envs built-in, 90 IsaacLab envs optional |
35+
| 📊 | **Dashboard** | Real-time web UI for sessions, training curves, rollout videos |
36+
37+
---
38+
39+
## Quick Start
40+
41+
### Install
42+
43+
```bash
44+
git clone https://github.com/krafton-ai/Prompt2Policy.git
45+
cd Prompt2Policy
46+
uv sync --all-extras
47+
```
48+
49+
<details>
50+
<summary>Don't have uv?</summary>
51+
52+
```bash
53+
curl -LsSf https://astral.sh/uv/install.sh | sh
54+
```
55+
56+
See the [uv installation guide](https://docs.astral.sh/uv/getting-started/installation/) for other platforms.
57+
58+
</details>
59+
60+
### Configure
61+
62+
```bash
63+
cp .env.example .env
64+
# Edit .env — set GEMINI_API_KEY (required), plus ANTHROPIC_API_KEY or OPENAI_API_KEY (optional)
65+
```
66+
67+
### Run (Dashboard)
68+
69+
```bash
70+
uv run uvicorn p2p.api.app:app --host 0.0.0.0 --port 8000 --reload --reload-dir src # Terminal 1
71+
cd frontend && npm install && npm run dev # Terminal 2
72+
```
73+
74+
Open **http://localhost:3000**, enter an intent like *"do a backflip"*, and hit run. See the [dashboard tutorial](https://krafton-ai.github.io/Prompt2Policy/) for a video walkthrough. For CLI usage, see [CLI Reference](#cli-reference).
75+
76+
### Verify
77+
78+
```bash
79+
uv run pytest tests/ -v
80+
```
81+
82+
---
83+
84+
## Pipeline
85+
86+
<!-- TODO: pre-rendered SVG pipeline diagram -->
87+
88+
```
89+
User Intent → Intent Elicitor → Reward Author + Judge Author
90+
↓
91+
Code Review
92+
↓
93+
PPO Training (seeds × configs)
94+
↓
95+
Code Judge ∥ VLM Judge
96+
↓
97+
Synthesizer
98+
↓ ↓
99+
[pass] → Done
100+
[fail] → Revise Agent → next iteration
101+
```
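
The control flow in the diagram can be read as a bounded revise-and-retry loop. The sketch below is one possible reading, not the project's implementation: `run_loop`, `Verdict`, `LoopResult`, and the four callables are hypothetical names, the code-review step is omitted, the two judges and the synthesizer are collapsed into a single `judge` callable, and "pass" is assumed to mean a score at or above `pass_threshold`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    score: float   # synthesized judgment, assumed to lie in [0, 1]
    feedback: str  # diagnosis handed to the revise step


@dataclass
class LoopResult:
    passed: bool
    iterations: int
    reward_code: str
    verdict: Verdict


def run_loop(
    intent: str,
    author_reward: Callable[[str], str],
    train: Callable[[str], object],
    judge: Callable[[str, object], Verdict],
    revise: Callable[[str, str], str],
    max_iterations: int = 5,
    pass_threshold: float = 0.7,
) -> LoopResult:
    """Author a reward, then train, judge, and revise until pass or budget exhaustion."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    reward_code = author_reward(intent)
    for iteration in range(1, max_iterations + 1):
        policy = train(reward_code)
        verdict = judge(intent, policy)
        if verdict.score >= pass_threshold:
            return LoopResult(True, iteration, reward_code, verdict)
        # Revise only when another iteration will actually use the new reward.
        if iteration < max_iterations:
            reward_code = revise(reward_code, verdict.feedback)
    return LoopResult(False, max_iterations, reward_code, verdict)
```

Injecting the stages as callables lets the loop be exercised with stubs before any LLM or training backend is wired in.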
102+
103+
---
104+
105+
## Supported Environments
106+
107+
<details>
108+
<summary><strong>MuJoCo (built-in)</strong>: 10 Gymnasium MuJoCo v5 environments (locomotion, reaching, and balancing)</summary>
109+
110+
| Environment | Actuated DOF | Example Intents |
111+
|-------------|-----|-----------------|
112+
| **HalfCheetah-v5** | 6 | *"run forward fast"*, *"do a backflip"* |
113+
| **Ant-v5** | 8 | *"walk in a circle"*, *"stand on rear legs"* |
114+
| **Hopper-v5** | 3 | *"hop forward"*, *"jump as high as possible"* |
115+
| **Walker2d-v5** | 6 | *"walk forward naturally"*, *"high knee sprinting"* |
116+
| **Humanoid-v5** | 17 | *"walk with natural gait"*, *"perform a deep squat"* |
117+
| **HumanoidStandup-v5** | 17 | *"stand up from the ground"* |
118+
| **Swimmer-v5** | 2 | *"swim forward"*, *"swim in a zigzag"* |
119+
| **Reacher-v5** | 2 | *"reach the target"* |
120+
| **InvertedPendulum-v5** | 1 | *"keep the pole balanced"* |
121+
| **InvertedDoublePendulum-v5** | 1 | *"balance both poles"* |
122+
123+
</details>
124+
125+
<details>
126+
<summary><strong>IsaacLab (optional)</strong> — 90 environments: locomotion, manipulation, dexterous</summary>
127+
128+
[NVIDIA IsaacLab](https://github.com/isaac-sim/IsaacLab) environments are supported when Isaac Sim is installed.
129+
130+
| Category | Count | Examples |
131+
|----------|-------|---------|
132+
| Manipulation (Lift/Stack) | 21 | Franka lift/stack, Galbot, UR10 |
133+
| Locomotion (Flat) | 12 | ANYmal B/C/D, Unitree Go1/Go2/A1, Cassie, Spot, H1, G1, Digit |
134+
| Locomotion (Rough) | 11 | Same robots, rough terrain |
135+
| Manipulation (Reach) | 8 | Franka, UR10, OpenArm |
136+
| Humanoid | 8 | Humanoid locomotion variants |
137+
| Assembly | 8 | AutoMate, Factory, Forge |
138+
| Dexterous | 7 | Shadow hand, Allegro |
139+
| Classic Control | 5 | Cartpole, Ant |
140+
| Pick & Place | 4 | Franka, UR10 |
141+
| Other | 6 | Quadcopter, Navigation |
142+
143+
**Requirements**: NVIDIA GPU with CUDA 12+, driver 525+, Ubuntu 22.04+.
144+
145+
</details>
146+
147+
---
148+
149+
## Configuration
150+
151+
| Variable | Required | Default | Description |
152+
|----------|----------|---------|-------------|
153+
| `GEMINI_API_KEY` | **Yes** || Default LLM agent + VLM video judgment |
154+
| `ANTHROPIC_API_KEY` | No || Required when using Claude models as LLM |
155+
| `OPENAI_API_KEY` | No || Required when using GPT models as LLM |
156+
| `MUJOCO_GL` | No | *(unset)* | Set to `egl` on headless Linux |
157+
158+
<details>
159+
<summary>Advanced settings</summary>
160+
161+
| Variable | Default | Description |
162+
|----------|---------|-------------|
163+
| `VLLM_HOST` | `localhost` | vLLM server host (local VLM inference) |
164+
| `VLLM_PORT` | `8100` | vLLM server port |
165+
| `VLLM_MODEL` | `Qwen/Qwen3.5-27B` | vLLM model name |
166+
167+
</details>
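
As an illustration of how the three vLLM variables and their defaults might be combined, here is a minimal sketch. `vllm_settings` is a hypothetical helper, and the `/v1` suffix assumes the server is vLLM's OpenAI-compatible endpoint; the project may build its URL differently.

```python
import os
from typing import Mapping, Optional


def vllm_settings(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Resolve vLLM connection settings, falling back to the documented defaults."""
    env = os.environ if env is None else env
    host = env.get("VLLM_HOST", "localhost")
    port = int(env.get("VLLM_PORT", "8100"))  # int() rejects non-numeric ports early
    model = env.get("VLLM_MODEL", "Qwen/Qwen3.5-27B")
    # Assumption: the OpenAI-compatible API is served under /v1.
    return {"base_url": f"http://{host}:{port}/v1", "model": model}
```

With no variables set, this resolves to `http://localhost:8100/v1` and the `Qwen/Qwen3.5-27B` model name from the table.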
168+
169+
---
170+
171+
## CLI Reference
172+
173+
### E2E Loop
174+
175+
```bash
176+
uv run python -m p2p.session.run_session \
177+
--session-id my_session \
178+
--prompt "do a backflip" \
179+
--loop-config '{"train": {"env_id": "HalfCheetah-v5", "total_timesteps": 1000000}, "max_iterations": 5, "pass_threshold": 0.7, "hp_tuning": true}'
180+
```
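
Hand-writing the JSON passed to `--loop-config` inside shell quotes is easy to get wrong. The sketch below builds the same command from Python, using only the flags and config keys shown above; `build_session_command` is a hypothetical helper, not part of the package.

```python
import json
import shlex


def build_session_command(
    session_id: str,
    prompt: str,
    env_id: str,
    total_timesteps: int,
    max_iterations: int = 5,
    pass_threshold: float = 0.7,
    hp_tuning: bool = True,
) -> list[str]:
    """Return an argv list mirroring the E2E example, with the config serialized as JSON."""
    loop_config = {
        "train": {"env_id": env_id, "total_timesteps": total_timesteps},
        "max_iterations": max_iterations,
        "pass_threshold": pass_threshold,
        "hp_tuning": hp_tuning,
    }
    return [
        "uv", "run", "python", "-m", "p2p.session.run_session",
        "--session-id", session_id,
        "--prompt", prompt,
        "--loop-config", json.dumps(loop_config),
    ]


argv = build_session_command("my_session", "do a backflip", "HalfCheetah-v5", 1_000_000)
# shlex.join quotes the prompt and the JSON so the line can be pasted into a shell.
print(shlex.join(argv))
```

The list form can also be passed directly to `subprocess.run(argv)`, which sidesteps shell quoting entirely.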
181+
182+
### Benchmark
183+
184+
```bash
185+
uv run python -m p2p.benchmark.benchmark_cli \
186+
--csv benchmark/test_cases_exotic_ant_halfcheetah_humanoid.csv \
187+
--max-iterations 5 \
188+
--total-timesteps 1000000 \
189+
--max-parallel 4 \
190+
--num-configs 3
191+
```
192+
193+
See the [User Guide](docs/GUIDE.md) for full flag reference and API examples.
194+
195+
---
196+
197+
## Hardware
198+
199+
| | MuJoCo (default) | IsaacLab |
200+
|---|-------------------|----------|
201+
| **CPU** | 8+ cores (16+ recommended for parallel seeds) | 8+ cores |
202+
| **RAM** | 16 GB (32+ recommended) | 32+ GB |
203+
| **GPU** | Optional — CUDA GPU for EGL rendering | Required — 24+ GB VRAM (varies by task) |
204+
| **Disk** | 20 GB | 100+ GB |
205+
206+
MuJoCo training is CPU-bound (PPO with MLP policy). A GPU accelerates headless rendering (EGL) and local VLM inference but is not required. IsaacLab environments are GPU-vectorized and need at least 24 GB VRAM.
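
The guidance on `MUJOCO_GL` (set `egl` on headless Linux, leave it unset on macOS) can be sketched as a small helper. `configure_mujoco_gl` is hypothetical, and treating a missing `DISPLAY` and `WAYLAND_DISPLAY` as "headless" is an assumed heuristic rather than anything the project specifies.

```python
import os
import sys
from typing import MutableMapping, Optional


def configure_mujoco_gl(
    env: MutableMapping[str, str], platform: str = sys.platform
) -> Optional[str]:
    """Set MUJOCO_GL=egl on headless Linux; leave every other case untouched."""
    if env.get("MUJOCO_GL"):
        return env["MUJOCO_GL"]  # respect an explicit user choice
    # Assumed heuristic: no X11 or Wayland display means the machine is headless.
    headless = not (env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))
    if platform.startswith("linux") and headless:
        env["MUJOCO_GL"] = "egl"
    return env.get("MUJOCO_GL") or None


if __name__ == "__main__":
    print(configure_mujoco_gl(os.environ))
```

A helper like this would need to run before the rendering modules are imported, since the backend is typically read from the environment at import time.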
207+
208+
---
209+
210+
## Development
211+
212+
```bash
213+
uv run ruff check src/ tests/ # lint
214+
uv run ruff format --check src/ tests/ # format
215+
uv run pytest tests/ -v # test
216+
cd frontend && npm run lint # frontend lint
217+
```
218+
219+
## Tech Stack
220+
221+
- **Training** — Gymnasium, MuJoCo, Stable-Baselines3, IsaacLab (optional)
222+
- **LLM/VLM** — Anthropic Claude, Google Gemini, OpenAI GPT, vLLM
223+
- **Backend** — FastAPI, uvicorn
224+
- **Frontend** — Next.js, React, Tailwind CSS, Recharts, KaTeX
225+
- **Dev** — uv, ruff, pytest
226+
227+
## Documentation
228+
229+
- [User Guide](docs/GUIDE.md) — detailed setup, usage, intent tips, LLM models, IsaacLab installation
230+
- [Architecture](docs/ARCHITECTURE.md) — code-level module map and execution flow
231+
- [Related Work](docs/RELATED_WORK.md) — comparison with Eureka, Text2Reward, AutoResearch, and others
232+
- [v1.0 Release Notes](docs/v1-release-notes.html) — known limitations and roadmap
233+
234+
---
235+
236+
## License
237+
238+
This project is licensed under the [MIT License](LICENSE).
239+
240+
<p align="center">
241+
<em>Whether you're an RL researcher tired of hand-tuning rewards or a newcomer who just wants to describe a behavior and get a trained policy — this is for you.</em>
242+
</p>
