FLAPPY is a research platform for continually learning web agents on MiniWoB++ tasks via BrowserGym. It compares three baselines:
- Pure reinforcement learning agent (PPO + Random Network Distillation)
- Coach-guided random agent (LLM coach for masks/subgoals, random policy)
- Hybrid coach/learner agent (LLM coach + PPO/RND driver policy)
The platform targets Python 3.11 and PyTorch 2.x and runs headless Chrome via BrowserGym.
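For background on the curiosity signal used by the RL baselines: Random Network Distillation rewards observations that a trained predictor cannot yet match against a frozen, randomly initialised target network. A minimal sketch of that bonus (illustrative only; the actual learner lives in `rl/rnd_ppo_agent.py` and may differ):

```python
import torch
import torch.nn as nn

class RNDBonus(nn.Module):
    """Curiosity bonus = prediction error against a frozen random target network."""

    def __init__(self, obs_dim: int, embed_dim: int = 64):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(obs_dim, embed_dim), nn.ReLU(),
                                    nn.Linear(embed_dim, embed_dim))
        self.predictor = nn.Sequential(nn.Linear(obs_dim, embed_dim), nn.ReLU(),
                                       nn.Linear(embed_dim, embed_dim))
        for p in self.target.parameters():  # the target is never trained
            p.requires_grad_(False)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Novel observations give a large error; as the predictor learns to match
        # the target on visited states, the bonus decays.
        return (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=-1)

# The predictor is trained on this same error, e.g.
#   loss = rnd(obs_batch).mean(); loss.backward(); optimizer.step()
```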
```
envs/     # BrowserGym wrappers and selector helpers
llm/      # GPT-5 mini client, coach prompts, episodic memory
rl/       # PPO+RND learner, context encoders, buffers
agents/   # Agent interfaces (RL baseline, hybrid coach-driven)
eval/     # Task lists, metrics, evaluation harness
scripts/  # CLI entrypoints for exploration, eval, ablations
configs/  # YAML configs for seeds and hyperparameters
tests/    # Unit tests
```
- Install system dependencies (Python 3.11, Poetry or pip, Chrome/Chromium).
- Copy `.env.example` to `.env` and set the required keys, including `OPENAI_API_KEY` and a reachable `MINIWOB_URL` (see the BrowserGym docs for self-hosting). The CLI scripts load this file automatically via `python-dotenv` (a short sketch of that pattern appears after this list).
- Install Python dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Launch MiniWoB++ via the Farama miniwob-plusplus assets. Clone and serve the HTML root:

  ```bash
  git clone git@github.com:Farama-Foundation/miniwob-plusplus.git
  cd miniwob-plusplus/miniwob/html
  python3 -m http.server 8890
  ```

  Set `MINIWOB_URL=http://127.0.0.1:8890/miniwob/` in `.env` (note the `/miniwob/` suffix) and run dry evals:

  ```bash
  python scripts/run_eval.py --agent coach_random --env browsergym/miniwob.click-checkboxes
  python scripts/run_eval.py --agent hybrid --env browsergym/miniwob.click-checkboxes
  ```
- Monitor TensorBoard logs in `runs/`.
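The `.env` step above relies on the standard `python-dotenv` pattern; a minimal sketch of what the CLI scripts do at startup (illustrative only, not the scripts' actual code):

```python
import os

from dotenv import load_dotenv

# Reads key=value pairs from .env in the working directory into os.environ.
load_dotenv()

openai_api_key = os.environ["OPENAI_API_KEY"]
miniwob_url = os.environ["MINIWOB_URL"]  # e.g. http://127.0.0.1:8890/miniwob/
assert miniwob_url.endswith("/miniwob/"), "remember the /miniwob/ suffix"
```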
- BrowserGym integration via `envs/browsergym_client.py`, with DOM-derived action candidates from `envs/selectors.py`.
- LLM coach in `llm/coach.py` with advisory prompts in `llm/prompts.py` and episodic memory in `llm/memory.py`.
- PPO+RND learner in `rl/rnd_ppo_agent.py`, with observation embeddings from `rl/features.py` and masked policy utilities in `rl/policy.py` (a sketch of the masking idea follows this list).
- Agent interfaces under `agents/` implementing the Pure RL and Hybrid coach-driven controllers.
- Evaluation harness in `eval/harness.py` with metrics defined in `eval/metrics.py` and task configs in `eval/tasks.yaml`.
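The coach's guidance reaches the learner as an action mask over the DOM-derived candidates; a minimal sketch of masked sampling, assuming a discrete candidate set (the actual utilities in `rl/policy.py` may differ):

```python
import torch
from torch.distributions import Categorical

def masked_categorical(logits: torch.Tensor, mask: torch.Tensor) -> Categorical:
    """Sample only from actions the coach left unmasked.

    logits: (batch, n_actions) raw policy outputs
    mask:   (batch, n_actions) bool, True where an action is allowed
    """
    # Disallowed actions get -inf logits, so their probability is exactly zero.
    masked_logits = logits.masked_fill(~mask, float("-inf"))
    return Categorical(logits=masked_logits)

logits = torch.randn(1, 5)
mask = torch.tensor([[True, False, True, True, False]])
dist = masked_categorical(logits, mask)
action = dist.sample()            # always one of the unmasked indices
log_prob = dist.log_prob(action)  # used as usual in the PPO objective
```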
- Format with `ruff` and `black` (configured via `pyproject.toml`, forthcoming).
- Run unit tests with `pytest`.
- Use `configs/default.yaml` to control seeds, logging dirs, and training params (a loading sketch follows this list).
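A minimal sketch of consuming such a config, assuming PyYAML and hypothetical key names (check `configs/default.yaml` for the real ones):

```python
import random

import numpy as np
import torch
import yaml

with open("configs/default.yaml") as f:
    cfg = yaml.safe_load(f)

# "seed" and "log_dir" are hypothetical keys -- use the names defined in the file.
seed = cfg.get("seed", 0)
log_dir = cfg.get("log_dir", "runs/")

random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
```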
- v0.1: MiniWoB++ support, PPO+RND, Reflexion memory.
- v0.2: WebArena backend, advanced curiosity modules, Dreamer-style world models.
- Train / explore with the hybrid agent. The example below collects 200k steps, logs progress, and writes checkpoints/metrics:

  ```bash
  python scripts/run_explore.py \
    --agent hybrid \
    --env browsergym/miniwob.click-checkboxes \
    --steps 200000 \
    --save-path checkpoints/hybrid_click.pt \
    --save-every 50000 \
    --log-interval 5 \
    --log-file logs/hybrid_click.csv \
    --tensorboard logs/tb/hybrid \
    --action-trace-file logs/hybrid_click_trace.jsonl \
    --no-headless
  ```

  Use `--resume-from checkpoints/hybrid_click.pt` to continue training from an existing checkpoint. Episode metrics are emitted to stdout, the CSV (if provided), and TensorBoard summaries:

  - `reward` – extrinsic task reward per episode (should rise toward success)
  - `intrinsic_reward` – RND curiosity bonus (decays as the policy covers the DOM)
  - `success` – 1 if the task succeeded, 0 otherwise
  - `coach_interventions` – coach calls per episode (drops as the policy internalises subgoals)
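  With `--log-file`, the same metrics land in the CSV, one row per episode. A quick way to summarise them offline (a sketch; the column names are assumed to match the fields above):

  ```python
  import pandas as pd

  df = pd.read_csv("logs/hybrid_click.csv")

  # Rolling means over the last 50 episodes smooth out per-episode noise.
  window = 50
  for column in ("reward", "success", "intrinsic_reward", "coach_interventions"):
      print(column, df[column].rolling(window).mean().iloc[-1])
  ```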
  Launch TensorBoard in another terminal to inspect the curves live:

  ```bash
  tensorboard --logdir logs/tb/hybrid
  ```

  If you supply `--action-trace-file`, each episode's low-level actions (clicks, types, etc.) are appended as JSON lines so you can audit what the agent attempted.
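  The trace is plain JSON lines, so it is easy to audit programmatically; a small sketch (the record fields are assumptions – inspect a line of the file for the real schema):

  ```python
  import json
  from collections import Counter

  counts = Counter()
  with open("logs/hybrid_click_trace.jsonl") as f:
      for line in f:
          record = json.loads(line)
          # "action_type" is a hypothetical field name; adjust to the real schema.
          counts[record.get("action_type", "unknown")] += 1

  print(counts.most_common())  # e.g. how often the agent clicked vs. typed
  ```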
- Evaluate any agent (including the trained hybrid) with frozen weights:

  ```bash
  python scripts/run_eval.py \
    --agent hybrid \
    --env browsergym/miniwob.click-checkboxes \
    --episodes 100 \
    --checkpoint checkpoints/hybrid_click.pt \
    --frozen \
    --no-headless
  ```
- Compare baselines by swapping `--agent` for `coach_random` or `baseline_rl`.
- Generate day-dreaming ideas offline and surface them during runs (opt-in):

  ```bash
  python scripts/run_daydream.py \
    --memory memory.jsonl \
    --notes notes.jsonl \
    --ideas ideas.jsonl \
    --pairs 200 \
    --max-accept 25
  ```

  Accepted hypotheses are appended to `ideas.jsonl`. Pass `--ddl-inject` and `--idea-store ideas.jsonl` to `scripts/run_explore.py` or `scripts/run_eval.py` to let the coach see the latest ideas.
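The idea store is also plain JSON lines, so it can be inspected directly; a minimal sketch, with the per-record field name as an assumption (check a line of `ideas.jsonl` for the real schema):

```python
import json

with open("ideas.jsonl") as f:
    ideas = [json.loads(line) for line in f]

print(f"{len(ideas)} accepted hypotheses")
for idea in ideas[-5:]:
    # "hypothesis" is a hypothetical field name; fall back to the raw record.
    print("-", idea.get("hypothesis", idea))
```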