FLAPPY is a research platform for continually learning web agents on MiniWoB++ tasks via BrowserGym. It compares three baselines:
- Pure reinforcement learning agent (PPO + Random Network Distillation)
- Coach-guided random agent (LLM coach for masks/subgoals, random policy)
- Hybrid coach/learner agent (LLM coach + PPO/RND driver policy)
The platform targets Python 3.11 and PyTorch 2.x and runs headless Chrome via BrowserGym.
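For background on the curiosity signal used by the RL baselines: Random Network Distillation rewards observations that a trained predictor cannot yet match against a frozen, randomly initialised target network. A minimal sketch of that bonus (illustrative only; the actual learner lives in `rl/rnd_ppo_agent.py` and may differ):

```python
import torch
import torch.nn as nn

class RNDBonus(nn.Module):
    """Curiosity bonus = prediction error against a frozen random target network."""

    def __init__(self, obs_dim: int, embed_dim: int = 64):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(obs_dim, embed_dim), nn.ReLU(),
                                    nn.Linear(embed_dim, embed_dim))
        self.predictor = nn.Sequential(nn.Linear(obs_dim, embed_dim), nn.ReLU(),
                                       nn.Linear(embed_dim, embed_dim))
        for p in self.target.parameters():  # the target is never trained
            p.requires_grad_(False)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Novel observations give a large error; as the predictor learns to match
        # the target on visited states, the bonus decays.
        return (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=-1)

# The predictor is trained on this same error, e.g.
#   loss = rnd(obs_batch).mean(); loss.backward(); optimizer.step()
```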
```
envs/     # BrowserGym wrappers and selector helpers
llm/      # GPT-5 mini client, coach prompts, episodic memory
rl/       # PPO+RND learner, context encoders, buffers
agents/   # Agent interfaces (RL baseline, hybrid coach-driven)
eval/     # Task lists, metrics, evaluation harness
scripts/  # CLI entrypoints for exploration, eval, ablations
configs/  # YAML configs for seeds and hyperparameters
tests/    # Unit tests
```
- Install system dependencies (Python 3.11, Poetry or pip, Chrome/Chromium).
- Copy `.env.example` to `.env` and set the required keys, including `OPENAI_API_KEY` and a reachable `MINIWOB_URL` (see the BrowserGym docs for self-hosting). The CLI scripts load this file automatically via `python-dotenv` (a short sketch of that pattern appears after this list).
- Install Python dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Launch MiniWoB++ via the Farama miniwob-plusplus assets. Clone and serve the HTML root:

  ```bash
  git clone git@github.com:Farama-Foundation/miniwob-plusplus.git
  cd miniwob-plusplus/miniwob/html
  python3 -m http.server 8890
  ```

  Set `MINIWOB_URL=http://127.0.0.1:8890/miniwob/` in `.env` (note the `/miniwob/` suffix) and run dry evals:

  ```bash
  python scripts/run_eval.py --agent coach_random --env browsergym/miniwob.click-checkboxes
  python scripts/run_eval.py --agent hybrid --env browsergym/miniwob.click-checkboxes
  ```
- Monitor TensorBoard logs in `runs/`.
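The `.env` step above relies on the standard `python-dotenv` pattern; a minimal sketch of what the CLI scripts do at startup (illustrative only, not the scripts' actual code):

```python
import os

from dotenv import load_dotenv

# Reads key=value pairs from .env in the working directory into os.environ.
load_dotenv()

openai_api_key = os.environ["OPENAI_API_KEY"]
miniwob_url = os.environ["MINIWOB_URL"]  # e.g. http://127.0.0.1:8890/miniwob/
assert miniwob_url.endswith("/miniwob/"), "remember the /miniwob/ suffix"
```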
- BrowserGym integration via `envs/browsergym_client.py`, with DOM-derived action candidates from `envs/selectors.py`.
- LLM coach in `llm/coach.py` with advisory prompts in `llm/prompts.py` and episodic memory in `llm/memory.py`.
- PPO+RND learner in `rl/rnd_ppo_agent.py`, with observation embeddings from `rl/features.py` and masked policy utilities in `rl/policy.py` (a sketch of the masking idea follows this list).
- Agent interfaces under `agents/` implementing the Pure RL and Hybrid coach-driven controllers.
- Evaluation harness in `eval/harness.py` with metrics defined in `eval/metrics.py` and task configs in `eval/tasks.yaml`.
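The coach's guidance reaches the learner as an action mask over the DOM-derived candidates; a minimal sketch of masked sampling, assuming a discrete candidate set (the actual utilities in `rl/policy.py` may differ):

```python
import torch
from torch.distributions import Categorical

def masked_categorical(logits: torch.Tensor, mask: torch.Tensor) -> Categorical:
    """Sample only from actions the coach left unmasked.

    logits: (batch, n_actions) raw policy outputs
    mask:   (batch, n_actions) bool, True where an action is allowed
    """
    # Disallowed actions get -inf logits, so their probability is exactly zero.
    masked_logits = logits.masked_fill(~mask, float("-inf"))
    return Categorical(logits=masked_logits)

logits = torch.randn(1, 5)
mask = torch.tensor([[True, False, True, True, False]])
dist = masked_categorical(logits, mask)
action = dist.sample()            # always one of the unmasked indices
log_prob = dist.log_prob(action)  # used as usual in the PPO objective
```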
- Format with `ruff` and `black` (configured via `pyproject.toml`, forthcoming).
- Run unit tests with `pytest`.
- Use `configs/default.yaml` to control seeds, logging dirs, and training params (a loading sketch follows this list).
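A minimal sketch of consuming such a config, assuming PyYAML and hypothetical key names (check `configs/default.yaml` for the real ones):

```python
import random

import numpy as np
import torch
import yaml

with open("configs/default.yaml") as f:
    cfg = yaml.safe_load(f)

# "seed" and "log_dir" are hypothetical keys -- use the names defined in the file.
seed = cfg.get("seed", 0)
log_dir = cfg.get("log_dir", "runs/")

random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
```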
- v0.1: MiniWoB++ support, PPO+RND, Reflexion memory.
- v0.2: WebArena backend, advanced curiosity modules, Dreamer-style world models.
- Train / explore with the hybrid agent. The example below collects 200k steps, logs progress, and writes checkpoints/metrics:

  ```bash
  python scripts/run_explore.py \
    --agent hybrid \
    --env browsergym/miniwob.click-checkboxes \
    --steps 200000 \
    --save-path checkpoints/hybrid_click.pt \
    --save-every 50000 \
    --log-interval 5 \
    --log-file logs/hybrid_click.csv \
    --tensorboard logs/tb/hybrid \
    --action-trace-file logs/hybrid_click_trace.jsonl \
    --no-headless
  ```

  Use `--resume-from checkpoints/hybrid_click.pt` to continue training from an existing checkpoint. Episode metrics are emitted to stdout, the CSV (if provided), and TensorBoard summaries:

  - `reward` – extrinsic task reward per episode (should rise toward success)
  - `intrinsic_reward` – RND curiosity bonus (decays as the policy covers the DOM)
  - `success` – 1 if the task succeeded, 0 otherwise
  - `coach_interventions` – coach calls per episode (drops as the policy internalises subgoals)
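  With `--log-file`, the same metrics land in the CSV, one row per episode. A quick way to summarise them offline (a sketch; the column names are assumed to match the fields above):

  ```python
  import pandas as pd

  df = pd.read_csv("logs/hybrid_click.csv")

  # Rolling means over the last 50 episodes smooth out per-episode noise.
  window = 50
  for column in ("reward", "success", "intrinsic_reward", "coach_interventions"):
      print(column, df[column].rolling(window).mean().iloc[-1])
  ```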
  Launch TensorBoard in another terminal to inspect the curves live:

  ```bash
  tensorboard --logdir logs/tb/hybrid
  ```

  If you supply `--action-trace-file`, each episode's low-level actions (clicks, types, etc.) are appended as JSON lines so you can audit what the agent attempted.
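  The trace is plain JSON lines, so it is easy to audit programmatically; a small sketch (the record fields are assumptions – inspect a line of the file for the real schema):

  ```python
  import json
  from collections import Counter

  counts = Counter()
  with open("logs/hybrid_click_trace.jsonl") as f:
      for line in f:
          record = json.loads(line)
          # "action_type" is a hypothetical field name; adjust to the real schema.
          counts[record.get("action_type", "unknown")] += 1

  print(counts.most_common())  # e.g. how often the agent clicked vs. typed
  ```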
- Evaluate any agent (including the trained hybrid) with frozen weights:

  ```bash
  python scripts/run_eval.py \
    --agent hybrid \
    --env browsergym/miniwob.click-checkboxes \
    --episodes 100 \
    --checkpoint checkpoints/hybrid_click.pt \
    --frozen \
    --no-headless
  ```
- Compare baselines by swapping `--agent` for `coach_random` or `baseline_rl`.
- Generate day-dreaming ideas offline and surface them during runs (opt-in):

  ```bash
  python scripts/run_daydream.py \
    --memory memory.jsonl \
    --notes notes.jsonl \
    --ideas ideas.jsonl \
    --pairs 200 \
    --max-accept 25
  ```

  Accepted hypotheses are appended to `ideas.jsonl`. Pass `--ddl-inject` and `--idea-store ideas.jsonl` to `scripts/run_explore.py` or `scripts/run_eval.py` to let the coach see the latest ideas.
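The idea store is also plain JSON lines, so it can be inspected directly; a minimal sketch, with the per-record field name as an assumption (check a line of `ideas.jsonl` for the real schema):

```python
import json

with open("ideas.jsonl") as f:
    ideas = [json.loads(line) for line in f]

print(f"{len(ideas)} accepted hypotheses")
for idea in ideas[-5:]:
    # "hypothesis" is a hypothetical field name; fall back to the raw record.
    print("-", idea.get("hypothesis", idea))
```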