Covariate Std Err with baselines #245

Open · wants to merge 7 commits into base: main
9 changes: 9 additions & 0 deletions README.md
@@ -275,6 +275,15 @@ dynamic benchmarks.
between the two executions. **Note**: this is a beta feature and will need some adaptation for your
own agent.

## Variables
Here's a list of the environment variables used by AgentLab:
- `OPENAI_API_KEY`, used by default for OpenAI LLMs.
- `AZURE_OPENAI_API_KEY`, used by default for Azure OpenAI LLMs.
- `AZURE_OPENAI_ENDPOINT`, which specifies your Azure endpoint.
- `OPENAI_API_VERSION`, the API version used for Azure OpenAI.
- `OPENROUTER_API_KEY`, used for the OpenRouter API.
- `AGENTLAB_EXP_ROOT`, the directory where your experiments are stored; defaults to `~/agentlab-results`.
- `AGENTXRAY_SHARE_GRADIO`, which makes AgentXRay open a public tunnel on launch.

## Misc

5 changes: 3 additions & 2 deletions pyproject.toml
@@ -13,10 +13,11 @@ authors = [
     {name = "Alex Lacoste", email = "[email protected]"},
     {name = "Tom Marty", email = "[email protected]"},
     {name = "Massimo Caccia", email = "[email protected]"},
-    {name = "Thibault Le Sellier de Chezelles", email = "[email protected]"}
+    {name = "Thibault Le Sellier de Chezelles", email = "[email protected]"},
+    {name = "Aman Jaiswal", email = "[email protected]"},
 ]
 readme = "README.md"
-requires-python = ">3.7"
+requires-python = ">3.10"
 license = {text = "Apache-2.0"}
 classifiers = [
     "Development Status :: 2 - Pre-Alpha",
2 changes: 2 additions & 0 deletions reproducibility_journal.csv
@@ -74,3 +74,5 @@ Leo Boisvert,GenericAgent-openai_o1-mini-2024-09-12,workarena_l1,0.4.1,2025-02-0
M: src/agentlab/analyze/agent_xray.py
M: src/agentlab/llm/llm_configs.py",0.13.3,1d2d7160e5b7ec9954ecb48988f71eb56288dd29,"
Leo Boisvert,GenericAgent-anthropic_claude-3.7-sonnet,workarena_l1,0.4.1,2025-02-25_02-32-09,d4f900c2-1de1-4e4b-a3ab-495ff2675fff,0.515,0.028,0,330/330,None,Linux (#68-Ubuntu SMP Mon Oct 7 14:34:20 UTC 2024),3.12.3,1.44.0,v0.4.0,c9d2ef9648435ef1119950ecb1a0734497ccc33b,,0.13.3,1d2d7160e5b7ec9954ecb48988f71eb56288dd29,
agentlabtraces,GenericAgent-meta-llama_llama-4-maverick,workarena_l1,0.4.1,2025-04-14_17-15-56,a6dc4022-2bb7-4b46-8b37-f62c010defc1,0.27,0.024,0,330/330,None,Linux (#135-Ubuntu SMP Fri Sep 27 13:53:58 UTC 2024),3.12.7,1.39.0,v0.4.0,5eb2ecb5e5b293170230bcbed8b17fe192af214a,,0.13.3,70dac253628c476aff1af6a975f27f8563453ad2,
agentlabtraces,GenericAgent-meta-llama_llama-4-maverick,workarena_l2_agent_curriculum_eval,0.4.1,2025-04-22_15-38-44,d62fed39-caac-4ef3-92ac-b29897c69f88,0.085,0.018,1,235/235,None,Linux (#68-Ubuntu SMP Mon Oct 7 14:34:20 UTC 2024),3.12.7,1.39.0,v0.4.0,43bafbcfbe398fca39e4ffdc57b2f226d2c6d3e1,,0.13.3,70dac253628c476aff1af6a975f27f8563453ad2,
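Each journal row reports an average reward and a standard error. For the three complete rows shown, the reported standard errors agree (to three decimals) with the binomial standard error of a success rate, sqrt(p * (1 - p) / n), where n is the number of completed episodes. The Python sketch below recomputes them. The column positions are inferred from the rows themselves rather than from the journal's header, and `compare_std_err` is a hypothetical helper, not how AgentLab necessarily computes the column.

```python
import csv
import io
import math

# Rows copied verbatim from the journal diff. Column positions are inferred from
# these rows (1: agent, 2: benchmark, 6: average reward, 7: std err,
# 9: "completed/total") and are an assumption, not the documented header.
ROWS = """\
Leo Boisvert,GenericAgent-anthropic_claude-3.7-sonnet,workarena_l1,0.4.1,2025-02-25_02-32-09,d4f900c2-1de1-4e4b-a3ab-495ff2675fff,0.515,0.028,0,330/330,None,Linux (#68-Ubuntu SMP Mon Oct 7 14:34:20 UTC 2024),3.12.3,1.44.0,v0.4.0,c9d2ef9648435ef1119950ecb1a0734497ccc33b,,0.13.3,1d2d7160e5b7ec9954ecb48988f71eb56288dd29,
agentlabtraces,GenericAgent-meta-llama_llama-4-maverick,workarena_l1,0.4.1,2025-04-14_17-15-56,a6dc4022-2bb7-4b46-8b37-f62c010defc1,0.27,0.024,0,330/330,None,Linux (#135-Ubuntu SMP Fri Sep 27 13:53:58 UTC 2024),3.12.7,1.39.0,v0.4.0,5eb2ecb5e5b293170230bcbed8b17fe192af214a,,0.13.3,70dac253628c476aff1af6a975f27f8563453ad2,
agentlabtraces,GenericAgent-meta-llama_llama-4-maverick,workarena_l2_agent_curriculum_eval,0.4.1,2025-04-22_15-38-44,d62fed39-caac-4ef3-92ac-b29897c69f88,0.085,0.018,1,235/235,None,Linux (#68-Ubuntu SMP Mon Oct 7 14:34:20 UTC 2024),3.12.7,1.39.0,v0.4.0,43bafbcfbe398fca39e4ffdc57b2f226d2c6d3e1,,0.13.3,70dac253628c476aff1af6a975f27f8563453ad2,
"""


def binomial_std_err(p: float, n: int) -> float:
    """Standard error of a success rate p estimated from n episodes."""
    return math.sqrt(p * (1 - p) / n)


def compare_std_err(text: str) -> list[tuple[str, str, float, float]]:
    """Return (agent, benchmark, reported std err, recomputed std err) per row."""
    results = []
    for row in csv.reader(io.StringIO(text)):
        p = float(row[6])
        n = int(row[9].split("/")[0])  # number of completed episodes
        recomputed = round(binomial_std_err(p, n), 3)
        results.append((row[1], row[2], float(row[7]), recomputed))
    return results


for agent, bench, reported, recomputed in compare_std_err(ROWS):
    print(f"{agent} / {bench}: reported {reported}, recomputed {recomputed}")
```

For example, the Llama 4 Maverick row on `workarena_l1` gives sqrt(0.27 * 0.73 / 330) ≈ 0.0244, which rounds to the reported 0.024.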
2 changes: 2 additions & 0 deletions src/agentlab/agents/generic_agent/__init__.py
@@ -10,6 +10,7 @@
    AGENT_3_5,
    AGENT_8B,
    AGENT_CUSTOM,
    AGENT_LLAMA4_17B_INSTRUCT,
    AGENT_LLAMA3_70B,
    AGENT_LLAMA31_70B,
    RANDOM_SEARCH_AGENT,
@@ -31,6 +32,7 @@
    "AGENT_4o_VISION",
    "AGENT_o3_MINI",
    "AGENT_o1_MINI",
    "AGENT_LLAMA4_17B_INSTRUCT",
    "AGENT_LLAMA3_70B",
    "AGENT_LLAMA31_70B",
    "AGENT_8B",
6 changes: 5 additions & 1 deletion src/agentlab/agents/generic_agent/agent_configs.py
@@ -10,6 +10,7 @@

from .generic_agent import GenericAgentArgs
from .generic_agent_prompt import GenericPromptFlags
from .tmlr_config import BASE_FLAGS

FLAGS_CUSTOM = GenericPromptFlags(
    obs=dp.ObsFlags(
@@ -296,7 +297,10 @@
    chat_model_args=CHAT_MODEL_ARGS_DICT["openrouter/anthropic/claude-3.5-sonnet:beta"],
    flags=FLAGS_GPT_4o_VISION,
)

AGENT_LLAMA4_17B_INSTRUCT = GenericAgentArgs(
    chat_model_args=CHAT_MODEL_ARGS_DICT["openrouter/meta-llama/llama-4-maverick"],
    flags=BASE_FLAGS,
)

DEFAULT_RS_FLAGS = GenericPromptFlags(
    flag_group="default_rs",
2 changes: 1 addition & 1 deletion src/agentlab/analyze/agent_xray.py
@@ -550,7 +550,7 @@ def tag_screenshot_with_action(screenshot: Image, action: str) -> Image:
     try:
         coords = action[action.index("(") + 1 : action.index(")")].split(",")
         coords = [c.strip() for c in coords]
-        if len(coords) != 2:
+        if len(coords) not in [2, 3]:
             raise ValueError(f"Invalid coordinate format: {coords}")
         if coords[0].startswith("x="):
             coords[0] = coords[0][2:]
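The relaxed check in `tag_screenshot_with_action` lets an action carry a third argument besides the two coordinates, presumably something like the `button` argument of BrowserGym's `mouse_click(x, y, button)`. The standalone Python sketch below mirrors that parsing logic. `parse_click_coords` is a hypothetical helper, and stripping a `y=` prefix is an assumption made by analogy with the `x=` branch, since the diff is cut off after it.

```python
def parse_click_coords(action: str) -> tuple[float, float]:
    """Extract (x, y) from an action string such as "mouse_click(12, 34)".

    Hypothetical helper mirroring the check in the diff: two or three
    comma-separated arguments are accepted, and only the first two are
    treated as coordinates.
    """
    args = action[action.index("(") + 1 : action.index(")")].split(",")
    args = [a.strip() for a in args]
    if len(args) not in (2, 3):
        raise ValueError(f"Invalid coordinate format: {args}")
    x, y = args[0], args[1]
    # Drop optional keyword prefixes ("y=" handling is assumed, by analogy with "x=").
    if x.startswith("x="):
        x = x[2:]
    if y.startswith("y="):
        y = y[2:]
    return float(x), float(y)


print(parse_click_coords("mouse_click(x=56, y=78.5, button='left')"))  # (56.0, 78.5)
```

Because the argument string is cut at the first `)` and split on every comma, this approach assumes the third argument itself contains neither character.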