feat: auto-detect SWE-bench instances in harbor task preparation#951
feat: auto-detect SWE-bench instances in harbor task preparation#951
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces auto-detection for SWE-bench instances, allowing the system to automatically derive the correct Docker image based on the instance ID. The feedback highlights a potential breaking change where new Dockerfile instructions are applied globally rather than conditionally for SWE-bench tasks. Additionally, it is recommended to extract hardcoded Docker image components into constants to improve maintainability.
| setup_block = f"RUN {setup_cmds}\n" if setup_cmds else "" | ||
|
|
||
| (env_dir / "Dockerfile").write_text(f"FROM {docker_image}\n{setup_block}") | ||
| (env_dir / "Dockerfile").write_text(f"FROM {docker_image}\nWORKDIR /testbed\nRUN mkdir -p /logs\n{setup_block}") |
There was a problem hiding this comment.
The WORKDIR /testbed and RUN mkdir -p /logs commands are added to the Dockerfile for all tasks, not just for SWE-bench instances. This could be an unintended breaking change for non-SWE-bench tasks that might not expect this working directory or the /logs directory. It would be safer to apply these Dockerfile instructions conditionally, only when a SWE-bench instance is detected.
There was a problem hiding this comment.
Agree, this would impact other cases as well.
There was a problem hiding this comment.
May be conditional setup:
is_swebench = not metadata.get("docker_image") and _is_swebench_instance(metadata)
if is_swebench:
docker_image = _swebench_docker_image(instance_id)
logger.debug(f"SWE-bench auto-detected: {instance_id} -> {docker_image}")
extra_lines = "WORKDIR /testbed\nRUN mkdir -p /logs\n"
else:
docker_image = metadata.get("docker_image", "ubuntu:24.04")
extra_lines = ""
(env_dir / "Dockerfile").write_text(f"FROM {docker_image}\n{extra_lines}{setup_block}")
| def _swebench_docker_image(instance_id: str) -> str: | ||
| """Derive the pre-built SWE-bench Docker image from instance_id. | ||
|
|
||
| Image naming convention (xingyaoww registry): | ||
| instance_id: getmoto__moto-7365 | ||
| image: xingyaoww/sweb.eval.x86_64.getmoto_s_moto-7365:latest | ||
|
|
||
| The ``__`` in the instance_id maps to ``_s_`` in the image name. | ||
| """ | ||
| slug = instance_id.replace("__", "_s_") | ||
| return f"xingyaoww/sweb.eval.x86_64.{slug}:latest" |
There was a problem hiding this comment.
For better maintainability, consider extracting the hardcoded parts of the Docker image name (like the registry xingyaoww, prefix sweb.eval.x86_64, and tag latest) into constants at the module level. This makes it easier to update if the naming convention changes in the future. This aligns with the repository rule to avoid hardcoding configuration values.
References
- Avoid hardcoding model dimensions or configuration values; derive them from configuration or input tensor shapes instead.
maocheng23
left a comment
There was a problem hiding this comment.
Please test before PR to make sure terminus tasks still work
| setup_block = f"RUN {setup_cmds}\n" if setup_cmds else "" | ||
|
|
||
| (env_dir / "Dockerfile").write_text(f"FROM {docker_image}\n{setup_block}") | ||
| (env_dir / "Dockerfile").write_text(f"FROM {docker_image}\nWORKDIR /testbed\nRUN mkdir -p /logs\n{setup_block}") |
There was a problem hiding this comment.
Agree, this would impact other cases as well.
| setup_block = f"RUN {setup_cmds}\n" if setup_cmds else "" | ||
|
|
||
| (env_dir / "Dockerfile").write_text(f"FROM {docker_image}\n{setup_block}") | ||
| (env_dir / "Dockerfile").write_text(f"FROM {docker_image}\nWORKDIR /testbed\nRUN mkdir -p /logs\n{setup_block}") |
There was a problem hiding this comment.
May be conditional setup:
is_swebench = not metadata.get("docker_image") and _is_swebench_instance(metadata)
if is_swebench:
docker_image = _swebench_docker_image(instance_id)
logger.debug(f"SWE-bench auto-detected: {instance_id} -> {docker_image}")
extra_lines = "WORKDIR /testbed\nRUN mkdir -p /logs\n"
else:
docker_image = metadata.get("docker_image", "ubuntu:24.04")
extra_lines = ""
(env_dir / "Dockerfile").write_text(f"FROM {docker_image}\n{extra_lines}{setup_block}")
Add `_is_swebench_instance()` to detect SWE-bench metadata and `_swebench_docker_image()` to derive the pre-built Docker image name from instance_id (xingyaoww registry convention). When no explicit `docker_image` is set and the metadata matches SWE-bench fields, the correct image is auto-selected. Also add `WORKDIR /testbed` and `RUN mkdir -p /logs` to the generated Dockerfile for SWE-bench compatibility. Made-with: Cursor
d16cd57 to
e9e1dcf
Compare
Non-SWE-bench tasks should not have /testbed workdir and /logs directory forced in the Dockerfile. Addresses PR #951 review feedback. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Summary
_is_swebench_instance()to detect SWE-bench metadata (checks forrepo,version,base_commit,test_patch)_swebench_docker_image()to derive the pre-built Docker image name from instance_id (xingyaoww registry convention:__->_s_)docker_imageis set and metadata matches SWE-bench fields, the correct image is auto-selectedWORKDIR /testbedandRUN mkdir -p /logsto generated Dockerfile for SWE-bench compatibility:WORKDIR /testbed: SWE-bench pre-built images clone the target repo into/testbed. Without this directive the container starts in/, and any relative-path operations (apply patch, run tests) fail because they can't find the repo.RUN mkdir -p /logs: SWE-bench evaluation harness writes test results to/logs/. The directory does not exist in the base image, so the eval script crashes with a "No such file or directory" error if it is missing.Test plan
ubuntu:24.04docker_imageare not overriddenMade with Cursor