feat: auto-detect SWE-bench instances in harbor task preparation by guapisolo · Pull Request #951 · radixark/miles

guapisolo · 2026-04-07T20:36:51Z

Summary

- Add _is_swebench_instance() to detect SWE-bench metadata (checks for repo, version, base_commit, test_patch).
- Add _swebench_docker_image() to derive the pre-built Docker image name from instance_id (xingyaoww registry convention: __ -> _s_).
- When no explicit docker_image is set and the metadata matches the SWE-bench fields, the correct image is auto-selected.
- Add WORKDIR /testbed and RUN mkdir -p /logs to the generated Dockerfile for SWE-bench compatibility:
  - WORKDIR /testbed: SWE-bench pre-built images clone the target repo into /testbed. Without this directive the container starts in /, and any relative-path operations (apply patch, run tests) fail because they can't find the repo.
  - RUN mkdir -p /logs: the SWE-bench evaluation harness writes test results to /logs/. The directory does not exist in the base image, so the eval script crashes with a "No such file or directory" error if it is missing.
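The body of _is_swebench_instance() is not shown in this thread. A minimal sketch of the check described in the summary, assuming the metadata is a plain dict and that all four fields must be present and non-empty (the actual helper may be stricter or looser; the names below are illustrative):

```python
# Hypothetical sketch; the real helper in prepare_harbor_tasks.py may differ.
SWEBENCH_FIELDS = ("repo", "version", "base_commit", "test_patch")


def is_swebench_instance(metadata: dict) -> bool:
    """Return True when every SWE-bench field is present and non-empty."""
    return all(metadata.get(field) for field in SWEBENCH_FIELDS)
```

Requiring all four fields, rather than any one of them, keeps tasks that merely carry a repo field from being mistaken for SWE-bench instances.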

Test plan

- Verify SWE-bench instances get auto-detected docker images
- Verify non-SWE-bench instances still default to ubuntu:24.04
- Verify instances with explicit docker_image are not overridden
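The three checks in the test plan could be expressed as plain assertions. This is only a sketch: select_docker_image() is a hypothetical stand-in that folds the PR's detection and naming logic into one function so the checks run on their own, and the metadata values are placeholders.

```python
# Hypothetical stand-in for the PR's helpers, not the actual implementation.
SWEBENCH_FIELDS = ("repo", "version", "base_commit", "test_patch")


def select_docker_image(metadata: dict, instance_id: str) -> str:
    """Pick the image: explicit value first, then SWE-bench, then the default."""
    if metadata.get("docker_image"):
        return metadata["docker_image"]
    if all(metadata.get(field) for field in SWEBENCH_FIELDS):
        slug = instance_id.replace("__", "_s_")
        return f"xingyaoww/sweb.eval.x86_64.{slug}:latest"
    return "ubuntu:24.04"


# Placeholder metadata carrying all four SWE-bench fields.
swe_meta = {"repo": "getmoto/moto", "version": "x", "base_commit": "abc", "test_patch": "diff"}

# 1. SWE-bench instances get the auto-detected image.
assert (
    select_docker_image(swe_meta, "getmoto__moto-7365")
    == "xingyaoww/sweb.eval.x86_64.getmoto_s_moto-7365:latest"
)
# 2. Non-SWE-bench instances still default to ubuntu:24.04.
assert select_docker_image({}, "some-task") == "ubuntu:24.04"
# 3. An explicit docker_image is never overridden.
explicit = {**swe_meta, "docker_image": "python:3.12"}
assert select_docker_image(explicit, "getmoto__moto-7365") == "python:3.12"
```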

Made with Cursor

gemini-code-assist

Code Review

This pull request introduces auto-detection for SWE-bench instances, allowing the system to automatically derive the correct Docker image based on the instance ID. The feedback highlights a potential breaking change where new Dockerfile instructions are applied globally rather than conditionally for SWE-bench tasks. Additionally, it is recommended to extract hardcoded Docker image components into constants to improve maintainability.

gemini-code-assist · 2026-04-07T20:41:40Z

examples/experimental/swe-agent-v2/prepare_harbor_tasks.py

    setup_block = f"RUN {setup_cmds}\n" if setup_cmds else ""

-    (env_dir / "Dockerfile").write_text(f"FROM {docker_image}\n{setup_block}")
+    (env_dir / "Dockerfile").write_text(f"FROM {docker_image}\nWORKDIR /testbed\nRUN mkdir -p /logs\n{setup_block}")


The WORKDIR /testbed and RUN mkdir -p /logs commands are added to the Dockerfile for all tasks, not just for SWE-bench instances. This could be an unintended breaking change for non-SWE-bench tasks that might not expect this working directory or the /logs directory. It would be safer to apply these Dockerfile instructions conditionally, only when a SWE-bench instance is detected.


gemini-code-assist · 2026-04-07T20:41:40Z

examples/experimental/swe-agent-v2/prepare_harbor_tasks.py

+def _swebench_docker_image(instance_id: str) -> str:
+    """Derive the pre-built SWE-bench Docker image from instance_id.
+
+    Image naming convention (xingyaoww registry):
+        instance_id:  getmoto__moto-7365
+        image:        xingyaoww/sweb.eval.x86_64.getmoto_s_moto-7365:latest
+
+    The ``__`` in the instance_id maps to ``_s_`` in the image name.
+    """
+    slug = instance_id.replace("__", "_s_")
+    return f"xingyaoww/sweb.eval.x86_64.{slug}:latest"


For better maintainability, consider extracting the hardcoded parts of the Docker image name (like the registry xingyaoww, prefix sweb.eval.x86_64, and tag latest) into constants at the module level. This makes it easier to update if the naming convention changes in the future. This aligns with the repository rule to avoid hardcoding configuration values.

References

Avoid hardcoding model dimensions or configuration values; derive them from configuration or input tensor shapes instead.
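One way the constant extraction suggested in this comment could look. The constant names are illustrative and not taken from the PR; the values are the ones already hardcoded in _swebench_docker_image().

```python
# Module-level constants; the names are hypothetical, the values come from the diff.
SWEBENCH_REGISTRY = "xingyaoww"
SWEBENCH_IMAGE_PREFIX = "sweb.eval.x86_64"
SWEBENCH_IMAGE_TAG = "latest"


def swebench_docker_image(instance_id: str) -> str:
    """Derive the pre-built image name; ``__`` in the id maps to ``_s_``."""
    slug = instance_id.replace("__", "_s_")
    return f"{SWEBENCH_REGISTRY}/{SWEBENCH_IMAGE_PREFIX}.{slug}:{SWEBENCH_IMAGE_TAG}"
```

With this layout, a change of registry or tag touches one line instead of the f-string inside the function.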

maocheng23

Please test before PR to make sure terminus tasks still work

maocheng23 · 2026-04-08T21:56:29Z

examples/experimental/swe-agent-v2/prepare_harbor_tasks.py

    setup_block = f"RUN {setup_cmds}\n" if setup_cmds else ""

-    (env_dir / "Dockerfile").write_text(f"FROM {docker_image}\n{setup_block}")
+    (env_dir / "Dockerfile").write_text(f"FROM {docker_image}\nWORKDIR /testbed\nRUN mkdir -p /logs\n{setup_block}")


Agree, this would impact other cases as well.

maocheng23 · 2026-04-08T21:56:58Z

examples/experimental/swe-agent-v2/prepare_harbor_tasks.py

    setup_block = f"RUN {setup_cmds}\n" if setup_cmds else ""

-    (env_dir / "Dockerfile").write_text(f"FROM {docker_image}\n{setup_block}")
+    (env_dir / "Dockerfile").write_text(f"FROM {docker_image}\nWORKDIR /testbed\nRUN mkdir -p /logs\n{setup_block}")


Maybe a conditional setup:

    is_swebench = not metadata.get("docker_image") and _is_swebench_instance(metadata)
    if is_swebench:
        docker_image = _swebench_docker_image(instance_id)
        logger.debug(f"SWE-bench auto-detected: {instance_id} -> {docker_image}")
        extra_lines = "WORKDIR /testbed\nRUN mkdir -p /logs\n"
    else:
        docker_image = metadata.get("docker_image", "ubuntu:24.04")
        extra_lines = ""
    (env_dir / "Dockerfile").write_text(f"FROM {docker_image}\n{extra_lines}{setup_block}")
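To see the effect of the conditional on the generated file, the string-building part can be isolated into a pure function. This is a sketch under assumptions: render_dockerfile() is a hypothetical helper that does not exist in the PR, and it takes the detection result as a flag instead of calling the PR's helpers.

```python
def render_dockerfile(docker_image: str, is_swebench: bool, setup_cmds: str = "") -> str:
    """Build the Dockerfile text; SWE-bench lines are added only when flagged."""
    extra_lines = "WORKDIR /testbed\nRUN mkdir -p /logs\n" if is_swebench else ""
    setup_block = f"RUN {setup_cmds}\n" if setup_cmds else ""
    return f"FROM {docker_image}\n{extra_lines}{setup_block}"
```

For a non-SWE-bench task, render_dockerfile("ubuntu:24.04", False) yields only the FROM line, so tasks such as the terminus ones mentioned in the review keep their previous Dockerfile.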

Add `_is_swebench_instance()` to detect SWE-bench metadata and `_swebench_docker_image()` to derive the pre-built Docker image name from instance_id (xingyaoww registry convention). When no explicit `docker_image` is set and the metadata matches SWE-bench fields, the correct image is auto-selected. Also add `WORKDIR /testbed` and `RUN mkdir -p /logs` to the generated Dockerfile for SWE-bench compatibility. Made-with: Cursor

gemini-code-assist bot reviewed Apr 7, 2026

maocheng23 requested changes Apr 8, 2026

guapisolo force-pushed the feat/swebench-auto-detect branch from d16cd57 to e9e1dcf · April 10, 2026 00:24

fix: only add WORKDIR/logs lines for SWE-bench instances

d6dcc06

Non-SWE-bench tasks should not have /testbed workdir and /logs directory forced in the Dockerfile. Addresses PR #951 review feedback. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

guapisolo wants to merge 2 commits into main from feat/swebench-auto-detect

2 participants
