Skip to content

feat: auto-detect SWE-bench instances in harbor task preparation#951

Open
guapisolo wants to merge 2 commits intomainfrom
feat/swebench-auto-detect
Open

feat: auto-detect SWE-bench instances in harbor task preparation#951
guapisolo wants to merge 2 commits intomainfrom
feat/swebench-auto-detect

Conversation

@guapisolo
Copy link
Copy Markdown
Collaborator

@guapisolo guapisolo commented Apr 7, 2026

Summary

  • Add _is_swebench_instance() to detect SWE-bench metadata (checks for repo, version, base_commit, test_patch)
  • Add _swebench_docker_image() to derive the pre-built Docker image name from instance_id (xingyaoww registry convention: __ -> _s_)
  • When no explicit docker_image is set and metadata matches SWE-bench fields, the correct image is auto-selected
  • Add WORKDIR /testbed and RUN mkdir -p /logs to generated Dockerfile for SWE-bench compatibility:
    • WORKDIR /testbed: SWE-bench pre-built images clone the target repo into /testbed. Without this directive the container starts in /, and any relative-path operations (apply patch, run tests) fail because they can't find the repo.
    • RUN mkdir -p /logs: SWE-bench evaluation harness writes test results to /logs/. The directory does not exist in the base image, so the eval script crashes with a "No such file or directory" error if it is missing.

Test plan

  • Verify SWE-bench instances get auto-detected docker images
  • Verify non-SWE-bench instances still default to ubuntu:24.04
  • Verify instances with explicit docker_image are not overridden

Made with Cursor

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces auto-detection for SWE-bench instances, allowing the system to automatically derive the correct Docker image based on the instance ID. The feedback highlights a potential breaking change where new Dockerfile instructions are applied globally rather than conditionally for SWE-bench tasks. Additionally, it is recommended to extract hardcoded Docker image components into constants to improve maintainability.

setup_block = f"RUN {setup_cmds}\n" if setup_cmds else ""

(env_dir / "Dockerfile").write_text(f"FROM {docker_image}\n{setup_block}")
(env_dir / "Dockerfile").write_text(f"FROM {docker_image}\nWORKDIR /testbed\nRUN mkdir -p /logs\n{setup_block}")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The WORKDIR /testbed and RUN mkdir -p /logs commands are added to the Dockerfile for all tasks, not just for SWE-bench instances. This could be an unintended breaking change for non-SWE-bench tasks that might not expect this working directory or the /logs directory. It would be safer to apply these Dockerfile instructions conditionally, only when a SWE-bench instance is detected.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, this would impact other cases as well.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be conditional setup:

is_swebench = not metadata.get("docker_image") and _is_swebench_instance(metadata)
if is_swebench:
    docker_image = _swebench_docker_image(instance_id)
    logger.debug(f"SWE-bench auto-detected: {instance_id} -> {docker_image}")
    extra_lines = "WORKDIR /testbed\nRUN mkdir -p /logs\n"
else:
    docker_image = metadata.get("docker_image", "ubuntu:24.04")
    extra_lines = ""

(env_dir / "Dockerfile").write_text(f"FROM {docker_image}\n{extra_lines}{setup_block}")

Comment on lines +50 to +60
def _swebench_docker_image(instance_id: str) -> str:
"""Derive the pre-built SWE-bench Docker image from instance_id.

Image naming convention (xingyaoww registry):
instance_id: getmoto__moto-7365
image: xingyaoww/sweb.eval.x86_64.getmoto_s_moto-7365:latest

The ``__`` in the instance_id maps to ``_s_`` in the image name.
"""
slug = instance_id.replace("__", "_s_")
return f"xingyaoww/sweb.eval.x86_64.{slug}:latest"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For better maintainability, consider extracting the hardcoded parts of the Docker image name (like the registry xingyaoww, prefix sweb.eval.x86_64, and tag latest) into constants at the module level. This makes it easier to update if the naming convention changes in the future. This aligns with the repository rule to avoid hardcoding configuration values.

References
  1. Avoid hardcoding model dimensions or configuration values; derive them from configuration or input tensor shapes instead.

Copy link
Copy Markdown
Contributor

@maocheng23 maocheng23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please test before PR to make sure terminus tasks still work

setup_block = f"RUN {setup_cmds}\n" if setup_cmds else ""

(env_dir / "Dockerfile").write_text(f"FROM {docker_image}\n{setup_block}")
(env_dir / "Dockerfile").write_text(f"FROM {docker_image}\nWORKDIR /testbed\nRUN mkdir -p /logs\n{setup_block}")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, this would impact other cases as well.

setup_block = f"RUN {setup_cmds}\n" if setup_cmds else ""

(env_dir / "Dockerfile").write_text(f"FROM {docker_image}\n{setup_block}")
(env_dir / "Dockerfile").write_text(f"FROM {docker_image}\nWORKDIR /testbed\nRUN mkdir -p /logs\n{setup_block}")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be conditional setup:

is_swebench = not metadata.get("docker_image") and _is_swebench_instance(metadata)
if is_swebench:
    docker_image = _swebench_docker_image(instance_id)
    logger.debug(f"SWE-bench auto-detected: {instance_id} -> {docker_image}")
    extra_lines = "WORKDIR /testbed\nRUN mkdir -p /logs\n"
else:
    docker_image = metadata.get("docker_image", "ubuntu:24.04")
    extra_lines = ""

(env_dir / "Dockerfile").write_text(f"FROM {docker_image}\n{extra_lines}{setup_block}")

Add `_is_swebench_instance()` to detect SWE-bench metadata and
`_swebench_docker_image()` to derive the pre-built Docker image name
from instance_id (xingyaoww registry convention).

When no explicit `docker_image` is set and the metadata matches
SWE-bench fields, the correct image is auto-selected. Also add
`WORKDIR /testbed` and `RUN mkdir -p /logs` to the generated
Dockerfile for SWE-bench compatibility.

Made-with: Cursor
@guapisolo guapisolo force-pushed the feat/swebench-auto-detect branch from d16cd57 to e9e1dcf Compare April 10, 2026 00:24
Non-SWE-bench tasks should not have /testbed workdir and /logs
directory forced in the Dockerfile.

Addresses PR #951 review feedback.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants