fix(swtbench): align docker workspace image building with SWE-bench #437
Closed
simonrosenberg wants to merge 3 commits into main
The SWT-bench docker workspace mode was failing because it attempted to start a container from a GHCR image tag that did not exist. The issue was that SWT-bench's prepare_workspace method used DockerDevWorkspace for image building (when SKIP_BUILD=0), while SWE-bench uses the build_image() function from build_utils.

This commit aligns SWT-bench's prepare_workspace with SWE-bench's implementation:

1. Import get_official_docker_image and extract_custom_tag from swebench.build_images instead of defining local versions
2. Import build_image from utils.build_utils for local image building
3. Remove DockerDevWorkspace import and usage
4. Use build_image() when SKIP_BUILD=0 (same as SWE-bench)
5. Always use DockerWorkspace (not DockerDevWorkspace)

The fix ensures that when running with --workspace docker and SKIP_BUILD=0, the image is built locally using the standard SDK build infrastructure, producing the correct image tag format.

Fixes #436

Co-authored-by: openhands <openhands@all-hands.dev>
When SKIP_BUILD is not explicitly set, detect whether the agent-server image exists in the local Docker daemon. If it's missing, build it automatically instead of failing with "image not found". This gives users a zero-config experience with `--workspace docker`.

Behavior:
- SKIP_BUILD=1: always skip build (explicit opt-in, pre-built images)
- SKIP_BUILD=0: always build (explicit opt-in, force rebuild)
- SKIP_BUILD unset: auto-detect via `docker image inspect`

Fixes #436

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
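The three-way `SKIP_BUILD` behavior described in this commit can be sketched roughly as follows. This is an illustration only, not the code from `run_infer.py`: the helper names `parse_skip_build`, `image_exists_locally`, and `should_build` are hypothetical, and treating an empty value as unset and rejecting unrecognized values are assumptions made here. The only external command used is `docker image inspect`, which exits with a non-zero status when the image is not present locally.

```python
import os
import subprocess
from typing import Callable, Mapping, Optional

# Accepted spellings, following the PR description (1/true/yes, 0/false/no).
TRUTHY = {"1", "true", "yes"}
FALSY = {"0", "false", "no"}


def parse_skip_build(env: Mapping[str, str]) -> Optional[bool]:
    """Return True/False for an explicit SKIP_BUILD, or None when unset.

    Assumption: an empty string counts as unset, and any other
    unrecognized value is an error rather than a silent default.
    """
    raw = env.get("SKIP_BUILD")
    if raw is None or raw.strip() == "":
        return None
    value = raw.strip().lower()
    if value in TRUTHY:
        return True
    if value in FALSY:
        return False
    raise ValueError(f"Unrecognized SKIP_BUILD value: {raw!r}")


def image_exists_locally(tag: str, run: Callable = subprocess.run) -> bool:
    """Return True if `docker image inspect <tag>` exits with status 0."""
    try:
        result = run(
            ["docker", "image", "inspect", tag],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except FileNotFoundError:
        # The docker CLI itself is not installed or not on PATH.
        return False
    return result.returncode == 0


def should_build(
    tag: str,
    env: Mapping[str, str] = os.environ,
    image_exists: Callable[[str], bool] = image_exists_locally,
) -> bool:
    """Decide whether to build the image before starting the workspace."""
    skip = parse_skip_build(env)
    if skip is None:
        # Unset: auto-detect, building only when the image is missing.
        return not image_exists(tag)
    # Explicit setting always wins over auto-detection.
    return not skip
```

Passing the environment and the existence check as parameters keeps the decision testable without a Docker daemon, which is similar in spirit to the mocked tests listed in the PR description.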
Resolve conflicts in benchmarks/swtbench/run_infer.py by taking main's refactored version which already incorporates the PR's auto-build feature via create_docker_workspace from image_utils. Update tests to match the new API surface (local_image_exists in image_utils, create_docker_workspace, IMAGE_TAG_PREFIX).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Closing because this was fixed by #456.
Summary
Fixes #436
The SWT-bench docker workspace mode was failing because it attempted to start a container from an image tag that doesn't exist locally or in GHCR. The root cause: `SKIP_BUILD` defaulted to `1`, so the image was never built, and no documentation guided users to build images first (unlike SWE-bench's README "Step 1").
What changed
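Auto-detect missing images (`benchmarks/swtbench/run_infer.py`): When `SKIP_BUILD` is not explicitly set, the runner now checks whether the agent-server image exists in the local Docker daemon via `docker image inspect`. If it's missing, it builds automatically. This gives a zero-config experience:

| `SKIP_BUILD` value | Behavior |
| --- | --- |
| `1` / `true` / `yes` | Always skip the build (pre-built images) |
| `0` / `false` / `no` | Always build (force rebuild) |
| unset | Auto-detect via `docker image inspect`; build if the image is missing |

Structural improvements (from the original PR, preserved):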
- Import `get_official_docker_image` and `extract_custom_tag` from `swebench.build_images` instead of duplicating them
- Use `build_image()` from `utils.build_utils` for consistent build behavior
- Remove `DockerDevWorkspace`: always use `DockerWorkspace` with pre-built or just-built images
Testing
- `tests/test_swtbench_run_infer.py` (4 new for auto-detect behavior)
  - `test_auto_builds_when_skip_build_unset_and_image_missing`: verifies auto-build triggers
  - `test_skips_build_when_skip_build_unset_and_image_exists_locally`: verifies no rebuild when image exists
  - `test_returns_true/false_when_image_exists/missing`: unit tests for `_local_docker_image_exists`
- test_instance_timeout.py)
Usage