Skip to content

Conversation

@CybotTM
Copy link
Member

@CybotTM CybotTM commented Nov 28, 2025

Summary

This PR completes the migration from go-dockerclient to the official Docker SDK (github.com/docker/docker/client), implementing a clean hexagonal architecture with domain-driven design.

Key Changes

  • Domain Models (core/domain/): SDK-agnostic types for containers, images, networks, services, events
  • Port Interfaces (core/ports/): Clean abstractions for Docker operations
  • SDK Adapter (core/adapters/docker/): Full implementation using official Docker SDK
  • Mock Adapter (core/adapters/mock/): Comprehensive mock for testing
  • Provider Layer (core/docker_sdk_provider.go): High-level provider with metrics and error handling

Performance Analysis

Operation Change Status
ContainerCreate -43% faster
ImageExists -34% faster
SystemPing -29% faster
SystemInfo -23% faster
ImageList -21% faster
ContainerInspect -16% faster
ExecRun ~same
ContainerList +163% slower 🟡 (cold path only)
RunJobSimulation +14% slower 🟡 (acceptable)

ContainerList regression is acceptable: Called every 30s (polling interval), adding only +0.13% CPU overhead.

Benefits

  • Official SDK with long-term support and security patches
  • Clean architecture with testable components
  • 72.3% test coverage
  • Better error handling with domain-specific errors

Test Plan

  • All 1,035 unit tests passing
  • Integration tests with real Docker daemon
  • Benchmark comparison complete
  • E2E scheduler lifecycle tests passing

Addresses:

…ation

Phase 1 of Docker SDK migration (issue #269):

- Add domain models: Container, Exec, Image, Event, Service, Network, System
- Add port interfaces: DockerClient with sub-services for each resource type
- Add mock adapter with callback-based testing support

This establishes the foundation for migrating from go-dockerclient
to the official Docker SDK using clean architecture principles.

The EventService interface uses context-based cancellation to fix
go-dockerclient issue #911 (panic on event channel close).
Phase 2 of Docker SDK migration (issue #269):
- ContainerServiceAdapter: Full container lifecycle management
- ExecServiceAdapter: Exec creation, start, inspect with Run helper
- EventServiceAdapter: Context-based event subscription (fixes go-dockerclient #911)
- ImageServiceAdapter: Pull, list, inspect with auth encoding
- NetworkServiceAdapter: Connect, disconnect, create, list, inspect
- SwarmServiceAdapter: Service/task management with wait helpers
- SystemServiceAdapter: Info, ping, version, disk usage
- Type conversion layer for SDK ↔ domain mapping
- Error conversion with proper domain error types
Phase 3 of Docker SDK migration - Core integration layer:
- DockerProvider interface defining standard Docker operations
- SDKDockerProvider implementing interface with official Docker SDK
- LegacyDockerProvider wrapping go-dockerclient for compatibility
- Support for container, exec, image, network, and event operations
Phase 4 of Docker SDK migration - CLI integration:
- Add GetDockerProvider() method to DockerHandler
- Initialize SDK provider alongside legacy client
- Proper cleanup of SDK provider on shutdown
- Clean integration without feature flags
Major test coverage improvements across all packages:

Docker Adapter (7.7% → 62.0%):
- Add integration_test.go with real Docker daemon tests
- Add convert_test.go for all conversion functions (0% → 100%)
- Add client_test.go with CI-aware skip/fail pattern
- Test container, exec, image, network, service, system operations

Mock Adapter (45.5% → 97.4%):
- Comprehensive callback mechanism testing
- Error injection and concurrent access patterns
- All service mocks fully tested

CLI (60.9% → 66.0%):
- Add daemon_execute_test.go, config_show_test.go
- Add docker_handler_shutdown_test.go for event watching
- Add config_jobsource_test.go, config_comprehensive_test.go
- Cover all previously 0% functions

Web Server (62.6% → 68.8%):
- Test runJobHandler, disableJobHandler, enableJobHandler
- Test Shutdown(), RegisterHealthEndpoints()
- Add edge case and error path coverage

Logging (70.6% → 100%):
- Complete coverage for all log levels and formatters
- JobLogger metrics integration tests

Metrics (92.1% → 100%):
- Edge cases, concurrent access, histogram validation

Also fixes:
- Convert bug-hiding t.Skip() to t.Fatal() in annotation tests
- Add CI-aware skipOrFailDockerUnavailable() helper
- Remove obsolete go-dockerclient references from comments
Add comprehensive benchmark tests to compare performance between
go-dockerclient and the new Docker SDK adapter:

- core/adapters/docker/benchmark_test.go: SDK adapter benchmarks
- core/docker_benchmark_test.go: go-dockerclient baseline benchmarks

Benchmark results show net positive migration:
- 8 operations faster (ContainerCreate -43%, ImageExists -34%)
- 2 operations with acceptable regression (ContainerList, RunJobSimulation)
Copilot AI review requested due to automatic review settings November 28, 2025 11:27
@CybotTM CybotTM added this pull request to the merge queue Nov 28, 2025
Merged via the queue into main with commit 8d88f72 Nov 28, 2025
12 of 14 checks passed
@CybotTM CybotTM deleted the feature/docker-sdk-migration branch November 28, 2025 11:28
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR migrates Ofelia from the legacy go-dockerclient library to the official Docker SDK, implementing a clean hexagonal architecture with domain-driven design principles. The migration delivers significant performance improvements for most operations while maintaining backward compatibility.

Key Changes:

  • Complete replacement of go-dockerclient with official Docker SDK
  • Implementation of hexagonal architecture with domain models, port interfaces, and adapters
  • Introduction of comprehensive mock adapters for improved testing
  • Performance improvements ranging from -43% to -34% for critical operations

Reviewed changes

Copilot reviewed 88 out of 97 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
core/runservice_integration_test.go Migrated integration tests to use mock Docker client instead of test server
core/runservice.go Refactored to use DockerProvider interface with SDK-based implementation
core/runjob_simple_test.go Updated unit tests to use mock provider instead of direct Docker client
core/runjob_search_test.go Deleted obsolete test file for legacy image search functionality
core/runjob_monitor_test.go Deleted obsolete container monitoring tests (replaced by provider tests)
core/runjob_integration_test.go Migrated to mock-based testing with comprehensive behavior setup
core/runjob_annotations_test.go Converted from real Docker integration to mock-based testing
core/runjob.go Refactored to use DockerProvider with context-based operations
core/resilient_job.go Updated error handling to use string-based checks instead of Docker-specific types
core/ports/*.go New port interfaces defining clean abstractions for Docker operations
core/domain/*.go New SDK-agnostic domain models for containers, images, networks, services
core/docker_sdk_provider*.go New SDK-based provider implementation with metrics and error handling
core/docker_interface.go New DockerProvider interface definition
core/docker_benchmark_test.go New benchmark suite for performance comparison
core/execjob*.go Refactored exec job implementation to use provider pattern
core/common*.go Removed legacy Docker-specific helper functions
core/adapters/mock/*.go New comprehensive mock implementations for testing


services.OnCreate = func(ctx context.Context, spec domain.ServiceSpec, opts domain.ServiceCreateOptions) (string, error) {
serviceCounter++
serviceID := "service-" + string(rune('0'+serviceCounter))
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Integer-to-rune conversion will fail for serviceCounter >= 10. When serviceCounter is 10 or greater, '0'+10 equals ':' (ASCII 58), not '10'. Use fmt.Sprintf(\"service-%d\", serviceCounter) instead.

Copilot uses AI. Check for mistakes.
Comment on lines +229 to +236
func isNotFoundError(err error) bool {
if err == nil {
return false
}
// Check for common "not found" error patterns
errStr := strings.ToLower(err.Error())
return strings.Contains(errStr, "not found") || strings.Contains(errStr, "no such") || strings.Contains(errStr, "404")
}
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The isNotFoundError function uses string matching for error detection, which is fragile. Consider using errors.Is() with domain-specific error types (e.g., domain.ErrNotFound) for more robust error handling, as shown in the domain package.

Copilot uses AI. Check for mistakes.
Comment on lines +185 to +187
if resp.Error != nil && resp.Error.Message != "" {
p.recordError("wait_container")
return resp.StatusCode, WrapContainerError("wait", containerID, errors.New(resp.Error.Message))
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition checks both resp.Error != nil and resp.Error.Message != \"\". If resp.Error is nil, accessing resp.Error.Message would panic. Consider checking only resp.Error != nil or adding proper nil guard: if resp.Error != nil { if resp.Error.Message != \"\" { ... } }.

Suggested change
if resp.Error != nil && resp.Error.Message != "" {
p.recordError("wait_container")
return resp.StatusCode, WrapContainerError("wait", containerID, errors.New(resp.Error.Message))
if resp.Error != nil {
if resp.Error.Message != "" {
p.recordError("wait_container")
return resp.StatusCode, WrapContainerError("wait", containerID, errors.New(resp.Error.Message))
}

Copilot uses AI. Check for mistakes.
CybotTM added a commit that referenced this pull request Nov 28, 2025
Address CI failures from Docker SDK migration PR #270:
- Fix errorlint issues using errors.Is for error comparison
- Add missing switch cases for exhaustive linter in TaskState
- Fix gci import formatting (standard, default, prefix sections)
- Fix misspellings: cancelled → canceled
- Fix revive line-length and import-shadowing issues
- Fix integer-to-rune conversion bug in test (Copilot review)
- Remove unused execID field from ExecJob struct
- Add linter exclusions for transitional SA1019 and wrapcheck errors
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants