Skip to content

CDRIVER-5971 Use Amazon ECR to obtain OCI images in EVG #2058

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jul 17, 2025

Conversation

eramongodb
Copy link
Contributor

@eramongodb eramongodb commented Jul 11, 2025

Resolves CDRIVER-5971. These changes should improve the performance and reliability of Earthly tasks on Evergreen by avoiding both DockerHub rate limits and Artifactory's limited bandwidth.

@mdb-ad and @rcsanchez97: even if not review-requested, I recommended attempting the documented instructions to authenticate with Amazon ECR and pull images on your local machine and verify if it works as expected, since these changes and instructions are meant to preserve/support the ability to run Earthly tasks locally (not Evergreen only).


DevProd plans to decommission the Artifactory instance currently providing DevProd Release Tools images (e.g. garasign-gpg, silkbomb, etc.). We temporarily used Artifactory as a workaround for DockerHub rate limits (#1794) but reverted shortly after (#1927). The new DevProd-provided Amazon ECR instance is intended to the proper long-term solution going forward.

The Amazon ECR instance is both a host for internal images (e.g. DevProd Release Tools) as well as a pull-through cache for DockerHub images. Therefore, it serves all our present OCI image acquisition needs. DockerHub images <name> and <orgname>/<name> are specified as library/<name> and <orgname>/<name> respectively. DevProd Release Tools images release-tools-container-registry-local/<name> are specified as release-infrastructure/<name>.

However, using this Amazon ECR instance requires performing IAM Identity Center authentication with the devprod-platforms-ecr AWS account. This is simple in Evergreen using ec2.assume_role, but the steps are a bit more involved on local developer machines. These steps are documented in earthly.rst, but reiterated below for locality.


Using Amazon ECR requires authenticating with the devprod-platforms-ecr AWS account. Before pulling any OCI images (via docker, podman, or earthly), you must obtain AWS short-term credentials using SSO authentication with AWS CLI v2 (note: not v1, which does not support aws sso). AWS CLI v2 can be obtained by running snap install awscli --classic (see: install docs). Initial configuration can be done by running aws configure sso or directly setting $HOME/.aws/config to the following:

[profile <profile>]
sso_session = <session_name>
sso_account_id = 901841024863
sso_role_name = ECRScopedAccess
region = us-east-1
[sso-session <session_name>]
sso_start_url = https://d-9067613a84.awsapps.com/start#/
sso_region = us-east-1
sso_registration_scopes = sso:account:access

Note

"<profile>" is the AWS account, role, etc. "<session_name>" is the short-term credentials. I recommend setting them to something like "amazon-ecr" and "<your username>" respectively.

To modify an existing configuration, run aws configure sso --profile <profile>.

When short-term credentials expire, the following error is emitted:

denied: Your authorization token has expired. Reauthenticate and try again.

The expired credentials be refreshed by running either one of:

aws sso login --profile <profile>
aws sso login --sso-session <session_name>

The AWS credentials must be passed to podman and/or docker after (re)authentication by running:

aws ecr get-login-password --profile <profile> | podman login --username AWS --password-stdin 901841024863.dkr.ecr.us-east-1.amazonaws.com

At this point, you should be able to pull images from Amazon ECR, e.g. by running podman pull 901841024863.dkr.ecr.us-east-1.amazonaws.com/release-infrastructure/garasign-gpg.


Note

Update: per feedback, implemented a $default_search_registry Earthly argument with a default value of docker.io to avoid unconditionally using Amazon ECR for images hosted on DockerHub. Amazon ECR can be used by default by passing the flag:

podman run ... +target ... --default_search_registry=901841024863.dkr.ecr.us-east-1.amazonaws.com/dockerhub

DevProd Release Tools images remain fully-qualified with the Amazon ECR registry name.


I was unable to find a method of configuring the default container image registry that preserves the use of short-names (e.g. alpine:3.20), but which doesn't involve system-level or user-level configuration, which would conflict with Evergreen task reproducibility. I am open to suggestions or a followup PR; otherwise, all images require the 901841024863.dkr.ecr.us-east-1.amazonaws.com/ prefix to be pulled from Amazon ECR. This is not necessary when manually running pull commands on local machines (such as to pre-pull images from DockerHub prior to their use by Earthly), e.g. podman pull alpine should resolve to DockerHub (by default) whereas podman pull garasign-gpg should resolve to Amazon ECR (when authenticated).

@eramongodb eramongodb requested a review from a team as a code owner July 11, 2025 17:49
Copy link
Contributor

@vector-of-bool vector-of-bool left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really hate that this is necessary. AFAIK, the "proper" way to handle container image proxying is to use a container regisries configuration file that redirects image requests, and a credentials helper that runs on the system to automatically authenticate against the registry.

I asked in devprod and, to my surprise (not), Docker is the limiting factor as it still refuses to implement years-old consensus among containerization tools. I don't want to sound too accusatory, but I will anyway, as Docker (the tool) seems to consistently drag its feet on any feature that would make it easier to remove dependence on Docker (the org) infrastructure. IIUC, if we could ensure everything ran via podman, devprod would be able to deploy a registries config that could do all of this for us and we wouldn't need to do any of these changes at all.

Anyway... my biggest objection would be the hardcoding of our infrastructure specifics into our main tooling scripts (Earthfiles, specifically), as it renders them unusable outside of our org and puts an "expiration date" on the file, which will become unusable when the referenced ECR host is inevitably moved or decommissioned (I wish we could at least use a CNAME rather than 901841024863.dkr.ecr.us-east-1.amazonaws.com (??)).

I thought about it for a while and have come up with a solution that will remove the "expiration date" from Earthfile: Just parameterize on the container registry, whith docker.io/library being the default, and setting that parameter when running in EVG. See the suggested changes for reference.

EVG tasks that specify this may want to use the .arg file or EARTHLY_ARGS environment variable for setting this parameter, since it would probably be used everywhere and its easier than trying to pipe it through all the build commands in the file.

Aside: Earthly appears to have some functionality for dealing with registries and credential helpers, but I'm not sure if it's relevant or useful at the moment.

@@ -1,4 +1,4 @@
VERSION --arg-scope-and-set --pass-args 0.7
VERSION --arg-scope-and-set --pass-args --use-function-keyword 0.7
LOCALLY
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a --global build arg to specify where to get containers by default:

Suggested change
LOCALLY
LOCALLY
ARG --global default_container_registry = "docker.io/library"

(I'm not set on this parameter name, but just an example.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chose the name default_search_registry in reference to podman's unqualified-search-registries setting.

Earthfile Outdated
@@ -148,7 +148,7 @@ multibuild:
# release-archive :
# Create a release archive of the source tree. (Refer to dev docs)
release-archive:
FROM alpine:3.20
FROM 901841024863.dkr.ecr.us-east-1.amazonaws.com/dockerhub/library/alpine:3.20
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parameterize the registry:

Suggested change
FROM 901841024863.dkr.ecr.us-east-1.amazonaws.com/dockerhub/library/alpine:3.20
FROM $default_container_registry/alpine:3.20

and so on for other updated image references

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced all DockerHub-provided image names with $default_search_registry/library/<image>. DevProd-provided release tool images remain fully-qualified with the Amazon ECR registry name. Omitted the /library suffix to support orgname-qualified image short-names.

tools/earthly.sh Outdated
@@ -36,7 +36,7 @@ if ! is-file "$EARTHLY_EXE"; then
fi

run-earthly() {
"$EARTHLY_EXE" "$@"
"$EARTHLY_EXE" --buildkit-image 901841024863.dkr.ecr.us-east-1.amazonaws.com/dockerhub/earthly/buildkitd:v0.8.3 "$@"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the suggested refactor, specifying --buildkit-image=... would need to be done in the generated tools/earthly.sh commands for the EVG tasks, rather than hardcoded in this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EVG tasks that specify this may want to use the .arg file or EARTHLY_ARGS environment variable for setting this parameter, since it would probably be used everywhere and its easier than trying to pipe it through all the build commands in the file.

Moving both into earthly_exec() as an additional set of CLI arguments to earthly.sh seems to be sufficient.

@eramongodb
Copy link
Contributor Author

@vector-of-bool Thank you for the suggestions! Amazon ECR is no longer unconditionally required to obtain images which are hosted on DockerHub. However, authentication with Amazon ECR remains a requirement to obtain DevProd Release Tools images used during a release (e.g. silkbomb and garasign-gpg), replacing prior authentication with Artifactory.

Copy link
Contributor

@vector-of-bool vector-of-bool left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments, LGTM

@@ -148,7 +151,7 @@ multibuild:
# release-archive :
# Create a release archive of the source tree. (Refer to dev docs)
release-archive:
FROM alpine:3.20
FROM $default_search_registry/library/alpine:3.20
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the trailing /library into the parameter, as that's not universal to container registries (e.g. quay.io doesn't use the /library suffix, and quay.io is a reasonable alternative default).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Embedding the /library suffix in the variable rules out its use with image names containing an orgname, e.g. if the /library suffix is embedded:

  • $default_search_registry/ubuntu:24.04 -> docker.io/library/ubuntu:24.04 (correct)
  • $default_search_registry/earthly/buildkitd -> docker.io/library/earthly/buildkitd (incorrect)
  • $default_search_registry/mongodb/mongo-cxx-driver -> docker.io/library/mongodb/mongo-cxx-driver (incorrect)

I was expecting use of non-DockerHub-provided images to use fully-qualified names which include the registry name, as currently done for DevProd Release Tools images, e.g. quay.io/centos/centos:stream10 rather than centos:stream10, which is also consistent with Red Hat recommendations:

Users pulling without specifying the full path, commonly referred to as a short name, leaves them open to a particularly nasty form of attack, called image squatting. [...] Red Hat recommends that users always use fully qualified names when referring to container images in any context. [...] Consider removing all registries from the search list so that short names don’t resolve.

I do not think the ability to set $default_search_registry to anything but the default (DockerHub) or Amazon ECR (in CI) is really in scope for these changes, which are primarily to avoid Artifactory without hitting DockerHub rate limits again. Let me know if you'd like the suffix to be embedded regardless.

@@ -136,6 +145,34 @@ def suffix(self) -> str:
return _SEPARATOR.join(f"{k}={v}" for k, v in self._asdict().items())


# Use DevProd-provided Amazon ECR instance to obtain earthly-buildkitd in advance.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment appears unrelated to this class's purpose.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, outdated comment from old changes. Fixed.

@eramongodb eramongodb changed the title CDRIVER-5971 Use Amazon ECR to obtain all OCI images CDRIVER-5971 Use Amazon ECR to obtain OCI images in EVG Jul 16, 2025
Copy link
Collaborator

@kevinAlbs kevinAlbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for investigating and making these changes. I expect this will help the reliability of Evergreen tasks. LGTM with similar changes applied to the SBOM scripts.

above.

.. seealso:: `"DevProd Platforms Container Registry"
<https://docs.devprod.prod.corp.mongodb.com/devprod-platforms-ecr>`_ and `"Configuring IAM Identity Center authentication with the AWS CLI" <https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-sso.html>_`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<https://docs.devprod.prod.corp.mongodb.com/devprod-platforms-ecr>`_ and `"Configuring IAM Identity Center authentication with the AWS CLI" <https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-sso.html>_`.
<https://docs.devprod.prod.corp.mongodb.com/devprod-platforms-ecr>`_ and `"Configuring IAM Identity Center authentication with the AWS CLI" <https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-sso.html>`_.

@@ -136,6 +145,34 @@ def suffix(self) -> str:
return _SEPARATOR.join(f"{k}={v}" for k, v in self._asdict().items())


# Authenticate with DevProd-provided Amazon ECR instance to use as pull-through cache for DockerHub.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest applying similar changes to sbom.py / sbom.sh since sbom.sh still refers to an artifactory image.

@eramongodb eramongodb merged commit 5d362cc into mongodb:master Jul 17, 2025
1 check was pending
@eramongodb eramongodb deleted the cdriver-5971 branch July 17, 2025 15:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants