-
Notifications
You must be signed in to change notification settings - Fork 455
CDRIVER-5971 Use Amazon ECR to obtain OCI images in EVG #2058
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really hate that this is necessary. AFAIK, the "proper" way to handle container image proxying is to use a container regisries configuration file that redirects image requests, and a credentials helper that runs on the system to automatically authenticate against the registry.
I asked in devprod and, to my surprise (not), Docker is the limiting factor as it still refuses to implement years-old consensus among containerization tools. I don't want to sound too accusatory, but I will anyway, as Docker (the tool) seems to consistently drag its feet on any feature that would make it easier to remove dependence on Docker (the org) infrastructure. IIUC, if we could ensure everything ran via podman, devprod would be able to deploy a registries config that could do all of this for us and we wouldn't need to do any of these changes at all.
Anyway... my biggest objection would be the hardcoding of our infrastructure specifics into our main tooling scripts (Earthfile
s, specifically), as it renders them unusable outside of our org and puts an "expiration date" on the file, which will become unusable when the referenced ECR host is inevitably moved or decommissioned (I wish we could at least use a CNAME rather than 901841024863.dkr.ecr.us-east-1.amazonaws.com
(??)).
I thought about it for a while and have come up with a solution that will remove the "expiration date" from Earthfile
: Just parameterize on the container registry, whith docker.io/library
being the default, and setting that parameter when running in EVG. See the suggested changes for reference.
EVG tasks that specify this may want to use the .arg
file or EARTHLY_ARGS
environment variable for setting this parameter, since it would probably be used everywhere and its easier than trying to pipe it through all the build commands in the file.
Aside: Earthly appears to have some functionality for dealing with registries and credential helpers, but I'm not sure if it's relevant or useful at the moment.
@@ -1,4 +1,4 @@ | |||
VERSION --arg-scope-and-set --pass-args 0.7 | |||
VERSION --arg-scope-and-set --pass-args --use-function-keyword 0.7 | |||
LOCALLY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a --global
build arg to specify where to get containers by default:
LOCALLY | |
LOCALLY | |
ARG --global default_container_registry = "docker.io/library" |
(I'm not set on this parameter name, but just an example.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chose the name default_search_registry
in reference to podman's unqualified-search-registries
setting.
Earthfile
Outdated
@@ -148,7 +148,7 @@ multibuild: | |||
# release-archive : | |||
# Create a release archive of the source tree. (Refer to dev docs) | |||
release-archive: | |||
FROM alpine:3.20 | |||
FROM 901841024863.dkr.ecr.us-east-1.amazonaws.com/dockerhub/library/alpine:3.20 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parameterize the registry:
FROM 901841024863.dkr.ecr.us-east-1.amazonaws.com/dockerhub/library/alpine:3.20 | |
FROM $default_container_registry/alpine:3.20 |
and so on for other updated image references
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced all DockerHub-provided image names with $default_search_registry/library/<image>
. DevProd-provided release tool images remain fully-qualified with the Amazon ECR registry name. Omitted the /library
suffix to support orgname-qualified image short-names.
tools/earthly.sh
Outdated
@@ -36,7 +36,7 @@ if ! is-file "$EARTHLY_EXE"; then | |||
fi | |||
|
|||
run-earthly() { | |||
"$EARTHLY_EXE" "$@" | |||
"$EARTHLY_EXE" --buildkit-image 901841024863.dkr.ecr.us-east-1.amazonaws.com/dockerhub/earthly/buildkitd:v0.8.3 "$@" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the suggested refactor, specifying --buildkit-image=...
would need to be done in the generated tools/earthly.sh
commands for the EVG tasks, rather than hardcoded in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EVG tasks that specify this may want to use the
.arg
file orEARTHLY_ARGS
environment variable for setting this parameter, since it would probably be used everywhere and its easier than trying to pipe it through all the build commands in the file.
Moving both into earthly_exec()
as an additional set of CLI arguments to earthly.sh
seems to be sufficient.
@vector-of-bool Thank you for the suggestions! Amazon ECR is no longer unconditionally required to obtain images which are hosted on DockerHub. However, authentication with Amazon ECR remains a requirement to obtain DevProd Release Tools images used during a release (e.g. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments, LGTM
@@ -148,7 +151,7 @@ multibuild: | |||
# release-archive : | |||
# Create a release archive of the source tree. (Refer to dev docs) | |||
release-archive: | |||
FROM alpine:3.20 | |||
FROM $default_search_registry/library/alpine:3.20 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Embedding the /library
suffix in the variable rules out its use with image names containing an orgname, e.g. if the /library
suffix is embedded:
$default_search_registry/ubuntu:24.04
->docker.io/library/ubuntu:24.04
(correct)$default_search_registry/earthly/buildkitd
->docker.io/library/earthly/buildkitd
(incorrect)$default_search_registry/mongodb/mongo-cxx-driver
->docker.io/library/mongodb/mongo-cxx-driver
(incorrect)
I was expecting use of non-DockerHub-provided images to use fully-qualified names which include the registry name, as currently done for DevProd Release Tools images, e.g. quay.io/centos/centos:stream10
rather than centos:stream10
, which is also consistent with Red Hat recommendations:
Users pulling without specifying the full path, commonly referred to as a short name, leaves them open to a particularly nasty form of attack, called image squatting. [...] Red Hat recommends that users always use fully qualified names when referring to container images in any context. [...] Consider removing all registries from the search list so that short names don’t resolve.
I do not think the ability to set $default_search_registry
to anything but the default (DockerHub) or Amazon ECR (in CI) is really in scope for these changes, which are primarily to avoid Artifactory without hitting DockerHub rate limits again. Let me know if you'd like the suffix to be embedded regardless.
@@ -136,6 +145,34 @@ def suffix(self) -> str: | |||
return _SEPARATOR.join(f"{k}={v}" for k, v in self._asdict().items()) | |||
|
|||
|
|||
# Use DevProd-provided Amazon ECR instance to obtain earthly-buildkitd in advance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment appears unrelated to this class's purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, outdated comment from old changes. Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for investigating and making these changes. I expect this will help the reliability of Evergreen tasks. LGTM with similar changes applied to the SBOM scripts.
docs/dev/earthly.rst
Outdated
above. | ||
|
||
.. seealso:: `"DevProd Platforms Container Registry" | ||
<https://docs.devprod.prod.corp.mongodb.com/devprod-platforms-ecr>`_ and `"Configuring IAM Identity Center authentication with the AWS CLI" <https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-sso.html>_`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<https://docs.devprod.prod.corp.mongodb.com/devprod-platforms-ecr>`_ and `"Configuring IAM Identity Center authentication with the AWS CLI" <https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-sso.html>_`. | |
<https://docs.devprod.prod.corp.mongodb.com/devprod-platforms-ecr>`_ and `"Configuring IAM Identity Center authentication with the AWS CLI" <https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-sso.html>`_. |
@@ -136,6 +145,34 @@ def suffix(self) -> str: | |||
return _SEPARATOR.join(f"{k}={v}" for k, v in self._asdict().items()) | |||
|
|||
|
|||
# Authenticate with DevProd-provided Amazon ECR instance to use as pull-through cache for DockerHub. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest applying similar changes to sbom.py
/ sbom.sh
since sbom.sh
still refers to an artifactory image.
Resolves CDRIVER-5971. These changes should improve the performance and reliability of Earthly tasks on Evergreen by avoiding both DockerHub rate limits and Artifactory's limited bandwidth.
@mdb-ad and @rcsanchez97: even if not review-requested, I recommended attempting the documented instructions to authenticate with Amazon ECR and pull images on your local machine and verify if it works as expected, since these changes and instructions are meant to preserve/support the ability to run Earthly tasks locally (not Evergreen only).
DevProd plans to decommission the Artifactory instance currently providing DevProd Release Tools images (e.g. garasign-gpg, silkbomb, etc.). We temporarily used Artifactory as a workaround for DockerHub rate limits (#1794) but reverted shortly after (#1927). The new DevProd-provided Amazon ECR instance is intended to the proper long-term solution going forward.
The Amazon ECR instance is both a host for internal images (e.g. DevProd Release Tools) as well as a pull-through cache for DockerHub images. Therefore, it serves all our present OCI image acquisition needs. DockerHub images
<name>
and<orgname>/<name>
are specified aslibrary/<name>
and<orgname>/<name>
respectively. DevProd Release Tools imagesrelease-tools-container-registry-local/<name>
are specified asrelease-infrastructure/<name>
.However, using this Amazon ECR instance requires performing IAM Identity Center authentication with the
devprod-platforms-ecr
AWS account. This is simple in Evergreen usingec2.assume_role
, but the steps are a bit more involved on local developer machines. These steps are documented inearthly.rst
, but reiterated below for locality.Using Amazon ECR requires authenticating with the
devprod-platforms-ecr
AWS account. Before pulling any OCI images (viadocker
,podman
, orearthly
), you must obtain AWS short-term credentials using SSO authentication with AWS CLI v2 (note: not v1, which does not supportaws sso
). AWS CLI v2 can be obtained by runningsnap install awscli --classic
(see: install docs). Initial configuration can be done by runningaws configure sso
or directly setting$HOME/.aws/config
to the following:Note
"<profile>" is the AWS account, role, etc. "<session_name>" is the short-term credentials. I recommend setting them to something like "amazon-ecr" and "<your username>" respectively.
To modify an existing configuration, run
aws configure sso --profile <profile>
.When short-term credentials expire, the following error is emitted:
The expired credentials be refreshed by running either one of:
The AWS credentials must be passed to
podman
and/ordocker
after (re)authentication by running:At this point, you should be able to pull images from Amazon ECR, e.g. by running
podman pull 901841024863.dkr.ecr.us-east-1.amazonaws.com/release-infrastructure/garasign-gpg
.Note
Update: per feedback, implemented a
$default_search_registry
Earthly argument with a default value ofdocker.io
to avoid unconditionally using Amazon ECR for images hosted on DockerHub. Amazon ECR can be used by default by passing the flag:DevProd Release Tools images remain fully-qualified with the Amazon ECR registry name.
I was unable to find a method of configuring the default container image registry that preserves the use of short-names (e.g.
alpine:3.20
), but which doesn't involve system-level or user-level configuration, which would conflict with Evergreen task reproducibility. I am open to suggestions or a followup PR; otherwise, all images require the901841024863.dkr.ecr.us-east-1.amazonaws.com/
prefix to be pulled from Amazon ECR. This is not necessary when manually runningpull
commands on local machines (such as to pre-pull images from DockerHub prior to their use by Earthly), e.g.podman pull alpine
should resolve to DockerHub (by default) whereaspodman pull garasign-gpg
should resolve to Amazon ECR (when authenticated).