-
Notifications
You must be signed in to change notification settings - Fork 439
Add apt clean
to reduce layer size
#1298
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
The downloaded packages do not need to be kept in the layer. Running `apt clean` will remove them from the cache. To quote from apt's man page: > clean clears out the local repository of retrieved package files. It removes everything but the lock file from /var/cache/apt/archives/ and /var/cache/apt/archives/partial/.
cc @Kaniska244 I'm linking https://github.com/orgs/devcontainers/discussions/206 as well, would love your input when you get a moment! |
@@ -27,6 +27,7 @@ set -e | |||
|
|||
# Clean up | |||
rm -rf /var/lib/apt/lists/* | |||
apt clean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this not make sense as it only removes the installation stuff in the previous layer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't quite understand why this script cleans at the start and then again at the end 🤷
As a start I see an existing clean up, I add my change. If someone with a better overview of the project points me in a different direction to solve this at a higher level so that the apt cache is never stored in the image (e.g. using RUN --mount=type=cache,target=/var/cache/apt --mount=type=cache,target=/var/lib/apt/lists
everywhere), I'd be happy to also do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the removal of /var/lib/apt/lists/*
is done because subsequent installations may fail if the cache remains in the lower layers.
The size of the image is irrelevant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on "The size of the image is irrelevant"? My team is paying for image storage by the byte and devcontainer download time is a regular annoyance, so image size is very much of concern for us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check the context #210.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I'm trying to say is that I'm wondering if it's worth adding this at the beginning of the script.
(Because it doesn't affect the layer size)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @DavidS-ovm ,
Thank you for the PR. I was trying to test docker-in-docker feature with this PR change. In fact I see that with apt clean
right at the beginning of the install.sh
, package installation failing. It doesn't remove the directory structure or the lock file, so later on the following condition in apt_get_update()
function body doesn't satisfy & apt-get update -y
doesn't execute before installing the additional packages for docker-in-docker feature.
apt_get_update()
{
if [ "$(find /var/lib/apt/lists/* | wc -l)" = "0" ]; then
echo "Running apt-get update..."
apt-get update -y
fi
}
It would be best to have the clean-up done once & done right at the end of the install.sh
execution & that should serve the purpose of reducing the layer size in the resultant image. Kindly let me know in case of further concern.
With Regards,
Kaniska
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the analysis! This is indeed more complex than I initially expected. Meanwhile I've done some local optimizations for other containers in my work and I'm now using
RUN --mount=type=cache,target=/var/cache/apt \
--mount=type=cache,target=/var/lib/apt/lists \
[...]
any time apt is getting called. This ensures that
- apt caches and metadata are never baked into any layer
- individual steps can re-use previous steps' already downloaded data, and
- if buildx is appropriately configured, the cache can be used across multiple builds, further improving efficiency if requested
- individual steps don't need to care about refreshing and cleaning apt metadata
Poking around in the devcontainer CLI, this seems to be the appropriate location to add that:
additionally there's getContainerFeaturesBaseDockerFile
in the same file that looks like a good place to apt update
once for the entire build.
Does that sound sensible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
devcontainers/cli#989 as an experiment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @DavidS-ovm ,
Thank you for the contribution & the detailed analysis on this issue. I will check this further & update you.
This change uses cache mounts in the generated Dockerfiles for features to ensure that apt metadata and downloaded packages remain available throughout the build process without ever getting baked into layers. To make life easier for feature implementations, this also runs `apt update` once before the feature build, so that the metadata cache is up-to-date. This has the following benefits: * reduced build-time costs: apt metadata is only downloaded once * smaller layer size: features cannot accidentally include downloaded packages in their layer (see devcontainers/features#1298) * reduced complexity: features do not have to worry about apt caching and cache cleaning * optionally, if building on a caching buildkit service like depot.dev, the cache is re-used across builds, further reducing build-time
This change uses cache mounts in the generated Dockerfiles for features to ensure that apt metadata and downloaded packages remain available throughout the build process without ever getting baked into layers. To make life easier for feature implementations, this also runs `apt update` once before the feature build, so that the metadata cache is up-to-date. This has the following benefits: * reduced build-time costs: apt metadata is only downloaded once * smaller layer size: features cannot accidentally include downloaded packages in their layer (see devcontainers/features#1298) * reduced complexity: features do not have to worry about apt caching and cache cleaning * optionally, if building on a caching buildkit service like depot.dev, the cache is re-used across builds, further reducing build-time
The downloaded packages do not need to be kept in the layer. Running
apt clean
will remove them from the cache.To quote from apt's man page: