Proposal: additional Pre-Defined Annotation Keys #1046

Open
evankanderson opened this issue Apr 5, 2023 · 21 comments · May be fixed by #1062

Comments

@evankanderson

evankanderson commented Apr 5, 2023

While attempting to use the Pre-Defined Annotation Keys to track container images back to the corresponding source code, we (re)discovered that org.opencontainers.image.source (URL to get source code for building the image) is insufficient to determine the source code associated with the image when building from a monorepo. (See #886 for an earlier, less action-oriented version of this issue.)

The particular flavor of this issue is that org.opencontainers.image.source is likely to be something like https://github.com/opencontainers/image-spec.git, but those URLs don't provide a way to specify a sub-path within a git repository, which may be useful when building multiple images from a single repo. This is a common practice for many projects, including Kubernetes, Tekton, and Knative (three projects I've interacted with the most).

I'd like to propose documenting the following additional Pre-Defined Annotation Key:

org.opencontainers.image.source.subpath: A relative path within the source repository used as the base directory to build the container (string)

A few questions on definition which we can either ignore, document, or finesse to avoid defining:

  • Windows: do Windows image builds use Windows path conventions?
  • Default / empty value: Does this mean ., or unknown? If it means "unknown", does that mean tools should generally set .? If it means ., how much do all the existing images mess with this?

I'm happy to propose a PR if this would be useful. @imjasonh @sudo-bmitch as the last two to shepherd changes to this file.

@imjasonh
Member

imjasonh commented Apr 5, 2023

In theory there's nothing stopping you from including subpaths in your image.source value:

"org.opencontainers.image.source": "https://github.com/opencontainers/image-spec/path/to/sub/thing"

The spec language is pretty loose about how that's structured, it just says "URL to get source code", so if producers and consumers can agree on the format to make it useful to them, I interpret that as allowed under the spec.

@evankanderson
Author

Given that source can be published as a .tgz or .zip, it feels like it's still useful to have a mechanism to indicate a path within the source code which is the basis for the build. I don't have a strong feeling about whether this would indicate the directory containing something like a Dockerfile or main.go, but making org.opencontainers.image.source be a URL which can't actually be fetched without special knowledge of e.g. GitHub URL structures doesn't feel great, especially since some other Git hosting providers like GitLab allow arbitrarily-nested paths to repos.
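To illustrate the concern about needing host-specific knowledge, here is a minimal sketch of a consumer that tries to split a combined URL back into a repository and a subpath. The helper name `split_repo_and_subpath`, the `repo_depth` parameter, and the GitLab-style URL are hypothetical and only serve the illustration; nothing here is defined by the spec.

```python
from urllib.parse import urlsplit


def split_repo_and_subpath(url, repo_depth=2):
    """Split a URL into (repo_url, subpath).

    Assumes the repository occupies the first `repo_depth` path segments
    (owner/repo on GitHub). That assumption is exactly the host-specific
    knowledge a generic consumer does not have.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    repo = "/".join(segments[:repo_depth])
    subpath = "/".join(segments[repo_depth:])
    return f"{parts.scheme}://{parts.netloc}/{repo}", subpath


# GitHub-style value from the comment above: the two-segment guess works.
print(split_repo_and_subpath(
    "https://github.com/opencontainers/image-spec/path/to/sub/thing"))

# Hypothetical host with nested groups: the same guess cuts the repository
# name in the wrong place unless the caller already knows the depth is 3.
print(split_repo_and_subpath(
    "https://gitlab.example.com/group/subgroup/project/cmd/app"))
```

Without an out-of-band hint such as `repo_depth`, the second call returns `group/subgroup` as the repository, which is not a fetchable location if the project actually lives at `group/subgroup/project`.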

@sudo-bmitch
Contributor

Docker has a syntax where they use # and : to separate the git repo name from the path and tag/ref. I'm not sure how I feel about that for this use case; at the very least, the tag/ref is going to conflict with org.opencontainers.image.revision. Given that we have a revision annotation already, I get the logic of adding a path, but the value may be lower than hoped.
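As a rough sketch of that syntax, assuming the `repo#ref:subdir` form Docker documents for git build contexts, the following splits a context string and shows how its parts would line up with annotations. The function names are hypothetical, and `org.opencontainers.image.source.subpath` is the key proposed in this issue, not one the spec defines.

```python
def parse_git_context(context):
    """Split 'repo#ref:subdir' into (repo, ref, subdir).

    Missing parts come back as empty strings. The '#' split happens first,
    so the ':' in 'https://' is never mistaken for the ref/subdir separator.
    """
    repo, _, fragment = context.partition("#")
    ref, _, subdir = fragment.partition(":")
    return repo, ref, subdir


def annotations_from_context(context):
    """Map a build context onto annotation keys (illustrative only)."""
    repo, ref, subdir = parse_git_context(context)
    annotations = {"org.opencontainers.image.source": repo}
    if ref:
        # This is the overlap noted above: the ref duplicates what the
        # existing revision annotation already carries.
        annotations["org.opencontainers.image.revision"] = ref
    if subdir:
        # Proposed key from this issue, not part of the spec.
        annotations["org.opencontainers.image.source.subpath"] = subdir
    return annotations


print(annotations_from_context(
    "https://github.com/evankanderson/function-weather-demo.git#318fcc8:fetch"))
```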

First, the source and revision annotations serve a dual use. Builders know where to look to rebuild. But more importantly, image users can easily see if the image is stale by looking at the commit history.

The diminishing return we are going to see here is that there is more than one build tool, and the commands to run to generate a build will differ by project. So just knowing the directory is not enough to create the image. Even if there's a Dockerfile, there may be commands that need to run on the host to set up the build, and args that need to be passed in.

Without knowing the path, the second use case of detecting stale images is possible with what we have now. And even if we add the path, the first use case of knowing how to rebuild the image falls short in various use cases. As an alternative, perhaps we need a free form build command, but that could result in a quine style issue since the annotation to build the image may be included in the command itself.

@imjasonh
Member

imjasonh commented Apr 6, 2023

In general I interpret source and even revision as hints intended for humans, and not binding contracts for computers. If you happen to control both the producer and consumer of an image, you can stuff contractually-helpful context into it for both to use, but I don't think it's OCI's place to enforce that.

If your goal is source provenance, or reproducibility, which you'd maybe like to use to automate things, there are much more expressive and robust places to express that, e.g., in in-toto attestations that state the full build steps to reproduce, including where exactly to get source.

It occurs to me I probably should have asked "what are you trying to do" earlier in this thread. 😄

@evankanderson
Author

Imagine you have a big pile of images... no not quite that big. You'd like to understand which images are sequential updates to other images, for example to determine if a certain image is stale.

This works great as @sudo-bmitch points out in terms of determining if an individual image is stale, but becomes more difficult to do if you have e.g. 12 independently-released functions which are developed in a monorepo style. source and revision are no longer sufficient to determine whether each function image is up-to-date without starting to take dependencies on conventions like pushing each function to a separate image name (and not copying them later into a shared repo).

Basically, we're attempting to correlate built images in a repository with earlier versions of the same image from the same supply chain / build tool, and we've discovered that Git monorepos (vs a Piper-style "I am the world" monorepo) don't have enough annotation metadata to distinguish e.g. the "fetch" and "ingest" functions in this repo:

Both containers have:

org.opencontainers.image.source: https://github.com/evankanderson/function-weather-demo.git
org.opencontainers.image.revision: 318fcc8

I want:

org.opencontainers.image.source: https://github.com/evankanderson/function-weather-demo.git
org.opencontainers.image.source.subpath: fetch
org.opencontainers.image.revision: 318fcc8

and

org.opencontainers.image.source: https://github.com/evankanderson/function-weather-demo.git
org.opencontainers.image.source.subpath: ingest
org.opencontainers.image.revision: 318fcc8

@evankanderson
Author

Obviously, I could make up my own annotation key, but it would be nice to document it in a way that other tools could also benefit. Also obviously, many tools aren't setting these annotations today, but that doesn't stop us from trying to do better.

@sudo-bmitch
Contributor

For that problem, the directory may not be enough. From the same directory, multiple images may be generated. E.g. different build tools, docker can point to different Dockerfiles, build args can change the build. Some kind of additional identifier is needed but the directory for your use case may end up being a build arg value for another.

@evankanderson
Author

I defined the argument as subpath to allow for tools which might need to reference a file instead of a directory.

I'm not sure how I'd handle "this make target in this subdirectory", but "this Dockerfile" should be referencable. I'd be willing to make the string suitably generic to support "a bazel target at this bazel path" if someone wants to help wordsmith the description.

Maybe:

A tool-specific path within the source repository which may be used to distinguish different build targets in the same repository.

For my purposes, I'd like source + subpath to be a primary key for "equivalent" artifacts over time, and source + subpath + revision to identify specific instances of those artifacts. This can be used (for example) with container scan results or SBOMs to track vulnerability exposure over time.
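A minimal sketch of that keying scheme, assuming the proposed (not standardized) `org.opencontainers.image.source.subpath` key is present on each image. The image references and the second revision `0000000` are made up for illustration; only the source URL, the `fetch`/`ingest` subpaths, and revision `318fcc8` come from the earlier example.

```python
from collections import defaultdict

SOURCE = "org.opencontainers.image.source"
REVISION = "org.opencontainers.image.revision"
SUBPATH = "org.opencontainers.image.source.subpath"  # proposed, not in the spec


def correlation_key(annotations):
    """source + subpath: identifies 'equivalent' artifacts over time."""
    return (annotations.get(SOURCE, ""), annotations.get(SUBPATH, ""))


def group_by_artifact(images):
    """Group (image_ref, annotations) pairs by correlation key.

    Each group lists (revision, image_ref) pairs, so source + subpath +
    revision still identifies one specific instance within a group.
    """
    groups = defaultdict(list)
    for image_ref, annotations in images:
        key = correlation_key(annotations)
        groups[key].append((annotations.get(REVISION, ""), image_ref))
    return dict(groups)


REPO = "https://github.com/evankanderson/function-weather-demo.git"
images = [
    # Image references and revision 0000000 are hypothetical.
    ("registry.example/fn:a", {SOURCE: REPO, SUBPATH: "fetch", REVISION: "318fcc8"}),
    ("registry.example/fn:b", {SOURCE: REPO, SUBPATH: "ingest", REVISION: "318fcc8"}),
    ("registry.example/fn:c", {SOURCE: REPO, SUBPATH: "fetch", REVISION: "0000000"}),
]

for key, members in group_by_artifact(images).items():
    print(key, members)
```

Without the subpath, all three images would collapse into a single group keyed only by the repository URL, which is the ambiguity described earlier in the thread.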

@evankanderson
Author

It doesn't sound like there's a strong objection (though perhaps a bit of "who will use that") for adding the following pre-defined annotation key:

Key: org.opencontainers.image.source.subpath
Meaning: A tool-specific path within the source repository which may be used to distinguish different build targets in the same repository. For example, the path to a Dockerfile or a directory to invoke a CNCF Buildpack in.

If that seems acceptable, I'll send a PR for that shortly.

@imjasonh
Member

I'm not wholly opposed to adding the annotation key, but I would remove the example. The rest of the proposed meaning makes sense and is, I think, the right amount of vague.

evankanderson linked a pull request May 12, 2023 that will close this issue

@tianon
Member

tianon commented May 13, 2023

Sorry, but my opinion here is aligned with Brandon's in #1046 (comment); namely, I don't think a single additional key is sufficient for most tools, even if it covers your use case, and I would still suggest solving your specific problem with your own annotation. 🙈

@evankanderson
Author

@tianon -- what do you think about the proposition that org.opencontainers.image.source is not sufficient to distinguish whether two containers were built from the same source code? Is there a different mechanism you would suggest to correlate container images from the same source code over time (or do you think that's not a good general purpose use case)?

@tianon
Member

tianon commented Jun 23, 2023

I don't think I would interpret org.opencontainers.image.source to be unique, nor that it was intended to be. I think it was intended to be a hint or clue, and that for a given build, more data is definitely necessary in order to reproduce it, and just how much additional data is necessary is going to differ from tool to tool and even build to build within a given tool. 😅

In Docker's BuildKit tooling, this manifests as full provenance objects, for example: https://explore.ggcr.dev/?blob=docker/dockerfile@sha256:5bb344bbbc250f42b6cf85904aaec1feb8125af97d0e3f0302620e17d54224cc&mt=application%2Fvnd.in-toto%2Bjson&size=14542

@evankanderson
Author

I'm currently trying to consume annotations to determine, for example, "is container X a replacement for container Y" across different tools. While I certainly can invent new metadata and parsing code for each build tool in order to make that determination, I was hoping that this group could recommend a standard annotation so that build tool authors and container-curation tool authors could answer the question about past/future relationship between two container images.

Is that a reasonable thing to attempt, regardless of whether or not this annotation is the right mechanism for implementing it?

@imjasonh
Member

I don't think it's a good fit for OCI to define an annotation for "are these the same thing" or "is this a replacement". The specifics of those statements are nebulous and shifting and context-dependent.

I'd recommend defining your own annotation, where you can define the semantics of it yourself.

@vbatts
Member

vbatts commented Jun 23, 2023 via email

@evankanderson
Author

> I don't think it's a good fit for OCI to define an annotation for "are these the same thing" or "is this a replacement". The specifics of those statements are nebulous and shifting and context-dependent.

Sorry, I was trying to explain a specific use-case, but the annotation would effectively be a correlation ID in a stream-processing sense -- image X and image Y are both related to the same underlying application built out of a (git) monorepo. Considering a case like e.g. https://github.com/tektoncd/pipeline/tree/main/cmd, where the source would be https://github.com/tektoncd/pipeline.git, I would like to be able to use image metadata to determine whether a given image is a release of ./cmd/entrypoint or ./cmd/controller (for example). Practical examples of this correlation include understanding the release cadence of specific images or trends in image size or included vulnerabilities over time. In the particular case of Tekton, I imagine that the ko tool would need to store this metadata (today it stores it in a history[n].created_by string as e.g. "ko build ko://github.com/tektoncd/pipeline/cmd/events", which doesn't seem standard).
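For a sense of what per-tool parsing looks like today, here is a hedged sketch that recovers a subpath from the `created_by` string quoted above. It assumes the string always has the form `ko build ko://<import path>` and that the Go import path equals the repository host plus path, which would not hold for vanity import paths. The function name is hypothetical and this is not part of ko.

```python
from urllib.parse import urlsplit

KO_PREFIX = "ko build ko://"


def subpath_from_ko_history(created_by, source):
    """Return the path of the built package relative to `source`, or None.

    `created_by` is a history entry such as the one quoted above; `source`
    is the org.opencontainers.image.source value for the same image.
    """
    if not created_by.startswith(KO_PREFIX):
        return None
    import_path = created_by[len(KO_PREFIX):].strip()

    parts = urlsplit(source)
    repo_path = parts.path
    if repo_path.endswith(".git"):
        repo_path = repo_path[: -len(".git")]
    module = parts.netloc + repo_path  # e.g. github.com/tektoncd/pipeline

    if import_path == module:
        return "."
    if import_path.startswith(module + "/"):
        return import_path[len(module) + 1:]
    return None  # import path does not live under this repository


print(subpath_from_ko_history(
    "ko build ko://github.com/tektoncd/pipeline/cmd/events",
    "https://github.com/tektoncd/pipeline.git",
))  # cmd/events
```

Every build tool would need its own variant of this kind of parser, which is the cost a shared annotation is meant to avoid.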

I agree with not putting this on a wiki -- if we don't think that it's worth adding an attribute to enable correlation of images built from a mono-repo, then I think it's better to drop this feature altogether.

@sudo-bmitch
Contributor

To expand on my previous concerns since this was raised in #1252, take an example that has:

  • A directory of multiple Dockerfiles, each building a different command/image.
  • Each Dockerfile has build args that allow various base images to be overridden.
  • Each Dockerfile contains multiple stages, some of which can be used as developer images, artifact generation, or changing the base from scratch to one that contains a full shell.
  • Dockerfiles could also ingest generated artifacts rather than generating them themselves.
  • Other files in the same directory are used by other build tooling, like buildpacks, ko, jib, stacker, bazel, etc.

To know "this image was built from this thing" requires more input than a directory name, or even a filename, and quickly gets into build-tooling-specific details. E.g. to make my own builds reproducible, there is post processing done on the generated image.

I think the best we could do is define an annotation that is an instance or key that would be specific to a given source. And it would be up to each source repository to define their own instance structure to uniquely differentiate their own builds; OCI wouldn't have any opinion on the contents of the annotation since there are too many variables to attempt to capture. That could be:

org.opencontainers.image.instance: Used to differentiate multiple generated images within a single source URL and revision. Different revisions of the same image should have the same instance value. (string)
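One way a consumer might use such an annotation, sketched under the assumption that the instance value is an opaque string compared byte for byte. The key `org.opencontainers.image.instance` is only the proposal above, and the repository URL, instance values, and second revision are invented for illustration.

```python
SOURCE = "org.opencontainers.image.source"
REVISION = "org.opencontainers.image.revision"
INSTANCE = "org.opencontainers.image.instance"  # proposed above, not in the spec


def same_instance(a, b):
    """True when both annotation sets name the same source and instance.

    The instance value is never parsed: each repository decides what it
    means. Images missing either annotation are never treated as related.
    """
    for key in (SOURCE, INSTANCE):
        if key not in a or key not in b:
            return False
    return a[SOURCE] == b[SOURCE] and a[INSTANCE] == b[INSTANCE]


REPO = "https://example.com/org/monorepo.git"  # hypothetical repository
old = {SOURCE: REPO, INSTANCE: "controller/debug", REVISION: "318fcc8"}
new = {SOURCE: REPO, INSTANCE: "controller/debug", REVISION: "0000000"}
other = {SOURCE: REPO, INSTANCE: "controller/release", REVISION: "0000000"}

print(same_instance(old, new))    # True: same instance, different revision
print(same_instance(new, other))  # False: same revision, different instance
```

Because the comparison is purely string equality, a repository is free to encode a Dockerfile name, a build arg, or a target stage in the value without consumers needing to understand it.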

@tianon
Member

tianon commented Mar 13, 2025

How do you define "the same image" without ending up defining the Ship of Theseus? 🙈

FWIW, I've been using Brandon's idea from #1046 (comment) for just shy of two years now with increasing frequency and have been pretty happy with it (it doesn't solve the "different Dockerfile" or "different build args" or "different build flags" problems, but it does make the existing annotation more useful than it would otherwise be without becoming a fractal of complexity).

@eran-medan

eran-medan commented Mar 13, 2025

Please forgive me if this has already been raised. I agree with @sudo-bmitch's point, but I believe specifying an application subpath (rather than a specific OCI definition file) in large monorepo environments has some merit from ownership, operational, security, and automation perspectives.

Security and Automation:

One benefit of standardizing the specification of an application subpath is improving vulnerability management and supply chain security automation. Tools used for security scanning often report findings at the monorepo subpath level (e.g., locations of pom.xml, package-lock.json, requirements.txt). Being able to automatically correlate these findings to specific container images can significantly enhance security and compliance by reducing duplicates and clarifying the source of vulnerabilities. While an image may include multiple application subpaths, in most cases there's typically one main entry point, with others serving as internal transitive dependencies.
Annotations like org.opencontainers.image.source could theoretically serve this purpose too, but they're infrequently used this way. Furthermore, consistently specifying paths across various source SCMs (as discussed in issue #1252) is less straightforward than simply appending the subpath to the repository URL. A standardized subpath annotation would give security automation tools a simpler and more consistent approach. I understand that solutions such as https://github.com/in-toto/attestation might be a great alternative if they do help to clearly define the source origin, but it's also a question of adoption. Having attestations is great, but I believe a subpath annotation is much easier to set up and is better than nothing.

Ownership and Operational Clarity:

While an OCI image can be built from multiple contexts, specifying an application subpath can help quickly determine ownership during operational incidents. Application subpaths usually have clear owners (documented in CODEOWNERS files, recent committers, or READMEs). Including these subpaths consistently in deployed images can expedite ownership identification, especially during incidents. While alternatives exist, such as using annotations like org.opencontainers.image.url or org.opencontainers.image.documentation, these are rarely adopted in practice for ownership tracking from what I've seen in the wild. org.opencontainers.image.authors seems ideal for this purpose but is not dynamic unless specifying a URL, which doesn't seem intuitive. Additionally, dynamically referencing the current path is technically simpler than expecting build tooling authors to reference specific files like CODEOWNERS. For example, the current path can be easily added by default using tools such as the Docker metadata GitHub Action, whereas linking to a CODEOWNERS file requires explicit parameter passing and relies on the serendipity that someone recognizes the value of doing so. I realize this aspect may sound like a stretch and perhaps a weak argument, but operators would highly value having it consistent across all deployed images.

@sudo-bmitch
Contributor

> How do you define "the same image" without ending up defining the Ship of Theseus? 🙈

That's the best part, we don't. Each repository gets to decide that for themselves. I didn't even capitalize the "should". 😄

> One benefit of standardizing the specification of an application subpath is improving vulnerability management and supply chain security automation. Tools used for security scanning often report findings at the monorepo subpath level (e.g., locations of pom.xml, package-lock.json, requirements.txt). Being able to automatically correlate these findings to specific container images can significantly enhance security and compliance by reducing duplicates and clarifying the source of vulnerabilities.

Integrating with external tooling would be solving a very different problem. That may be a good one to solve, but I'd want to first spend some time seeing how potential solutions work in real world scenarios. Most likely we'll need a separate annotation for each external tool, and at that point, those tools would define their own annotation.

My concern is that different groups would want to use the same annotation to handle integration with different external tools. E.g. if I'm building a Go project, do I set the path to the location of the go.mod, the location of the Dockerfile, or the location of the main package being built?

> While an OCI image can be built from multiple contexts, specifying an application subpath can help quickly determine ownership during operational incidents. Application subpaths usually have clear owners (documented in CODEOWNERS files, recent committers, or READMEs). Including these subpaths consistently in deployed images can expedite ownership identification, especially during incidents. While alternatives exist, such as using annotations like org.opencontainers.image.url or org.opencontainers.image.documentation, these are rarely adopted in practice for ownership tracking from what I've seen in the wild.

Little that we do here is going to affect usage in the wild. OCI is typically a trailing spec, so it's more that we document common usage that we see in a way that helps with interoperability. Implementations would still need to adopt the standard, and annotations in general are rarely used, and when they are used they are easily misused.

> org.opencontainers.image.authors seems ideal for this purpose but is not dynamic unless specifying a URL, which doesn't seem intuitive.

I've always assumed this identified a group or organization, unless the image is really maintained by a single person. For most companies, that seems like it would be more useful than a path since one group may own lots of projects and looking up who owns a path is an extra layer of indirection.


6 participants