Proposal: additional Pre-Defined Annotation Keys #1046
In theory there's nothing stopping you from including subpaths in your
The spec language is pretty loose about how that's structured; it just says "URL to get source code". So if producers and consumers can agree on a format that is useful to them, I interpret that as allowed under the spec. |
Given that source can be published as a |
Docker has a syntax where they use
First, the source and revision annotations have a dual use. Builders know where to look to rebuild. More importantly, image users can easily see whether the image is stale by looking at the commit history. The diminishing returns here come from the fact that there is more than one build tool, and the commands run to generate a build differ by project. So just knowing the directory is not enough to recreate the image. Even if there's a Dockerfile, there may be commands that need to run on the host to set up the build, and args that need to be passed in. Without knowing the path, the second use case (detecting stale images) is already possible with what we have now. And even if we add the path, the first use case (knowing how to rebuild the image) falls short in various situations. As an alternative, perhaps we need a free-form build command, but that could result in a quine-style issue, since the annotation used to build the image may be included in the command itself. |
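A minimal sketch of the staleness check described above, assuming the consumer already has the source repository's commit history as a list of full SHAs, newest first (for example, the output of "git rev-list HEAD"). The function names are hypothetical and not part of any OCI tooling; the revision annotation is matched as a prefix because it may be an abbreviated SHA such as 318fcc8.

```python
from typing import Optional

REVISION = "org.opencontainers.image.revision"


def commits_behind(annotations: dict, history: list) -> Optional[int]:
    """Return how many commits the image's revision is behind the newest commit.

    `history` is assumed to hold full commit SHAs, newest first. Returns None
    when the revision annotation is missing or not found in the history.
    """
    revision = annotations.get(REVISION)
    if not revision:
        return None
    for index, sha in enumerate(history):
        # Abbreviated revisions are matched against the start of each full SHA.
        if sha.startswith(revision):
            return index
    return None


def is_stale(annotations: dict, history: list) -> bool:
    """An image is stale if its revision is known and is not the newest commit."""
    behind = commits_behind(annotations, history)
    return behind is not None and behind > 0
```

In a monorepo this check is coarse: any commit anywhere in the repository makes every image built from it look "behind", which is the gap raised elsewhere in this thread.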
In general I interpret
If your goal is source provenance or reproducibility, which you might like to use to automate things, there are much more expressive and robust places to express that, e.g., in-toto attestations that state the full build steps to reproduce, including exactly where to get the source.
It occurs to me I probably should have asked "what are you trying to do" earlier in this thread. 😄 |
Imagine you have a big pile of images... no, not quite that big. You'd like to understand which images are sequential updates to other images, for example to determine whether a certain image is stale. As @sudo-bmitch points out, this works great for determining whether an individual image is stale, but it becomes more difficult if you have, e.g., 12 independently released functions developed in a monorepo style. Basically, we're attempting to correlate built images in a repository with earlier versions of the same image from the same supply chain / build tool, and we've discovered that Git monorepos (vs. a Piper-style "I am the world" monorepo) don't have enough annotation metadata to distinguish, e.g., the "fetch" and "ingest" functions in this repo.
Both containers have:
    org.opencontainers.image.source: https://github.com/evankanderson/function-weather-demo.git
    org.opencontainers.image.revision: 318fcc8
I want:
    org.opencontainers.image.source: https://github.com/evankanderson/function-weather-demo.git
    org.opencontainers.image.source.subpath: fetch
    org.opencontainers.image.revision: 318fcc8
and:
    org.opencontainers.image.source: https://github.com/evankanderson/function-weather-demo.git
    org.opencontainers.image.source.subpath: ingest
    org.opencontainers.image.revision: 318fcc8 |
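To make the correlation problem concrete, here is a rough sketch of how a curation tool might group images into "lineages" keyed on source plus subpath, leaving revision to order versions within a lineage. It assumes the proposed org.opencontainers.image.source.subpath key (not a pre-defined key in the spec today); the image references and helper names are hypothetical.

```python
from collections import defaultdict

SOURCE = "org.opencontainers.image.source"
REVISION = "org.opencontainers.image.revision"
# Proposed in this issue; not a pre-defined key in the OCI image spec.
SUBPATH = "org.opencontainers.image.source.subpath"


def lineage_key(annotations: dict, use_subpath: bool = True) -> tuple:
    """Key used to decide whether two images are versions of the same thing."""
    source = annotations.get(SOURCE, "")
    subpath = annotations.get(SUBPATH, "") if use_subpath else ""
    return (source, subpath)


def group_by_lineage(images: dict, use_subpath: bool = True) -> dict:
    """Map each lineage key to the image references that share it."""
    groups = defaultdict(list)
    for ref, annotations in images.items():
        groups[lineage_key(annotations, use_subpath)].append(ref)
    return dict(groups)


REPO = "https://github.com/evankanderson/function-weather-demo.git"
images = {
    # Hypothetical image references; annotations follow the example above.
    "registry.example/fetch:v1": {SOURCE: REPO, REVISION: "318fcc8", SUBPATH: "fetch"},
    "registry.example/ingest:v1": {SOURCE: REPO, REVISION: "318fcc8", SUBPATH: "ingest"},
}

print(len(group_by_lineage(images, use_subpath=False)))  # 1: both images collapse together
print(len(group_by_lineage(images)))  # 2: fetch and ingest are told apart
```

Revision is deliberately left out of the key: it distinguishes versions within one lineage rather than separating lineages.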
Obviously, I could make up my own annotation key, but it would be nice to document it in a way that other tools could also benefit from. Also obviously, many tools aren't setting these annotations today, but that doesn't stop us from trying to do better. |
For that problem, the directory may not be enough. Multiple images may be generated from the same directory: e.g., different build tools may be used, Docker can point to different Dockerfiles, and build args can change the build. Some kind of additional identifier is needed, but the directory for your use case may end up being a build arg value for another. |
I defined the argument as
I'm not sure how I'd handle "this make target in this subdirectory", but "this Dockerfile" should be referenceable. I'd be willing to make the string suitably generic to support "a bazel target at this bazel path" if someone wants to help wordsmith the description. Maybe:
For my purposes, I'd like |
It doesn't sound like there's a strong objection (though perhaps a bit of "who will use that") to adding the following pre-defined annotation key:
If that seems acceptable, I'll send a PR for that shortly. |
I'm not wholly opposed to adding the annotation key, but I would remove the example. The rest of the proposed meaning makes sense and is, I think, the right amount of vague. |
Sorry, but my opinion here is aligned with Brandon's in #1046 (comment); namely, I don't think a single additional key is sufficient for most tools, even if it covers your use case, and I would still suggest solving your specific problem with your own annotation. 🙈 |
@tianon -- what do you think about the proposition that |
I don't think I would interpret
In Docker's BuildKit tooling, this manifests as full provenance objects, for example: https://explore.ggcr.dev/?blob=docker/dockerfile@sha256:5bb344bbbc250f42b6cf85904aaec1feb8125af97d0e3f0302620e17d54224cc&mt=application%2Fvnd.in-toto%2Bjson&size=14542 |
I'm currently trying to consume annotations to determine, for example, "is container X a replacement for container Y" across different tools. While I certainly can invent new metadata and parsing code for each build tool in order to make that determination, I was hoping that this group could recommend a standard annotation so that build tool authors and container-curation tool authors could answer the question about the past/future relationship between two container images. Is that a reasonable thing to attempt, regardless of whether or not this annotation is the right mechanism for implementing it? |
I don't think it's a good fit for OCI to define an annotation for "are these the same thing" or "is this a replacement". The specifics of those statements are nebulous and shifting and context-dependent. I'd recommend defining your own annotation, where you can define the semantics of it yourself. |
There were efforts from various parts of the community to maintain a
registry of "common" annotations used, but last it came up, it seems
those communities have disappeared. :-\
We've intentionally kept the "wiki" features turned off here because it
would be another thing to track, and is effectively ephemeral. Each
release of the OCI spec ought to be self-contained, and not lean on
something like wiki content. But likewise, having to rev the spec just
to keep track of constantly changing "common" usage is less than ideal.
|
Sorry, I was trying to explain a specific use case, but the annotation would effectively be a correlation ID in a stream-processing sense: image X and image Y are both related to the same underlying application built out of a (git) monorepo. Consider a case like https://github.com/tektoncd/pipeline/tree/main/cmd, where the
I agree with not putting this on a wiki. If we don't think it's worth adding an attribute to enable correlation of images built from a monorepo, then I think it's better to drop this feature altogether. |
To expand on my previous concerns since this was raised in #1252, take an example that has:
To know "this image was built from this thing" requires more input than a directory name, or even a filename, and quickly gets into build-tooling-specific details. E.g., to make my own builds reproducible, there is post-processing done on the generated image. I think the best we could do is define an annotation that holds an instance identifier or key specific to a given source. It would be up to each source repository to define its own instance structure to uniquely differentiate its builds; OCI wouldn't have any opinion on the contents of the annotation, since there are too many variables to attempt to capture. That could be:
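One way a repository might fill such an opaque instance annotation is sketched below, under clearly hypothetical assumptions: the key name com.example.image.instance is invented for illustration, and the choice of inputs (context directory, Dockerfile, target, build args) is this one repository's decision, not anything OCI would define. The sketch hashes a canonical encoding of those inputs so that the same build configuration yields the same key at every revision.

```python
import hashlib
import json
from typing import Optional

# Hypothetical, repository-defined annotation key; OCI would not define its contents.
INSTANCE = "com.example.image.instance"


def instance_key(
    context_dir: str,
    dockerfile: str,
    target: str = "",
    build_args: Optional[dict] = None,
) -> str:
    """Derive a stable identifier from the build inputs this repository considers significant.

    Builds with the same inputs get the same key at any revision, so the key
    correlates successive versions of one image. Build args that change on
    every release (a version number, a timestamp) should be left out.
    """
    payload = {
        "context": context_dir,
        "dockerfile": dockerfile,
        "target": target,
        "build_args": build_args or {},
    }
    # sort_keys makes the encoding independent of dict insertion order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


annotations = {INSTANCE: instance_key("fetch", "Dockerfile", build_args={"GOARCH": "amd64"})}
```

Hashing keeps the value compact but opaque; a repository that prefers human-readable values could store the canonical JSON string directly instead.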
|
How do you define "the same image" without ending up defining the Ship of Theseus? 🙈
FWIW, I've been using Brandon's idea from #1046 (comment) for just shy of two years now with increasing frequency and have been pretty happy with it (it doesn't solve the "different |
Please forgive me if this has already been raised. I agree with @sudo-bmitch's point, but I believe specifying an application subpath (rather than a specific OCI definition file) in large monorepo environments has some merit from ownership, operational, security, and automation perspectives.
Security and Automation: One benefit of standardizing the specification of an application subpath is improving vulnerability management and supply chain security automation. Tools used for security scanning often report findings at the monorepo subpath level (e.g., locations of
Ownership and Operational Clarity: While an OCI image can be built from multiple contexts, specifying an application subpath can help quickly determine ownership during operational incidents. Application subpaths usually have clear owners, (documented in |
That's the best part: we don't. Each repository gets to decide that for itself. I didn't even capitalize the "should". 😄
Integrating with external tooling would be solving a very different problem. That may be a good one to solve, but I'd want to first spend some time seeing how potential solutions work in real world scenarios. Most likely we'll need a separate annotation for each external tool, and at that point, those tools would define their own annotation. My concern is that different groups would want to use the same annotation to handle integration with different external tools. E.g. if I'm building a Go project, do I set the path to the location of the go.mod, the location of the Dockerfile, or the location of the main package being built?
Little that we do here is going to affect usage in the wild. OCI is typically a trailing spec, so it's more that we document common usage that we see in a way that helps with interoperability. Implementations would still need to adopt the standard, and annotations in general are rarely used, and when they are used they are easily misused.
I've always assumed this identified a group or organization, unless the image is really maintained by a single person. For most companies, that seems like it would be more useful than a path since one group may own lots of projects and looking up who owns a path is an extra layer of indirection. |
While attempting to use the Pre-Defined Annotation Keys to track container images back to the corresponding source code, we (re)discovered that org.opencontainers.image.source (URL to get source code for building the image) is insufficient to determine the source code associated with the image when building from a monorepo. (See #886 for an earlier, less action-oriented version of this issue.)
The particular flavor of this issue is that org.opencontainers.image.source is likely to be something like https://github.com/opencontainers/image-spec.git, but those URLs don't provide a way to specify a sub-path within a git repository, which may be useful when building multiple images from a single repo. This is a common practice for many projects, including Kubernetes, Tekton, and Knative (three projects I've interacted with the most).
I'd like to propose documenting the following additional Pre-Defined Annotation Key:
- org.opencontainers.image.source.subpath: A relative path within the source repository used as the base directory to build the container (string)
A few questions on definition which we can either ignore, document, or finesse to avoid defining:
- ".", or unknown? If it means "unknown", does that mean tools should generally set "."? If it means ".", how much do all the existing images mess with this?
I'm happy to propose a PR if this would be useful. @imjasonh @sudo-bmitch as the last two to shepherd changes to this file.
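As an illustration of how a consumer might answer the open definitional questions for itself, here is a sketch that normalizes the proposed subpath value. Its choices are assumptions, not settled semantics: an absent key means "unknown", an empty string or "." means the repository root, and the value must be a relative POSIX-style path that stays inside the repository.

```python
import posixpath
from typing import Optional

# Proposed in this issue; not a pre-defined key in the OCI image spec.
SUBPATH = "org.opencontainers.image.source.subpath"


def normalized_subpath(annotations: dict) -> Optional[str]:
    """Return a normalized subpath, or None if the annotation is absent.

    Assumed semantics: absent means "unknown", empty or "." means the
    repository root, and the path must be relative and stay inside the repo.
    """
    if SUBPATH not in annotations:
        return None  # unknown: older images simply never set the key
    raw = annotations[SUBPATH].strip()
    if raw == "":
        return "."  # explicit empty value treated as the repository root
    if raw.startswith("/"):
        raise ValueError(f"subpath must be relative: {raw!r}")
    # Collapse "./", duplicate slashes, trailing slashes, and inner "..".
    path = posixpath.normpath(raw)
    if path == ".." or path.startswith("../"):
        raise ValueError(f"subpath escapes the repository: {raw!r}")
    return path
```

Normalizing before comparison means "./fetch/" and "fetch" correlate as the same lineage, while keeping None distinct from "." avoids silently treating every existing image as built from the repository root.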