Provide clear definition of what is an "artifact" #32
Comments
Hi @Silvanoc, Does the latest update to the distribution-spec help? Open to suggestions on what we'd add. |
IMO the naming issue cannot be fixed now, and it's not so important that it needs fixing. It simply contributes to the confusion: what is an artifact? Each single file you can store? Each set of files referenced by a manifest? If you look at the ORAS documentation, you realize what it means (roughly speaking, a set of files referenced by a manifest). This shows that a definition in the definitions section is needed. I can try to contribute one, but since I'm not a native speaker, it might result in a weird formulation 🙂 |
Thanks @Silvanoc, let me see if I can make some proposed tweaks. |
To put some thoughts around "what is an artifact", let me try to put something here as an iterative draft that will turn into a PR: An artifact is an object stored in an OCI Distribution registry that a user reasons over.
The above are independent artifacts.
The user interacts with these artifacts by named references. Either through a tag ( The artifact is backed by a manifest. When a user requests a tag or digest, they are requesting a manifest, by which the client can negotiate how to fetch the blob(s) that represent that artifact. Each of those artifacts is represented as one or more blobs. A layer, as defined in the OCI image spec, is an ordinal collection of blobs. The ORAS Artifacts-spec takes the concept of artifacts to the next level. The reference type artifact may not have any blobs. A user may simply wish to add annotations to an existing artifact. In practical code terms, an artifact = a manifest. A manifest may be an OCI image manifest, an OCI index, or an ORAS artifact manifest. I'm sure this needs more tweaking, but wanted to put it out for discussion. |
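To make "an artifact = a manifest" concrete, here is a minimal Python sketch that classifies a manifest by its `mediaType` and lists the descriptors it points to. The function name `describe_artifact` and the returned dictionary layout are hypothetical, chosen only for illustration. The media type strings are the ones I understand the OCI image spec and the ORAS artifacts-spec to use; treat them as assumptions and check them against the specs.

```python
# Media types assumed from the OCI image spec and the ORAS artifacts-spec.
OCI_IMAGE_MANIFEST = "application/vnd.oci.image.manifest.v1+json"
OCI_IMAGE_INDEX = "application/vnd.oci.image.index.v1+json"
ORAS_ARTIFACT_MANIFEST = "application/vnd.cncf.oras.artifact.manifest.v1+json"

KINDS = {
    OCI_IMAGE_MANIFEST: "image manifest",
    OCI_IMAGE_INDEX: "image index",
    ORAS_ARTIFACT_MANIFEST: "oras artifact manifest",
}


def describe_artifact(manifest):
    """Hypothetical helper: summarize what a manifest (the artifact) refers to."""
    kind = KINDS.get(manifest.get("mediaType"), "unknown")
    if kind == "image manifest":
        # An image manifest points to a config blob plus zero or more layers.
        descriptors = [manifest["config"]] + manifest.get("layers", [])
    elif kind == "image index":
        # An index points to other manifests rather than to blobs.
        descriptors = manifest.get("manifests", [])
    elif kind == "oras artifact manifest":
        # A reference type artifact may have no blobs at all.
        descriptors = manifest.get("blobs", [])
    else:
        descriptors = []
    subject = manifest.get("subject")
    return {
        "kind": kind,
        "digests": [d["digest"] for d in descriptors],
        "subject": subject["digest"] if subject else None,
    }


# An annotation-only reference artifact: no blobs, only a subject.
signature_like = {
    "mediaType": ORAS_ARTIFACT_MANIFEST,
    "blobs": [],
    "subject": {"digest": "sha256:aaa"},
    "annotations": {"example.note": "reviewed"},
}
print(describe_artifact(signature_like))
```

In this reading, the blobs are supporting detail and the manifest is the thing a user names and reasons over.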
Sorry, I was trying to get some clarity to be able to answer with a 'yes' or a 'no', but time flies!
The word "artifact" appears 4x in the above quoted text. But what's an "artifact" within the scope of the specification? I've just found a new term in ORAS that makes it all even more confusing 🙂 "subject artifact". Let me refer to the ORAS quick-start documentation to better explain what I mean. In that guide an SBOM and signature are being associated with a container image. But what if I want to associate an SBOM and signature with an archive? Are the archive, SBOM and signature "artifacts"? Is "repository" then the name for the whole? And what's the name for "repo":"tag"? IMO the fact that different terms for things without a clear separation and without a definition (e.g. the variables Perhaps it's obvious for a native speaker what you mean with |
Thanks @Silvanoc, To your question,
I believe you mean a tar archive as a blob? A tar archive [layer | blob] is a good thing to define. Personally, I believe a blob by itself is just a blob, without a definition. Meaning, the manifest is what makes a blob an artifact. An artifact may be only one blob. But blobs without a manifest are likely deleted in most registries: although the spec doesn’t define lifecycle or garbage collection, nearly every production registry does implement lifecycle management. Those that don’t implement lifecycle management wouldn’t be opposed to having it, they may just not use it. This is a bit of a tangent, but a blob without a manifest might get an inferred manifest, just like pushing an artifact without a tag becomes :latest.
subject: This is a new term, and one that has been changed multiple times. As defined in the ORAS artifacts spec, it is a way to define a reference between two artifacts in a registry. We did initially call it reference, but it wasn’t clear if it was a parent:child relationship or a child:parent relationship. The term reference didn’t imply direction. Most recently, it was changed from subjectManifest to subject. The ORAS artifacts spec today limits references to other manifests. But we do see a future where an SBOM or signature could be associated with a blob. Do any of the above references help that I could use to clarify these better? |
I don't have the feeling that we are on the same page 🙂 Perhaps I'm not making my point clear enough. The more I look at other related projects trying to get some clarity, the more I have the impression that the terminology throughout the OCI specifications is not very consistent and can be hard for outsiders. Let me illustrate what I mean with references to different related projects, to make sure that we agree on the problem space before moving to the solution space (which isn't easy). I think that I'll give the solution space a try with PRs.
ORAS
See the ORAS pushing artifacts with multiple files documentation:
OCI distribution specification
I'll refer to the terms found in the definitions of the OCI distribution specification with bold letters and starting with a capital letter. I've noticed that a bit more than a year ago the specification was modified to make it content agnostic (decoupling it from images), but the most widespread use of the distribution specification is still container images. Therefore I've compared the definitions against container images. When I pull a container image like
To make it even more confusing, if we compare the above analysis with Docker's documentation of Here the discussion is about how to address the content or how to name the different parts of an artifact name, but not about the names of the individual parts of an artifact. Anyway, I think this comment has become confusing enough to close it here and leave the names of the individual parts of an artifact for another comment... |
Thanks @Silvanoc, I actually think this discussion is super helpful to help create some clarity. Some additional thoughts:
Can you explain why you think ORAS has a misuse? The image spec uses the term
Minor clarification: tags aren't actually mandatory. The distribution spec does support pushing and pulling manifests by digest only. This is one of the things the oras artifacts-spec takes advantage of. Tags are a way to make human readable references or to have a higher level artifact viewed. As to, "what is an artifact": I'd suggest it's the thing a user wants to focus on. All the details around blobs, layers, annotations, signatures, scan results, sboms are supporting information, to that primary artifact. The end-user wants to push, discover, verify, pull, delete an artifact. What do you think? |
As a developer myself I can fully understand it. Keeping consistency among projects written at different times is very difficult to accomplish. I don't intend to blame anybody, only to bring the inconsistencies to the surface, since as an engineer/architect I know how important clarity and terminology are.
I hope so and I'm glad that you see it so.
Perhaps "misuse" is too negative. What I mean is that the distribution specification wasn't written from scratch, but derived from software that was originally written to handle container images and not artifacts. The generalization from "specification for the distribution of container images" to "specification for the distribution of artifacts" isn't easy to accomplish while the names that exist for historical reasons (e.g. layers) are kept. And I see keeping backwards compatibility as a very good decision.
You're right. I rather meant either a digest or a tag (with the tag defaulting to 'latest' if nothing is specified). The tag or digest in the end gives versioning support (not necessarily as numerical versions).
I agree on this. My initial intention was to focus only on what is an "artifact", but trying to understand the whole I've fallen down the rabbit hole... Let me try to write a couple of small PRs with formulation proposals here and there. |
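As a small illustration of the "either a digest or a tag" point, here is a hedged Python sketch that splits a reference string into repository, tag and digest. The function name `parse_reference` and the example references are hypothetical. Falling back to `latest` when neither tag nor digest is given mirrors the client convention mentioned in this thread, not a requirement of the distribution spec.

```python
def parse_reference(ref):
    """Hypothetical helper: split 'repo[:tag][@digest]' into its parts."""
    name, sep, digest = ref.partition("@")
    digest = digest if sep else None

    # A colon only marks a tag if it comes after the last slash;
    # otherwise it belongs to a registry port such as localhost:5000.
    last_slash = name.rfind("/")
    colon = name.rfind(":")
    if colon > last_slash:
        repository, tag = name[:colon], name[colon + 1:]
    else:
        repository, tag = name, None

    # Client-side convention (an assumption here): default to 'latest'
    # only when the reference names neither a tag nor a digest.
    if tag is None and digest is None:
        tag = "latest"
    return repository, tag, digest


for example in (
    "localhost:5000/hello",
    "localhost:5000/hello:v1",
    "localhost:5000/hello@sha256:abc",
):
    print(example, "->", parse_reference(example))
```

A reference by digest alone resolves to exactly one manifest, which is why tags are optional from the registry's point of view.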
Although I've touched varied concepts and definitions in this issue, I've provided PR #50 focusing on its original goal. I don't expect it to be accepted on the first try, but I'd prefer to move the discussion from this thread to that concrete proposal to better focus it. I might open separate issues for the different inconsistencies I've found (many of them on other projects) trying to understand what is an artifact. |
Agreed. Can you call out where you think backwards compatibility is a challenge?
I think the discussion really helps surface some things that can be clarified. As for the overall design, I'd pull @stevvooe, @dmcgowan, @vbatts, @mikebrow in as some of the original folks that could add more context. |
wrt this repo.. opencontainers/artifacts currently means "an OCI repository for Artifact Guidance Documents" |
wrt the question what is an artifact .. that is defined in the distribution spec:
|
@mikebrow I found that definition some days ago and failed to find it again the day after 😞 I think it's very good and a reference to it could be added here. I still miss the following definitions (probably in the distribution specification):
For those of you involved in the specifications it is obvious what is meant by those terms. But not for outsiders. For me (an outsider, but with some knowledge about the implementation details) terms like "repository", "namespace", "reference",... aren't clear until I read through the specification. Consumers of the implementations (in fact I started wondering what to call things while playing around with ORAS) are missing those definitions. Instead of having the implementations define them in possibly diverging ways, I'd rather fix them in the specification and let the implementations refer to them. I can make PRs with those definitions on the distribution spec, if you agree they would be useful. |
yes I missed some.. we can work them into your PR.. Will be away for a couple of weeks, will review and help define them when I get back. Cheers! |
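I find the `artifact` definition of the distro spec very good, but it probably needs to be changed to accommodate the artifacts specification, by removing the need for a config file (as already remarked by @SteveLasker in this comment). Additionally it's unclear if 'image indexes' are somehow covered by any of the definitions. It'll become a more obvious problem once the future scope of the artifacts spec has become present. |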
On 05/11/21 09:52 -0700, Silvano Cirujano Cuesta wrote:
I find the `artifact` definition of the distro spec very good, but it probably needs to be changed to accommodate to the artifacts specification, by removing the need of having a config file (as already remarked by @SteveLasker in [this comment](#50 (comment))
Additionally it's unclear if 'image indexes' are somehow covered by any of the definitions. It'll become a more obvious problem once [the future scope of the artifacts spec](https://github.com/opencontainers/artifacts/blob/d9afcdd395525fb59ebb1dc4e27aa63a97f4b606/artifact-authors.md#future-scope) has become present.
The name image-index was never good. "manifest-list" was much better.
|
Mission for artifacts is moving to the image and distribution specifications.. and this repo is being archived. If you believe more is needed please reopen in image or distribution! |
There's no clear definition of what an artifact is, although the meaning can be found when reading the specification. A clear definition would nevertheless be meaningful, because the term "artifact" is extremely overloaded.
I tended to call the files of an artifact "artifacts" themselves and then started calling the artifact an "artifact set", until I realized the nonsense. But it illustrates the need for a clear statement of what an artifact is.
What probably also makes it misleading is the plural in the name: artifacts. There's the OCI Image Specification (notice the singular in the name) and the OCI ArtifactS Specification (notice the plural), which sounds like it specifies how to use a registry to store an "artifact set or bundle" containing multiple artifacts.