Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore pagination information #496

Closed
wants to merge 1 commit into from

Conversation

jonjohnsonjr
Copy link
Contributor

Notably "breaking" changes here are that the registry may impose a limit on the number of tags it returns.

This was restored from b4e9833, where it applied to Catalog, which was removed, but given that many registries have `adopted this behavior already, we should document it correctly.

This also restores language that clients SHOULD always use the Link header when present.

This was restored from c90b0f1.

Notably "breaking" changes here are that the registry may impose a limit
on the number of tags it returns.

This was restored from b4e9833, where
it applied to Catalog, which was removed, but given that many registries
have `adopted this behavior already, we should document it correctly.

This also restores language that clients SHOULD always use the Link
header when present.

This was restored from c90b0f1.

Signed-off-by: Jon Johnson <[email protected]>
@sudo-bmitch
Copy link
Contributor

Does this text now allow a registry to return more than n results? That could be an issue for clients that assume a fixed size array for the results.

@sudo-bmitch sudo-bmitch mentioned this pull request Nov 30, 2023
@jonjohnsonjr
Copy link
Contributor Author

Does this text now allow a registry to return more than n results?

I don't think so. The relevant text is:

The registry implementation MAY impose a maximum limit and return a partial set with pagination links.

This would only allow for <= n.

Copy link
Contributor

@sudo-bmitch sudo-bmitch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a first pass, it may still need some tweaking later.

The big issue is with the deletion of:

The response to such a request MAY return fewer than <int> results, but only when the total number of tags attached to the repository is less than <int>.
Otherwise, the response MUST include <int> results.

and addition of:

Paginated results can be retrieved by adding an n parameter to the request URL, declaring that the response SHOULD be limited to n results.

This reads as if registries are allowed to return more than n results.

I think we should specify that registries can return at most n results. If a registry returns fewer than n results, either it MUST be the end of the list, or there MUST be a pagination Link header. That would be the least breaking of the changes.

I think it's also worth calling out that a registry that returns a Link header on some tag listing requests MUST do so for all tag listing requests that do not return the end of the list. Otherwise it is not safe for clients to assume:

If the header is not present, the client can assume that all results have been received.

The entries in the response start _after_ the term specified by `last`, up to `n` entries.

The client can then issue the request with the above value from the `Link` header, receiving the values _v3_ and _v4_.
Note that `n` may change on the second to last response or be fully omitted, depending on the server implementation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why "second to last response"? Why does omitting the parameter by the client depend on the server implementation?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the second to last response, the n parameter could be adjusted by the server to match the remaining number of tags to be delivered in the last response. I would, however, say that this is more difficult for a registry to implement than just using the name of the last tag in the current result set.

Comment on lines +686 to +688
```
GET /v2/<name>/tags/list?n=<n from the request>&last=<last tag value from previous response>
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This GET request isn't using the Link header as the above paragraph suggests. It might be easier to include this as an option for starting the request in the middle of the list, around line 658 and then follow with two sections, one for the Link header (clients should prefer this if servers implement it), or manually generating requests with the last parameter (for clients with 1.0 server compatibility).

@@ -639,6 +644,57 @@ If the [referrers API](#listing-referrers) returns a 404, the client MUST fallba
The response SHOULD be an image index with the same content that would be expected from the referrers API.
If the response to the [referrers API](#listing-referrers) is a 404, and the tag schema does not return a valid image index, the client SHOULD assume there are no referrers to the manifest.

##### Pagination

Paginated results can be retrieved by adding an `n` parameter to the request URL, declaring that the response SHOULD be limited to `n` results.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pagination can also occur without the n parameter if servers limit the number of entries in a response.

@rchincha
Copy link
Contributor

rchincha commented Dec 4, 2023

Also, guidelines on what the expected behavior is if the tag list changes in between, due to new images being pushed or deleted for that matter?

If there are indeed more results, the URL for the next block is encoded in an [RFC5988](https://tools.ietf.org/html/rfc5988) `Link` header, as a "next" relation.
If the header is not present, the client can assume that all results have been received.

> __NOTE:__ In the request template above, note that the brackets are required. For example, if the url is `http://example.com/v2/hello-world/tags/list?n=20&last=b`, the value of the header would be `<http://example.com/v2/hello-world/tags/list?n=20&last=b>; rel="next"`.
Copy link

@andaaron andaaron Dec 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe include also an example with a relative reference, such as the server returning just </v2/hello-world/tags/list?n=20&last=b>; rel="next". Which I would expect to be more common, as the reply may be routed a few times until getting to the client and the ip/hostname/port the server is aware of may be different from the one the client should make the request to.

@sudo-bmitch sudo-bmitch added this to the v1.2.0 milestone Jan 11, 2024
Copy link

@dmesser dmesser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I like this change. For a public registry service, the wording of the current spec is essentially impossible to implement without opening an easy to exploit DoS vector:

The response to such a request MAY return fewer than results, but only when the total number of tags attached to the repository is less than . Otherwise, the response MUST include results.

So the proposed changes are alleviating that. The one thing I am missing here is details around how a server can/should tell the client that it requested too many tags and in which circumstances.

In line 533 it says:

The registry implementation MAY impose a maximum limit and return a partial set with pagination links.
See Pagination for information on how to handle pagination links.

But there is no detail provided in that section about the return code. I would suggest to include the verbiage from @sudo-bmitch's earlier proposal: https://github.com/opencontainers/distribution-spec/pull/470/files#diff-bc6661da34ecae62fbe724bb93fd69b91a7f81143f2683a81163231de7e3b545R548

Allow to return a HTTP 400 response and make it so that clients rely on a reasonable minimum amount of tags to never trigger this response, like 20.

The entries in the response start _after_ the term specified by `last`, up to `n` entries.

The client can then issue the request with the above value from the `Link` header, receiving the values _v3_ and _v4_.
Note that `n` may change on the second to last response or be fully omitted, depending on the server implementation.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the second to last response, the n parameter could be adjusted by the server to match the remaining number of tags to be delivered in the last response. I would, however, say that this is more difficult for a registry to implement than just using the name of the last tag in the current result set.

@jonjohnsonjr
Copy link
Contributor Author

jonjohnsonjr commented Feb 1, 2024

I don't think any of this is ambiguous and we should just merge it. If someone else wants to take a stab at it, I'll approve anything.

@dmesser
Copy link

dmesser commented Feb 2, 2024

@jonjohnsonjr Was this PR accidentally closed, or did you have something else in mind?

@tianon
Copy link
Member

tianon commented Feb 14, 2024

I'm interpreting between the lines a little bit (so massive apologies to Jon if I'm inaccurate here), but I think it was probably closed because this isn't actually a change to the spec, but restoring content that was in the spec and got removed during refactoring (see #443; so he probably didn't want to spend too many cycles discussing/nit picking the previous behavior/design choices that are already implemented in many registries, and documenting that interop is really why this spec exists 😅).

@dmesser
Copy link

dmesser commented Feb 15, 2024

Got it. Has this revert happened yet? Because the spec in its current form is currently really problematic with regards to pagination: https://github.com/opencontainers/distribution-spec/blob/main/spec.md?plain=1#L532-L533

is the namespace of the repository, and is an integer specifying the number of tags requested. The response to such a request MAY return fewer than results, but only when the total number of tags attached to the repository is less than . Otherwise, the response MUST include results.

This essentially opens up a DoS vector if the client can expect to put arbitrarily high values in <int> and the registry is expected to respond with it, with no means of pushing back with an error.

@jdolitsky
Copy link
Member

Can we re-address this with a fresh PR and then afterwards cut a 1.1.1?

@sudo-bmitch sudo-bmitch removed this from the v1.2.0 milestone Jan 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants