
Sequential (Streaming) media types and link to registry #4518

Closed
handrews wants to merge 2 commits

Conversation

handrews (Member)

This adds a link to the forthcoming media type registry (PR #4517), and also adds support for various sequential media types:

  • application/json-seq
  • application/jsonl
  • application/x-ndjson
  • text/event-stream

Given how various modeling and encoding techniques are scattered throughout the specification, the Media Types section seemed like the best place to add these, preceded by a link to the new Media Type Registry, which will essentially be a catalog of where to find the existing guidance for various media types.

Also paging @robertlagrant and @disintegrator

  • schema changes are included in this pull request
  • schema changes are needed for this pull request but not done yet
  • no schema changes are needed for this pull request


@handrews added the "media and encoding" label Mar 29, 2025
@handrews added this to the v3.2.0 milestone Mar 29, 2025
@handrews requested review from a team as code owners Mar 29, 2025
@handrews changed the title from "Sequential media types and link to registry" to "Sequential (Streaming) media types and link to registry" Mar 29, 2025
@handrews (Member Author)

BTW I don't know that under "Media Types" is the right place for the sequential media type requirements. I could see it going as a new subsection under "Data Types", maybe? Or next to it?

It really does not feel like it should go under the Schema Object, as it's more about what you do or don't use with the Schema Object rather than how the Schema Object works in general. Once you map from the sequential format to the JSON Schema data model, the Schema Object behaves normally.

Several media types exist to transport a sequence of values, separated by some delimiter, either as a single document or as multiple documents representing chunks of a logical stream.
Depending on the media type, the values could either be in another existing format such as JSON, or in a custom format specific to the sequential media type.

Implementations MUST support modeling sequential media types with the [Schema Object](#schema-object) by treating the sequence as an array with the same items and ordering as the sequence.
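
As a rough illustration of this array-based modeling (an editor's sketch, not part of the proposed specification text; the /logs path and its properties are hypothetical), a JSONL response could be described like this:

paths:
  /logs:
    get:
      responses:
        '200':
          description: Log records, one JSON object per line
          content:
            application/jsonl:
              schema:
                type: array        # the parsed sequence is treated as a single array
                items:
                  type: object
                  properties:
                    timestamp:
                      type: string
                      format: date-time
                    message:
                      type: string
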
Contributor

This wording is confusing to me, and doesn't seem to reflect the requirement that the Schema Object modeling the sequence must itself be of type: array.

Member Author

There is no requirement that the Schema Object include type: array, although it would be a good practice.

What we're talking about here is not so much what to put in the Schema Object, but what data structure to convert the document into so that the Schema Object can be applied to it.

Implementations don't get that from the Schema Object; they get it from these requirements, so it would be an error on the part of the implementation to pass anything but an array here. Of course, it's good practice to include type: array, and if you have other tools that depend on the type keyword and aren't paying attention to the media type with which the Schema Object is used, then you have to do that. But there's no requirement for it to be in the Schema Object.
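
To illustrate that point (an editor's sketch; the LogEntry schema reference is hypothetical), the Media Type Object below omits type: array entirely, yet under these requirements an implementation would still parse the JSONL body into an array and apply the Schema Object to that array:

content:
  application/jsonl:
    schema:
      # no `type: array` declared; the array instance comes from how the
      # implementation maps the sequential document into the data model
      items:
        $ref: '#/components/schemas/LogEntry'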

Member Author

@duncanbeevers my most recent commit (after a force-push that was a rebase of the unchanged original commit) added some clarification here; please see if that helps!

@handrews (Member Author)

The force-push just rebases the unchanged commit in order to get the syntax highlighting for text/event-stream.

In such use cases, either the client or server makes a decision to work with one or more elements in the sequence at a time, but this subsequence is not a complete array in the sense of normal JSON arrays.

OpenAPI Description authors are responsible for avoiding the use of JSON Schema keywords such as `prefixItems`, `minItems`, `maxItems`, `contains`, `minContains`, or `maxContains` that rely on a beginning (for relative positioning) or an ending (to determine if a threshold has been reached or a limit has been exceeded) when the sequence is intended to represent a subsequence of a larger stream.
If such keywords are used, their behavior remains well-defined but may be counter-intuitive for users that expect them to apply to the stream as a whole rather than each subsequence as it is processed.
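
An editor's sketch of that caveat (not part of the PR text; the Event schema is hypothetical): with the schema below, maxItems constrains each subsequence an implementation chooses to process, not the stream as a whole, which may surprise authors who intended to cap the total number of events:

content:
  text/event-stream:
    schema:
      type: array
      # applies to each processed subsequence, not to the entire
      # (possibly unbounded) stream
      maxItems: 100
      items:
        $ref: '#/components/schemas/Event'
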
@ThomasRooney commented Apr 2, 2025

I personally wonder whether this trade-off is worth the slight confusion. The modelling of jsonl/sse in OpenAPI I've personally seen has always been for an indefinite-length stream, and I feel it might be a bit confusing for OAS authors and tool vendors to represent those as a type: array.

An alternative modelling is to have the schema model purely the JSON within the stream, and to validate the schema against each entry.

paths:
  /users/export:
    get:
      tags:
        - Users
      summary: Export user data in JSONL format
      description: >
        This endpoint returns user data in JSONL format, with each line containing a complete user record.
        This format is ideal for large datasets that need to be processed one record at a time.
      responses:
        '200':
          description: User data in JSONL format
          content:
            application/jsonl:
              schema:
                $ref: '#/components/schemas/User'
        '400':
          description: Invalid request
        '500':
          description: Internal server error
components:
  schemas:
    User:
      type: object
      required: [id, name, email]
      properties:
        id:
          type: string
          format: uuid
          description: Unique identifier for the user
        name:
          type: string
          description: User's full name
        email:
          type: string
          format: email
          description: User's email address
        age:
          type: integer
          description: User's age
        city:
          type: string
          description: User's city of residence

This approach has a few advantages for both JSONL and SSE. For JSONL, it:

  1. Matches the majority (I think all?) of examples I've come across in the wild from internal APIs.
  2. Is slightly simpler for tooling vendors to reason about.

Note

E.g. at Speakeasy, one of the client SDK generators, we convert the schema into a native type in each language, with application/jsonl indicating purely the serialization/deserialization layer and the wrapping of the operation into some kind of Stream<T> response (where T is the subschema) in an SDK method. I.e. since type: array isn't directly exposed to users of an SDK, going with the proposed modelling we'd need to "unwrap"/special-case schemas at the top level, as those impact the Stream rather than the JSON within the stream. It might become similarly "messy" to implement for other vendors like API gateways and documentation vendors.

An alternative modelling that supports/indicates a finite length JSONL response (note: we haven't actually seen any of these APIs yet, but my variant proposal otherwise closes the door on them) could be to represent that information within a new entry under the media type object, perhaps by following the example set by the encoding object:

paths:
  /users/export:
    get:
      tags:
        - Users
      summary: Export user data in JSONL format
      description: >
        This endpoint returns user data in JSONL format, with each line containing a complete user record.
        This format is ideal for large datasets that need to be processed one record at a time.
      responses:
        '200':
          description: User data in JSONL format
          content:
            application/jsonl:
              stream: # applicable for streaming media types only
                maxItems: 2
              schema:
                $ref: '#/components/schemas/User'
        '400':
          description: Invalid request
        '500':
          description: Internal server error

For SSE, there are also advantages. Consider the special fields data, id, and event defined by the text/event-stream media type. It's commonly modelled with something like this:

paths:
  /stock-updates:
    get:
      tags:
        - ServerSentEvents
      summary: Subscribe to real-time stock market updates
      description: >
       This endpoint streams real-time stock updates to the client using server-sent events (SSE).
       The client must establish a persistent HTTP connection to receive updates.
      responses:
        '200':
          description: Stream of real-time stock updates
          content:
            text/event-stream:
              schema:
                $ref: '#/components/schemas/StockStream'
        '400':
          description: Invalid request
        '500':
          description: Internal server error
components:
  schemas:
    StockStream:
      type: object
      description: A server-sent event containing stock market update content
      required: [id, event, data]
      properties:
        id:
          type: string
          description: Unique identifier for the stock update event
        event:
          type: string
          const: stock_update
          description: Event type
        data:
          $ref: '#/components/schemas/StockUpdate'

    StockUpdate:
      type: object
      properties:
        symbol:
          type: string
          description: Stock ticker symbol
        price:
          type: string
          description: Current stock price
          example: "100.25"

By continuing to represent the stream this way, we could open the door to richer modelling of the top level properties to also fit into the "encoding" object.

E.g. consider the "sentinel" event, something that has been popularised by the AI/LLM APIs, which send [DONE] as the last SSE data chunk. By avoiding the wrapping of the stream in type: array, we could enable the description of these media-type-specific entries in a standardized way through encoding, which will gracefully degrade if a tooling vendor doesn't understand the syntax because it's highly localized, rather than "tainting" the JSON schema in the response body:

paths:
  /stock-updates:
    get:
      tags:
        - ServerSentEvents
      summary: Subscribe to real-time stock market updates
      description: >
       This endpoint streams real-time stock updates to the client using server-sent events (SSE).
       The client must establish a persistent HTTP connection to receive updates.
      responses:
        '200':
          description: Stream of real-time stock updates
          content:
            text/event-stream:
              encoding:
                event:
                  sentinel: '[DONE]'
              stream:
                maxItems: 10
              schema:
                $ref: '#/components/schemas/StockStream'
        '400':
          description: Invalid request
        '500':
          description: Internal server error

By modelling it as type: array, it feels to me like we'd close the door on additional modelling of the top-level fields outside of JSON Schema or extensions associated with the media type.

Member Author

@ThomasRooney first, let me apologize for not tagging you in the original PR comment- I knew I was missing someone!

I'm going to take a while to think through this further, and also tag @gregsdennis who asked about this direction on Slack.

For now, I'll just state a few important principles that are guiding me here:

  • We model media types, and not protocols implemented on top of media types. There's nothing wrong with modeling protocols, but it can't be done by repurposing the media type layer. It would need a new mechanism, and that's too big of a change for 3.2, which needs to ship by this summer. Really, that would be better as a companion specification as it is beyond the current scope of the OAS.
  • The challenge here is that there's nothing in any of the JSON media types that says that every entry MUST be in the same format. If that were the case, then yes, the natural modeling would be to just model the single entry type. But we need to work with the media types as written, not as would make them more convenient. text/event-stream will tend to be more uniform, but there's no guarantee that someone won't use it in an unexpected way.
  • I feel like you're focusing on the response use case, but there are request use cases where the JSONL being sent is closer to a normal document.
  • I'm not that fixated on prefixItems, and in fact I think a more common relevant use would be to use maxItems as a way to limit the chunk size, although I do not know if that is ever actually done (see the sketch after this list).
  • The Encoding Object is problematic for far too many reasons to get into here, and is due for a re-think in 3.3.
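
A minimal sketch of that maxItems idea (an editor's illustration under the array-based modeling proposed in this PR, not an established practice; the Event schema is hypothetical):

content:
  application/json-seq:
    schema:
      type: array
      # would cap the size of each chunk (subsequence) an implementation
      # chooses to process at a time
      maxItems: 50
      items:
        $ref: '#/components/schemas/Event'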

@karenetheridge (Member) commented Apr 10, 2025

The following are assertions that I believe to be true, but if you disagree, then read it as "this is my opinion" instead.

Given an OAD that looks like:

...
  responses:
    '200':
      content:
        <some media type here>:
          schema:
            ...

The content under the schema keyword here MUST describe the structure of the HTTP response body as a whole, after it has been decoded into the json document model using the mechanism prescribed for this media type. The body MUST be finite, and given the assumption one has enough computing resources available (disk, RAM etc), MUST be capable of being read all at once and decoded all at once. That is, we can expect that it is possible to parse the entire HTTP body, decode it, and apply that decoded content against the json schema for validation or other purposes that json schemas make possible.

Certain media types MAY allow for the possibility of reading in just a portion of the HTTP response body, decoding that portion, and applying it to the json schema. However, this should not be assumed -- there are many ways to construct a json schema that would make this impossible (for example, "type":"array","maxItems": 10 means that we need to see the end of the content so we can be sure that the maximum number of array items does not exceed 10). Also, some media types require seeing the end of the document before we can know if the entire content is valid -- for example, application/json itself, where an array or object requires seeing the final ] or } character at the end of the data.

Therefore, I conclude:

  • some streaming media types and content MAY be describable with the current syntax available to us in OAS v3.1.1. Users and implementers are welcome to go ahead and do so if they see a way forward for their particular use case.
  • we do not currently have a way of modelling data in which a chunk is read from a stream, decoded and validated, and then we read another chunk and validate that, and so on -- whether the stream is finite or infinite.

An alternative modelling is to have the schema model purely the JSON within the stream, and to validate the schema against each entry.

I agree we should try to do that, but not within the framework of the {"content": { <media_type>: { "schema": { ... } } } } structure. schema here MUST always describe the expected structure of the entire content; repurposing this to describe an individual parsed item in some contexts would be confusing (both to users and to tools).

So, what we really need is:

  • we need to specify the media type of the HTTP request/response body itself (as found in the Content-Type header)
  • we may want to specify some constraints on the body as a whole (for example, "type":"array", "minItems": 3 is easily verifiable for a finite stream, by evaluating this schema after the stream has been fully read)
  • most importantly, we want to provide a schema for each individual item in the stream (not the same as a chunk, as we may read more or less than one item in each chunk, and the calling application must be responsible for concatenating chunks together to create coherent items that can be validated), and this schema MIGHT be specified in addition to (not instead of) the schema of the content as a whole

Might all of the above be satisfied simply by adding a keyword itemSchema adjacent to schema? If this keyword is present, then the calling application can presume that the media type is one that can be decoded into coherent items, and that each item shall be evaluated against the itemSchema. Do we need a "streamed": true indicator to live alongside it, to indicate to the application that it should attempt to read the content in a chunked fashion, rather than wait until the end of the body is seen -- we might be able to infer that from the Content-Type itself? [edited to add: it appears that this can be determined unambiguously from the Transfer-Encoding header.]

Example:

...
responses:
  '200':
    description: logging messages. may be transmitted in chunks.
    content:
      application/jsonl:
        schema:
          type: array   # this is mandatory for the JSONL type, but is here just for clarity for the user
          maxItems: 5   # this is okay to use, because as soon as we pass 5 items this is true and can never become false again
        itemSchema:
          type: object
          properties:
            timestamp:
              type: string
            message:
              type: string

@handrews (Member Author)

@karenetheridge I largely agree with you here. Attempting to manage chunks has been unsatisfactory. And on a walk back from the store just now I more-or-less arrived at the same sort of itemSchema solution, although I might have some of the details a bit different. I'll write more on this later, although everyone should feel free to keep developing @karenetheridge's idea without waiting for me.

@handrews (Member Author)

OK I'm going to close this in favor of starting over based on @karenetheridge's idea. Karen, I'll follow up with you about whether you want to write a PR or be credited on a rewrite that I post (I said I'd gotten to the same point, but my idea was at the stage of "um... we probably need a new field" while you provided a complete write-up and explanation, and worked out many details I hadn't even considered yet).

@handrews closed this Apr 10, 2025
@mikekistler (Contributor)

I had a long discussion on this topic with @hudlow today, and I now understand the situation much better. I agree that a new keyword seems like the right approach. Unfortunately, I think that means that there is no good solution to describing streaming media types prior to V3.2.

@hudlow commented Apr 11, 2025

I'm also good with @karenetheridge's proposed direction, although I also sort of wonder about a chunkSchema that would be modeled as an array schema and that you could use if you wanted to guarantee a whole number of items in a chunk and make further assertions like minimum or maximum items in each chunk.
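
A purely hypothetical editor's sketch of what that might look like (neither chunkSchema nor itemSchema exists in the OAS today; the shape below is only illustrative, placed alongside the itemSchema proposed above):

content:
  application/jsonl:
    itemSchema:
      $ref: '#/components/schemas/LogEntry'
    chunkSchema:
      # hypothetical: an array schema applied to each whole chunk of items
      type: array
      minItems: 1
      maxItems: 100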

@handrews (Member Author)

@mikekistler @hudlow @karenetheridge etc. please continue this discussion at #4171 (comment) (my apologies for not noting this earlier; you were not expected to guess that I had transferred Karen's proposal over there!)
