Conversation

@cdoern (Contributor) commented Sep 3, 2025

What does this PR do?

This document outlines the different API stability levels, how to enforce them, and the next steps.

Next Steps

Following the adoption of this document, all existing APIs should follow the enforcement protocol.

relates to #3237

@meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) on Sep 3, 2025
@cdoern (Contributor Author) commented Sep 3, 2025

cc @nathan-weinberg since I know you wanted a look at this :)

@mattf (Collaborator) left a comment

Focusing only on the OpenAI-compatible APIs, how would this framework classify the following, and why?

  • /v1/chat/completions
  • /v1/completions
  • /v1/files
  • /v1/batches

@cdoern (Contributor Author) commented Sep 3, 2025

Great point @mattf.

As the next steps section indicates, I think leveling each API will be a big task and will require an evaluation of how "API complete" each one is compared to the OpenAI spec.

Off the top of my head though:

/v1/chat/completions -- seems stable, so it would likely remain v1; this is the most commonly used API
/v1/completions -- also likely stable

With completions and chat completions, though, I know there are various features going in as we approach 0.3.0, so we'd need to evaluate whether any of them are breaking; if they are, the API would need to be v1alpha until a consumer can reliably upgrade between z-streams without breakage.

/v1/files -- I have seen a few major enhancements to files go in recently, like #3283 and the s3 provider in general, so I'd imagine this would be v1alpha1 for flexibility until we are sure the surface area is complete. I am not the expert here, though, and would leave this leveling up to folks more familiar with the API surface.

/v1/batches -- given the large changes like #3309 and maybe #3261, #3171, etc., I think this should be v1alpha1 unless we can ensure this churn is over by, say, 0.3.0.

v1alpha1 IMO should be viewed as a good thing and not a "downgrade", as it allows us to perfect these APIs without worrying about support between versions, stability concerns, etc.

And just a note: the reason I chose alpha and not beta is that, as the doc states, beta is essentially a brief stepping stone between alpha and v1 where not much major development should happen.

@reluctantfuturist mentioned this pull request on Sep 2, 2025
@mattf (Collaborator) commented Sep 3, 2025

> Great point @mattf.
>
> As the next steps section indicates, I think leveling each API will be a big task and will require an evaluation of how "API complete" each one is compared to the OpenAI spec.

Working through some examples will help me, at least, understand the framework.

> Off the top of my head though:

🙏

> /v1/chat/completions -- seems stable, so it would likely remain v1; this is the most commonly used API
> /v1/completions -- also likely stable
>
> With completions and chat completions, though, I know there are various features going in as we approach 0.3.0, so we'd need to evaluate whether any of them are breaking; if they are, the API would need to be v1alpha until a consumer can reliably upgrade between z-streams without breakage.

By definition, the shape of these APIs (path, input, output) is set and is as stable as OpenAI makes them.

There are variations in the completeness of the implementations:

  • an implementation may be incomplete because the adapter is missing something it could implement, or
  • an implementation may have inconsistent semantics compared to OpenAI or other adapters, e.g. logprob semantics, or
  • an implementation may be incomplete because it cannot be completed, e.g. image input to a text-only model or multi-image input to a single-image-only model

The first of these is arguably a gap to close.

The second is arguably a bug to fix, but a fix may not always be feasible. For instance, the NVIDIA service does not honor the number of logprobs requested, and the Llama API service is stricter about JSON schema for tool calls than other providers.

How would these provider differences impact the classification of the API under this framework?

How would you describe the third case using this framework?
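
As a concrete illustration of the second case, the request shape below is fixed by the spec, but what comes back can differ by provider. This is a minimal sketch using the standard `openai` Python client; the base URL and model name are placeholders, and the divergence noted in the comments is the kind described above, not verified output.

```python
# Sketch: identical OpenAI-compatible requests, potentially different logprob semantics per provider.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # placeholder URL

resp = client.chat.completions.create(
    model="my-model",  # placeholder model id
    messages=[{"role": "user", "content": "Say hello"}],
    logprobs=True,
    top_logprobs=5,  # some backends may return fewer alternatives than requested
)

# The response shape is set by the spec; the number of alternatives actually
# returned per token is where provider semantics can diverge.
first_token = resp.choices[0].logprobs.content[0]
print(len(first_token.top_logprobs))
```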

> /v1/files -- I have seen a few major enhancements to files go in recently, like #3283 and the s3 provider in general, so I'd imagine this would be v1alpha1 for flexibility until we are sure the surface area is complete. I am not the expert here, though, and would leave this leveling up to folks more familiar with the API surface.

Also by definition, the shape is stable. It may be new to Stack, but it isn't changing.

In the case of /v1/files, the localfs adapter does not implement expiration, while the s3 adapter does.

How does the difference in adapter implementation impact the classification of the API?

> /v1/batches -- given the large changes like #3309 and maybe #3261, #3171, etc., I think this should be v1alpha1 unless we can ensure this churn is over by, say, 0.3.0.

By definition, the shape here is also stable.

Missing from the implementation is support for /v1/embeddings and /v1/responses, which happen to be part of the OpenAPI spec (endpoint is an enum of /v1/responses, /v1/chat/completions, /v1/embeddings, /v1/completions).

The API shape for a Batch includes a status enum with the values validating, failed, in_progress, finalizing, completed, expired, cancelling, and cancelled. The adapter will not produce a finalizing status.

Unlike the inference and files endpoints, /v1/batches has only one inline provider.

How do these aspects impact the classification?
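
For reference, the surface in question looks roughly like this through an OpenAI-style client. This is a sketch: the base URL and file id are placeholders, and whether a given deployment accepts a particular endpoint value or ever emits finalizing is exactly the implementation question raised above.

```python
# Sketch: creating and polling a batch through the OpenAI-compatible surface.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # placeholder URL

# `endpoint` is an enum in the spec (/v1/responses, /v1/chat/completions,
# /v1/embeddings, /v1/completions); an implementation may accept only a subset.
batch = client.batches.create(
    input_file_id="file-abc123",  # placeholder file id
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# The Batch status enum allows validating, failed, in_progress, finalizing,
# completed, expired, cancelling, cancelled -- an adapter may never emit some
# of these (e.g. finalizing).
batch = client.batches.retrieve(batch.id)
print(batch.status)
```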

> v1alpha1 IMO should be viewed as a good thing and not a "downgrade", as it allows us to perfect these APIs without worrying about support between versions, stability concerns, etc.

A practical consideration here: when using the LlamaStackClient or the OpenAI client to interact with these APIs, a base path must be provided. Users will need multiple clients to talk to each of the top-level API versions, e.g. v1client = Client(base_url=".../v1"), alphaclient = Client(base_url=".../v1alpha1").
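
For example (a minimal sketch; the URLs are hypothetical, and exactly how each client composes the version prefix into its request paths is an assumption here):

```python
# Sketch: one client instance per top-level API version prefix.
from llama_stack_client import LlamaStackClient

v1_client = LlamaStackClient(base_url="http://localhost:8321/v1")           # stable APIs
alpha_client = LlamaStackClient(base_url="http://localhost:8321/v1alpha1")  # alpha APIs

# Stable calls would go through v1_client and experimental calls through
# alpha_client; the same split applies when pointing an OpenAI client at the server.
```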

> And just a note: the reason I chose alpha and not beta is that, as the doc states, beta is essentially a brief stepping stone between alpha and v1 where not much major development should happen.

@reluctantfuturist mentioned this pull request on Sep 3, 2025
@cdoern (Contributor Author) commented Sep 3, 2025

@mattf I think, simply put:

If any of our OpenAI-compatible APIs is not "API complete" -- in the sense that a new route still has to be added to the API itself (not to a provider), or a breaking change to the API datatypes still has to be made (like changing the required params for a route or its return type) -- that is when it needs to be v1alpha1 or v1beta1.

So if one of our OpenAI-compatible APIs is missing something that is in the OpenAI spec, I think that merits a less-than-v1 ranking until we are 1:1 with what OpenAI documents.

An example:

Let's say post_training needs a massive change and supervised_fine_tune needs a new required parameter. That change would happen in llama_stack/apis/post_training/... as well as in any providers. This is a breaking change that merits a less-than-v1 leveling of the entire API.

However, let's say the ollama inference provider needs some new logic in how it internally handles streaming chat completions, but no changes are required to the inference router or the API types in llama_stack/apis. That would not be a breaking change, and inference could remain a v1 API.

So, generally: provider changes do not correlate with API maturity; rather, API-level datatype changes or structural changes to required endpoints are what necessitate a level lower than v1.
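
To make that concrete, here is a rough sketch of the two cases (illustrative only; these are not the real llama_stack/apis or provider definitions):

```python
# Illustrative only -- not the actual llama_stack definitions.


class PostTrainingJob:  # placeholder return type
    ...


# API-level change (breaking): adding a *required* parameter to a route defined
# in llama_stack/apis/... changes the contract every provider and client must
# satisfy, so the API should not sit at /v1 while changes like this land.
def supervised_fine_tune(job_uuid: str, new_required_param: str) -> PostTrainingJob:
    """Hypothetical new signature; callers built against the old one now break."""
    ...


# Provider-internal change (non-breaking): reworking how one provider handles
# streaming touches neither routes nor datatypes, so the API fronting it can
# stay at /v1.
class OllamaInferenceAdapter:  # placeholder provider class
    async def _stream_chat_completion(self, request: dict):
        # internal buffering / retry logic can change freely between releases
        ...
```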

Does this align with your thinking?

@ashwinb (Contributor) commented Sep 3, 2025

I think there are two aspects here:

  • @cdoern is mostly concerned about maturity of the API definition ("is this settled", "will this randomly change")
  • @mattf is thinking about maturity of the API implementation ("does this work as advertised")

And it is not clear whether one should merge both concerns into a single token "v1alpha1". I am sure this issue has been thought of by other projects before?

@r3v5 (Contributor) left a comment

Nice work, @cdoern! This looks great, thank you!

@r3v5 (Contributor) left a comment

Suggested some small improvements though.

@cdoern (Contributor Author) commented Sep 4, 2025

> I think there are two aspects here:
>
>   • @cdoern is mostly concerned about maturity of the API definition ("is this settled", "will this randomly change")
>   • @mattf is thinking about maturity of the API implementation ("does this work as advertised")
>
> And it is not clear whether one should merge both concerns into a single token "v1alpha1". I am sure this issue has been thought of by other projects before?

Yeah @ashwinb, that is the proper delineation.

I think in LLS specifically, what matters most is the API definition: datatypes, API routes, parameters, and return types.

I view the providers similarly to operators in k8s, where the maturity of an individual operator is not correlated with the maturity of the high-level APIs. Of course, the two are somewhat intertwined, but this proposal is basically saying:

Providers can iterate as much as they want on functionality as long as they work within the bounds of an API. If they need to change the API, then the API should not be /v1, or those breaking changes can only happen on a y-stream release basis.

@cdoern (Contributor Author) commented Sep 4, 2025

Going to apply some of the above suggestions and re-push the proposal, generally as is.

@skamenan7 (Contributor) commented

Great work, @cdoern! Thanks!

@r3v5 (Contributor) left a comment

lgtm

@franciscojavierarceo (Collaborator) left a comment

lgtm

One last nit would be to include a proposal covering the current state of the APIs in this doc.

@cdoern (Contributor Author) commented Sep 9, 2025

> lgtm
>
> One last nit would be to include a proposal covering the current state of the APIs in this doc.

Thanks @franciscojavierarceo! I think this warrants its own piece of work as a follow-up. I was imagining this would merge, and then the work to actually define which APIs are at which level would happen immediately afterward, so that no assumptions are made without research into the actual stability. Hope that makes sense!

@cdoern (Contributor Author) commented Sep 9, 2025

@mattf I changed the wording describing the surface a provider must implement to:

- an API can graduate from `v1alpha` to `v1beta` if the team has identified the extent of the mandatory surface of the API. "Mandatory surface" means non-optional routes and the shape of their parameters/return types, e.g. `/v1/openai/chat/completions`. Optional types can change.

@reluctantfuturist (Contributor) commented

>> lgtm
>> one last nit would be to include a proposal covering the current state of the APIs in this doc
>
> Thanks @franciscojavierarceo! I think this warrants its own piece of work as a follow-up. I was imagining this would merge, and then the work to actually define which APIs are at which level would happen immediately afterward, so that no assumptions are made without research into the actual stability. Hope that makes sense!

+1 -- let's handle it separately (both defining which APIs are which, and figuring out how to reflect it in the docs)


> Providers can iterate as much as they want on functionality as long as they work within the bounds of an API. If they need to change the API, then the API should not be `/v1`, or those breaking changes can only happen on a y-stream release basis.
>
> ### Approval and Announcement Process for Breaking Changes
Contributor:

Should probably also include something like this to define a protocol for when there is a breaking change -- #3260. A PR that is titled a specific way will not fail the oasdiff check.

Collaborator:

+1, it'll make it easier to call these out in the release notes and in any additional announcements (e.g., in Discord, email, etc.).

Contributor Author:

So by this you mean I should add a bullet here describing how the PR title and commit message should include an indicator of a breaking change? I can add that!

Contributor Author:

I added a section here; luckily, Conventional Commits outlines how to handle this: https://www.conventionalcommits.org/en/v1.0.0/#specification
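
For example, under that spec a breaking change is flagged with a `!` after the type/scope and/or a `BREAKING CHANGE:` footer. The commit below is hypothetical; the route and parameter are made up for illustration:

```
feat(api)!: make dataset_id required for supervised fine-tuning

BREAKING CHANGE: POST /v1alpha/post-training/supervised-fine-tune now
requires `dataset_id`; clients relying on the previous default must update.
```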


> ### Migration of API routes under `/v1alpha`, `/v1beta`, and `/v1`
>
> Instead of placing every API under `/v1`, any API that is not fully stable or complete should go under `/v1alpha` or `/v1beta`. For example, at the time of this writing, `post_training` belongs here, as well as any OpenAI-compatible API whose surface does not exactly match the upstream OpenAI API it mimics.
Collaborator:

If we place any OpenAI-compatible APIs anywhere other than /v1, that means that clients in the wild won't be able to find that API to use it, right? An OpenAI client or the various frameworks in the wild will expect all of the OpenAI APIs to live at the same place, behind a /v1 URL, and not be in different /v1alpha or similar URLs.

Contributor Author:

I 100% agree, and if we merge this and then find that the OpenAI APIs do not qualify as v1, I think we should re-evaluate this proposal.

I am really just concerned with placing APIs under their proper levels to convey to users our level of confidence in each API. All sorts of new routes, parameters, etc. can be added to a v1 API, as long as the proper processes are followed.

> - an API can graduate from `v1beta` to `v1` if the API surface and datatypes are complete as identified by the team. The parameters and return types that are mandatory for each route are stable. All aspects of graduating from `v1alpha1` to `v1beta` apply as well.
> - Optional parameters, routes, or parts of the return type can be added after graduating to `v1`
>
> ### v1 (stable)
Contributor:

If there is v1, how do things become v2? Is there a v2alpha process also?

Contributor Author:

v2 is something I think I'd avoid, especially given the OpenAI routes. Given that the criteria here are rather loose (v1 APIs can make breaking changes between y-streams), I don't see a reason to graduate to v2 unless OpenAI does.

I have personally seen most APIs that use this stability schema stick to v1 unless a major refactor occurs.

Contributor Author:

I added a section for this outlining how a migration to v2 would go.


> ### API Stability vs. Provider Stability
>
> The leveling introduced in this document relates to the stability of the API and not specifically the providers within the API.
Contributor:

We could perhaps add tables that show which providers are considered stable vs. not. At various points we have wanted to do this but never actually managed to run the tests and do the maintenance, given the surface area. Anyhow, just a drive-by comment.

Contributor Author:

Agreed, but this should likely be a separate document about provider stability guarantees.


> ### Migration of API routes under `/v1alpha`, `/v1beta`, and `/v1`
>
> Instead of placing every API under `/v1`, any API that is not fully stable or complete should go under `/v1alpha` or `/v1beta`. For example, at the time of this writing, `post_training` belongs here, as well as any OpenAI-compatible API whose surface does not exactly match the upstream OpenAI API it mimics.
Contributor:

I am not sure about the OpenAI stuff. Do you consider the extensions of the OpenAI API we add to be kosher or not?

Contributor Author:

I think the OpenAI APIs are all v1 by default for usability reasons, and the net-new additions we make to them are OK as long as clients can upgrade to use them in a non-breaking manner. This is the grey area between concerns about API structure and API implementation (providers).

Generally speaking, any new routes and their implementations (providers) are OK for these APIs as long as users aren't broken between z-stream releases.

I think we could write a whole separate doc just about this "implementation" concern, but this doc is primarily about enforcing API structure norms.

@ashwinb (Contributor) left a comment

Overall this looks pretty good. I would just want some discussion of "v1 -> v2" to also be included in the same document. It seems odd to have everything be limited to just "v1".

@cdoern (Contributor Author) commented Sep 11, 2025

@ashwinb I added a section about v1 -> v2, but I kept it general, as we might want to iterate on it once we see how the v1 API lifecycle goes in practice.

Commit pushed: "this document outlines different API stability levels, how to enforce them, and next steps"

Signed-off-by: Charlie Doern <[email protected]>
@reluctantfuturist (Contributor) commented

@cdoern are you waiting for more feedback, or is this good to go?

@leseb (Collaborator) commented Sep 16, 2025

> @cdoern are you waiting for more feedback, or is this good to go?

It's good to go; we wanted to make sure that @ashwinb's comments had been addressed and that he could give this a final approval.

@leseb (Collaborator) left a comment

All comments have been addressed; let's move forward, thanks!

@leseb merged commit 6b855af into llamastack:main on Sep 16, 2025 (6 checks passed).