Update token-counting and context overflow API surface #75

Merged · 2 commits · Mar 7, 2025
README.md — 58 changes: 23 additions & 35 deletions
@@ -81,7 +81,7 @@ console.log(await session.prompt("What is your favorite food?"));

The system prompt is special, in that the language model will not respond to it, and it will be preserved even if the context window otherwise overflows due to too many calls to `prompt()`.

-If the system prompt is too large (see [below](#tokenization-context-window-length-limits-and-overflow)), then the promise will be rejected with a `"QuotaExceededError"` `DOMException`.
+If the system prompt is too large, then the promise will be rejected with a `QuotaExceededError` exception. See [below](#tokenization-context-window-length-limits-and-overflow) for more details on token counting and this new exception type.
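For illustration, a minimal sketch of handling that rejection, assuming the `ai.languageModel.create()` factory shown in the Web IDL later in this diff (the `hugeSystemPrompt` string is hypothetical):

```js
// Hypothetical oversized system prompt, purely for illustration.
const hugeSystemPrompt = "You are a helpful assistant. ".repeat(1_000_000);

try {
  const session = await ai.languageModel.create({ systemPrompt: hugeSystemPrompt });
} catch (e) {
  if (e.name === "QuotaExceededError") {
    console.error("The system prompt alone does not fit in the context window.");
  } else {
    throw e;
  }
}
```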

### N-shot prompting

@@ -114,7 +114,7 @@ const result2 = await predictEmoji("This code is so good you should get promoted"));
Some details on error cases:

* Using both `systemPrompt` and a `{ role: "system" }` prompt in `initialPrompts`, or using multiple `{ role: "system" }` prompts, or placing the `{ role: "system" }` prompt anywhere besides at the 0th position in `initialPrompts`, will reject with a `TypeError`.
-* If the combined token length of all the initial prompts (including the separate `systemPrompt`, if provided) is too large, then the promise will be rejected with a `"QuotaExceededError"` `DOMException`.
+* If the combined token length of all the initial prompts (including the separate `systemPrompt`, if provided) is too large, then the promise will be rejected with a [`QuotaExceededError` exception](#tokenization-context-window-length-limits-and-overflow).
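A hedged sketch distinguishing the two rejection cases above (the prompt contents are illustrative, and the `{ role, content }` shape follows the README's n-shot example):

```js
try {
  const session = await ai.languageModel.create({
    initialPrompts: [
      { role: "system", content: "Predict up to 5 emojis as a response to a comment." },
      { role: "user", content: "This is amazing!" },
      { role: "assistant", content: "❤️, ➕" },
    ],
  });
} catch (e) {
  if (e instanceof TypeError) {
    // A misplaced or duplicated { role: "system" } prompt lands here.
    console.error("Malformed initialPrompts:", e.message);
  } else if (e.name === "QuotaExceededError") {
    // The combined initial prompts were too large for the context window.
    console.error("Initial prompts exceed the available quota.");
  }
}
```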

### Customizing the role per prompt

@@ -389,31 +389,37 @@ Note that because sessions are stateful, and prompts can be queued, aborting a s
A given language model session will have a maximum number of tokens it can process. Developers can check their current usage and progress toward that limit by using the following properties on the session object:

```js
-console.log(`${session.tokensSoFar}/${session.maxTokens} (${session.tokensLeft} left)`);
+console.log(`${session.inputUsage} tokens used, out of ${session.inputQuota} tokens available.`);
```
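A one-line corollary sketch: the input capacity remaining on a session is the difference between these two properties.

```js
// Remaining capacity, using the renamed properties from this PR.
const remaining = session.inputQuota - session.inputUsage;
console.log(`${remaining} units of input quota remain on this session.`);
```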

-To know how many tokens a string will consume, without actually processing it, developers can use the `countPromptTokens()` method:
+To know how many tokens a string will consume, without actually processing it, developers can use the `measureInputUsage()` method:

```js
-const numTokens = await session.countPromptTokens(promptString);
+const usage = await session.measureInputUsage(promptString);
```
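Combining the two, a hedged sketch of a pre-flight check that anticipates the overflow behavior described below (`promptString` is a stand-in):

```js
const promptString = "Recommend a dessert that pairs well with espresso.";

// Measure first, without consuming any quota, then compare against what's left.
const usage = await session.measureInputUsage(promptString);
if (usage > session.inputQuota - session.inputUsage) {
  console.warn("This prompt will overflow the context; older turns may be evicted.");
}
const result = await session.prompt(promptString);
```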

Some notes on this API:

* We do not expose the actual tokenization to developers since that would make it too easy to depend on model-specific details.
* Implementations must include in their count any control tokens that will be necessary to process the prompt, e.g. ones indicating the start or end of the input.
-* The counting process can be aborted by passing an `AbortSignal`, i.e. `session.countPromptTokens(promptString, { signal })`.
+* The counting process can be aborted by passing an `AbortSignal`, i.e. `session.measureInputUsage(promptString, { signal })`.
+* We use the phrases "input usage" and "input quota" in the API, to avoid being specific to the current language model tokenization paradigm. In the future, even if we change paradigms, we anticipate some concept of usage and quota still being applicable, even if it's just string length.
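A hedged sketch of the abort option noted in the list above, using a standard `AbortController` (the 100 ms timeout is an arbitrary illustrative policy, and `promptString` is reused from the earlier sketch):

```js
const controller = new AbortController();
const measurement = session.measureInputUsage(promptString, { signal: controller.signal });

// Give up on the measurement if it takes too long (illustrative policy).
setTimeout(() => controller.abort(), 100);

try {
  console.log(`Usage: ${await measurement}`);
} catch (e) {
  if (e.name === "AbortError") {
    console.log("Measurement was aborted before completing.");
  } else {
    throw e;
  }
}
```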

-It's possible to send a prompt that causes the context window to overflow. That is, consider a case where `session.countPromptTokens(promptString) > session.tokensLeft` before calling `session.prompt(promptString)`, and then the web developer calls `session.prompt(promptString)` anyway. In such cases, the initial portions of the conversation with the language model will be removed, one prompt/response pair at a time, until enough tokens are available to process the new prompt. The exception is the [system prompt](#system-prompts), which is never removed. If it's not possible to remove enough tokens from the conversation history to process the new prompt, then the `prompt()` or `promptStreaming()` call will fail with a `"QuotaExceededError"` `DOMException` and nothing will be removed.
+It's possible to send a prompt that causes the context window to overflow. That is, consider a case where `session.measureInputUsage(promptString) > session.inputQuota - session.inputUsage` before calling `session.prompt(promptString)`, and then the web developer calls `session.prompt(promptString)` anyway. In such cases, the initial portions of the conversation with the language model will be removed, one prompt/response pair at a time, until enough tokens are available to process the new prompt. The exception is the [system prompt](#system-prompts), which is never removed.

-Such overflows can be detected by listening for the `"contextoverflow"` event on the session:
+Such overflows can be detected by listening for the `"quotaoverflow"` event on the session:

```js
session.addEventListener("contextoverflow", () => {
console.log("Context overflow!");
session.addEventListener("quotaoverflow", () => {
console.log("We've gone past the quota, and some inputs will be dropped!");
});
```

+If it's not possible to remove enough tokens from the conversation history to process the new prompt, then the `prompt()` or `promptStreaming()` call will fail with a `QuotaExceededError` exception and nothing will be removed. This is a proposed new type of exception, which subclasses `DOMException`, and replaces the web platform's existing `"QuotaExceededError"` `DOMException`. See [whatwg/webidl#1465](https://github.com/whatwg/webidl/pull/1465) for this proposal. For our purposes, the important part is that it has the following properties:
+
+* `requested`: how many tokens the input consists of
+* `quota`: how many tokens were available (which will be less than `requested`, and equal to the value of `session.inputQuota - session.inputUsage` at the time of the call)
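A hedged sketch of reading those proposed fields when a prompt cannot fit even after history eviction (`veryLongPrompt` is hypothetical, and the `requested`/`quota` properties are per the whatwg/webidl proposal linked above):

```js
const veryLongPrompt = "Summarize the following: " + "lorem ipsum ".repeat(1_000_000);

try {
  await session.prompt(veryLongPrompt);
} catch (e) {
  if (e.name === "QuotaExceededError") {
    console.error(`Input needs ${e.requested} tokens, but only ${e.quota} were available.`);
  } else {
    throw e;
  }
}
```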

### Multilingual content and expected languages

The default behavior for a language model session assumes that the input languages are unknown. In this case, implementations will use whatever "base" capabilities they have available for the language model, and might throw `"NotSupportedError"` `DOMException`s if they encounter languages they don't support.
@@ -533,28 +539,12 @@ Finally, note that there is a sort of precedent in the (never-shipped) [`FetchOb
### Full API surface in Web IDL

```webidl
-// Shared self.ai APIs
-
-partial interface WindowOrWorkerGlobalScope {
-  [Replaceable, SecureContext] readonly attribute AI ai;
-};
+// Shared self.ai APIs:
+// See https://webmachinelearning.github.io/writing-assistance-apis/#shared-ai-api for most of them.

-[Exposed=(Window,Worker), SecureContext]
-interface AI {
+partial interface AI {
  readonly attribute AILanguageModelFactory languageModel;
};

-[Exposed=(Window,Worker), SecureContext]
-interface AICreateMonitor : EventTarget {
-  attribute EventHandler ondownloadprogress;
-
-  // Might get more stuff in the future, e.g. for
-  // https://github.com/webmachinelearning/prompt-api/issues/4
-};
-
-callback AICreateMonitorCallback = undefined (AICreateMonitor monitor);
-
-enum AIAvailability { "unavailable", "downloadable", "downloading", "available" };
```
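A hedged feature-detection sketch against this surface (assuming `self.ai` is exposed as described by the shared-spec comment above):

```js
// Detect the Prompt API before using it; either check may fail in older browsers.
if (typeof self !== "undefined" && "ai" in self && self.ai.languageModel) {
  console.log("Prompt API surface detected.");
} else {
  console.log("Prompt API not available in this context.");
}
```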

```webidl
@@ -579,19 +569,17 @@ interface AILanguageModel : EventTarget {
    optional AILanguageModelPromptOptions options = {}
  );

-  Promise<unsigned long long> countPromptTokens(
+  Promise<double> measureInputUsage(
    AILanguageModelPromptInput input,
    optional AILanguageModelPromptOptions options = {}
  );
-  readonly attribute unsigned long long maxTokens;
-  readonly attribute unsigned long long tokensSoFar;
-  readonly attribute unsigned long long tokensLeft;
+  readonly attribute double inputUsage;
+  readonly attribute unrestricted double inputQuota;
+  attribute EventHandler onquotaoverflow;

  readonly attribute unsigned long topK;
  readonly attribute float temperature;

-  attribute EventHandler oncontextoverflow;
-
  Promise<AILanguageModel> clone(optional AILanguageModelCloneOptions options = {});
  undefined destroy();
};
```