
Conversation

@dmontagu (Contributor) commented Oct 24, 2025

Started this in collaboration with @DouweM; I'd like to ensure consensus on the API design before adding the remaining providers, Logfire instrumentation, docs, and tests.

This is inspired by the approach in haiku.rag, though we adapted it to be a bit closer to how the Agent APIs are used (and how you can override the model, settings, etc.).

Closes #58

Example:

import asyncio

from pydantic_ai.embeddings import Embedder

embedder = Embedder("openai:text-embedding-3-small")


async def main():
    result = await embedder.embed_documents(["hello", "world"])
    print(result)
    # (IsList, snapshot, and IsDatetime are testing helpers, but you get the point)
    # EmbeddingResult(
    #     embeddings=IsList(
    #         IsList(
    #             snapshot(0.01681816205382347),
    #             snapshot(-0.05579638481140137),
    #             snapshot(0.005661087576299906),
    #             length=1536,
    #         ),
    #         IsList(
    #             snapshot(-0.010592407546937466),
    #             snapshot(-0.03599696233868599),
    #             snapshot(0.030227113515138626),
    #             length=1536,
    #         ),
    #         length=2,
    #     ),
    #     inputs=['hello', 'world'],
    #     input_type='document',
    #     usage=RequestUsage(input_tokens=2),
    #     model_name='text-embedding-3-small',
    #     timestamp=IsDatetime(),
    #     provider_name='openai',
    # )



if __name__ == "__main__":
    asyncio.run(main())

To do: remaining providers, Logfire instrumentation, docs, and tests.

A collaborator left an inline review comment on this snippet:

from pydantic_ai.models.instrumented import InstrumentationSettings
from pydantic_ai.providers import infer_provider

KnownEmbeddingModelName = TypeAliasType(

Add a test like this one to verify this is up to date:

def test_known_model_names():  # pragma: lax no cover

github-actions bot commented Oct 24, 2025

Docs Preview

commit: 35abadc
Preview URL: https://8ee72db3-pydantic-ai-previews.pydantic.workers.dev

@ggozad commented Oct 29, 2025

Thanks for starting this and please do let me know if you need help :)
I went through it quickly; it looks like a great start!

One thing you might want to support from the start is having max_context_length and encoding as part of EmbeddingSettings.

Embedding models have a limit on how many tokens of input they can handle. Most providers will raise if you exceed it (openai.BadRequestError iirc for OpenAI; vLLM will return an ugly 500), and some will say nothing (looking at you, Ollama) and just truncate the input so that it fits.

All this is well explained here

I would not necessarily truncate like the cookbook does, and would still just raise, but I would be grateful to have max_context_length and the encoding available from the model side, so that as a library I can quickly check whether a chunk of text fits or not.
Even better if I could get the number of tokens a given embedding model uses for some text.

The only difficulty I see with this is that not all providers expose their tokenizers; Ollama, for example, does not. Still, it would be nice to have for the providers that do support it, as it's a crucial step when you're trying to chunk a document for embedding.

In haiku.rag my focus is local models, and as I mentioned, Ollama, the popular choice, doesn't expose a way to tokenize text. So I just do the dumb thing and guesstimate the tokens, hoping they won't be all that different from some OpenAI model's encoder: I use tiktoken (which you would probably also want to use to support this) with gpt-4o as a "close" model and get an estimate. But I'm sure we can do better than this here.
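
For illustration, a minimal sketch of that guesstimate (the chunk_fits helper and the 8192 limit are hypothetical, not anything from this PR):

import tiktoken

# Approximate token counts with gpt-4o's encoder as a stand-in for
# models (e.g. Ollama's) that don't expose their own tokenizer.
encoding = tiktoken.encoding_for_model("gpt-4o")


def estimated_tokens(text: str) -> int:
    return len(encoding.encode(text))


def chunk_fits(text: str, max_context_length: int = 8192) -> bool:
    # Check whether a chunk fits the embedding model's context window
    # before sending it, instead of letting the provider raise (or
    # silently truncate).
    return estimated_tokens(text) <= max_context_length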

Edit: I am not suggesting that calling embed should calculate the tokens needed on every call. But I imagine that whoever uses Pydantic AI to embed will also need to go through the process of chunking some large text, unless they only deal with embedding queries or simple sentences. So it would be a missed opportunity not to support that.

@gvanrossum left a comment

I would like to be able to comment on the API, but there are no tests showing how to call it.

@DouweM (Collaborator) commented Nov 14, 2025

@gvanrossum I'll make some progress on the PR today, but this is the API as it currently stands:

import asyncio

from pydantic_ai.embeddings import Embedder

embedder = Embedder("openai:text-embedding-3-large")


async def main():
    result = await embedder.embed("Hello, world!")
    print(result)


if __name__ == "__main__":
    asyncio.run(main())

With Azure OpenAI you currently have to create the model and provider manually, but we'll make Embedder('azure:text-embedding-3-large') work as well:

import asyncio

from pydantic_ai.embeddings import Embedder
from pydantic_ai.embeddings.openai import OpenAIEmbeddingModel
from pydantic_ai.providers.azure import AzureProvider

model = OpenAIEmbeddingModel("text-embedding-3-large", provider=AzureProvider())

embedder = Embedder(model)


async def main():
    result = await embedder.embed("Hello, world!")
    print(result)


if __name__ == "__main__":
    asyncio.run(main())

@gvanrossum

Nice. Do you have a bulk API too? That's essential for typeagent.

@DouweM (Collaborator) commented Nov 14, 2025

@gvanrossum Yep, the embed method is overloaded to take either a str and return list[float], or take a Sequence[str] and return list[list[float]], so it's the same method for single and bulk usage. (I'm aware str is itself a Sequence[str], but type checkers appear to handle the overloads correctly.)
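
A minimal sketch of what those overloads can look like (illustrative signatures, not the PR's exact code):

from collections.abc import Sequence
from typing import overload


class Embedder:
    @overload
    async def embed(self, input: str) -> list[float]: ...

    @overload
    async def embed(self, input: Sequence[str]) -> list[list[float]]: ...

    async def embed(
        self, input: str | Sequence[str]
    ) -> list[float] | list[list[float]]:
        # A lone str is embedded as one document; any other Sequence[str]
        # is treated as a batch. Checking isinstance(input, str) first
        # resolves the "str is itself a Sequence[str]" ambiguity at runtime.
        ...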

@DouweM (Collaborator) commented Nov 15, 2025

@gvanrossum In case you'd like to give it a try pre-release, I've made some progress today, including support for Embedder('azure:...').

@DouweM changed the title from "Draft implementation of support for embeddings APIs" to "Support embeddings models" on Nov 18, 2025
@DouweM (Collaborator) commented Nov 21, 2025

Unfortunately I haven't managed to get to this this week. Next week should be better.

@ggozad commented Dec 22, 2025

> @ggozad @gvanrossum Just curious, what vector DB are you using? I'll want to have a RAG example in our docs.

I use lancedb.

@gvanrossum

> @gvanrossum Just curious, what vector DB are you using?

I am using something I wrote myself. Persistence is optional (currently embeddings are stored in SQLite). Here's the code:
https://github.com/microsoft/typeagent-py/blob/main/typeagent/aitools/vectorbase.py
It stores a list of N normalized embeddings as a 2D numpy array of float32 with shape (N, embedding_size). This makes for a fast dot product, as long as all the vectors fit in memory. :-)
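
A minimal sketch of that layout (assuming the query vector is normalized the same way, so the dot product is cosine similarity):

import numpy as np

# N normalized embeddings stored as one (N, embedding_size) float32 array.
vectors = np.random.rand(1000, 1536).astype(np.float32)
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)


def top_k(query: np.ndarray, k: int = 5) -> np.ndarray:
    # One matrix-vector product scores the query against all N vectors;
    # with everything normalized, the scores are cosine similarities.
    scores = vectors @ query
    return np.argsort(scores)[::-1][:k]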

@DouweM DouweM marked this pull request as ready for review December 24, 2025 01:58
@DouweM DouweM merged commit 3717d20 into main Dec 24, 2025
19 of 21 checks passed
@DouweM DouweM deleted the embeddings-api branch December 24, 2025 02:36
@ggozad commented Dec 24, 2025

Thank you @DouweM & @dmontagu ❤️

@tomaarsen (Contributor)

Nice work on this!

@paulocoutinhox

Hi,

Does it have support for local embeddings? This is what I use today:

@staticmethod
def generate_embeddings(texts):
    import os

    # Avoid the tokenizers library's fork-related parallelism warnings.
    os.environ["TOKENIZERS_PARALLELISM"] = "false"

    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    return model.encode(texts, convert_to_numpy=True).tolist()

@daikeren

> Does it have support for local embeddings?

Based on the docs, I think it does:
https://ai.pydantic.dev/embeddings/#sentence-transformers-local

@paulocoutinhox

Thanks @daikeren, I implemented it and it works.

@Mazyod commented Dec 28, 2025

Tiktoken has a subtle footgun that bites us occasionally: it lazily downloads tokenizer definitions. We run in an air-gapped environment; in the best case it just fails immediately, and in the worst case it hangs the process until it times out.

I haven't tested this update yet, but wanted to share this, as I don't see any particular support for offline/air-gapped environments.

@paulocoutinhox

Thanks @Mazyod.

In my case, I use it with internet access, so it downloads normally.

Also, since you brought up the subject: how do I cache it in a Dockerfile so I don't have to download it again when the application needs it?

@Mazyod commented Dec 28, 2025

@paulocoutinhox I used this approach for a while, until it failed because the tiktoken cache was outdated and it attempted to refetch:

# Cache tiktoken definition from the internet
RUN http_proxy=http://my-proxy \
    https_proxy=http://my-proxy \
    python -c "import tiktoken; tiktoken.encoding_for_model('gpt-4o')"

@DouweM (Collaborator) commented Jan 5, 2026

@Mazyod Good catch about offline environments; can you please file an issue for that? We can at the very least document a workaround like that one for Docker.

@paulocoutinhox

Yeah, a solution for this would be nice. 💯

@stuaxo commented Jan 6, 2026

Does this work with Bedrock yet?

I haven't quite worked it out, or maybe something isn't implemented yet?

By way of example, on Bedrock (not using pydantic-ai yet) I'm using the embedding models "cohere.embed-english-v3" and "amazon.titan-embed-text-v2".

My assumption was that I would prefix these with "bedrock:".

Have the embedding models available on Bedrock been added anywhere in the codebase? So far, I get "unknown model" with various combinations.

@DouweM (Collaborator) commented Jan 6, 2026

@stuartaxonHO The supported providers are documented under https://ai.pydantic.dev/embeddings/#providers; Bedrock is not yet one of them, but contributions are welcome!

@stuaxo commented Jan 6, 2026

@DouweM the word "provider" is a little overloaded: what should happen to handle embedding models from different vendors on one API provider?

e.g. Cohere models on Bedrock vs. the Amazon models (the model parameters are different, and one response uses embeddings[0] while the other uses embedding).

Also, in the code I noticed there's a list of LLM model names, but I can't find one for embedding models; would that need to be added?

@DouweM (Collaborator) commented Jan 7, 2026

@stuaxo I agree the provider/model terms are a bit overloaded; we use them as described in https://ai.pydantic.dev/models/overview/#models-and-providers. In this context, "model" maps to "API format" and "provider" maps to "API client/base URL".

So in this case we'd need a BedrockEmbeddingModel that speaks their embedding API/SDK. It can be used with the existing BedrockProvider, which provides the base URL/API client, and should then support any model name their API does, no matter what company actually built that model.
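
Hypothetically, usage would then mirror the Azure example above (nothing below exists yet; BedrockEmbeddingModel and its import path are made up to show the shape a contribution would take):

from pydantic_ai.embeddings import Embedder
from pydantic_ai.embeddings.bedrock import BedrockEmbeddingModel  # hypothetical
from pydantic_ai.providers.bedrock import BedrockProvider

# One model class speaks Bedrock's embedding API format; the model name can
# be any model Bedrock hosts, regardless of which company built it.
model = BedrockEmbeddingModel("amazon.titan-embed-text-v2", provider=BedrockProvider())
embedder = Embedder(model)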
