VoyageAI embeddings support #3856
Conversation
| voyageai_truncation: bool
| """Whether to truncate inputs that exceed the model's context length.
| Defaults to True. If False, an error is raised for inputs that are too long.
With Cohere, I decided to default to False for consistency with OpenAI. Can that be the default here as well? Or do you think that was the wrong call?
Hmmm, it's hard to say; to be honest, I did not notice this in your original PR, or I would have raised it.
I would be inclined to have truncation on by default, primarily because we do not have usable, accurate tokenizers for all providers. Ollama, for example, has no tokenizer and truncates silently with no option. There is tiktoken, but that really only covers a few providers.
Having said that, since that's the default elsewhere let's keep it like that, I will adapt.
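For illustration, the agreed-on resolution (default `False`, matching the OpenAI and Cohere models) might look like the following sketch. The setting name comes from this diff; the `TypedDict` and helper are stand-ins, not the real pydantic-ai classes:

```python
from typing import TypedDict


class VoyageAIEmbeddingSettings(TypedDict, total=False):
    # Sketch of the settings shape discussed in this thread.
    voyageai_truncation: bool


def resolve_truncation(settings: VoyageAIEmbeddingSettings) -> bool:
    # Default to False for consistency with the other providers:
    # unset means "raise on over-long input" rather than silent truncation.
    return settings.get('voyageai_truncation', False)
```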
| """
| voyageai_output_dtype: Literal['float', 'int8', 'uint8', 'binary', 'ubinary']
| """The output data type for embeddings.
Our types currently require embeddings to be floats though
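To illustrate the type conflict: quantized outputs such as `int8` would need to be rescaled before they could satisfy a `list[float]` embedding type. A minimal sketch (this rescaling scheme is an assumption for illustration, not VoyageAI's):

```python
def to_float_embedding(values: list[int]) -> list[float]:
    # int8 outputs lie in [-128, 127]; rescale to [-1.0, 1.0) so they fit
    # a list[float] embedding type.
    return [v / 128.0 for v in values]
```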
| to use as defaults for this model.
| """
| self._model_name = model_name
| self._client = AsyncClient(
Please do follow the existing provider class / model class pattern for consistency.
| texts=list(inputs),
| model=self.model_name,
| input_type=voyageai_input_type,
| truncation=settings.get('voyageai_truncation', True),
See above, I think this should be False
| usage_data = {'total_tokens': total_tokens}
| response_data = {'model': model, 'usage': usage_data}
| return RequestUsage.extract(
We should only use this if the models have data in genai-prices. In this case, it's better to build a RequestUsage object manually. As you can see in the tests snapshots, this will actually fail to extract anything.
Did not notice. Will look thoroughly and adapt.
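Building the usage object manually, as suggested, could look roughly like this. The `RequestUsage` here is a minimal stand-in for pydantic-ai's class, and the field name is an assumption:

```python
from dataclasses import dataclass


@dataclass
class RequestUsage:
    # Minimal stand-in for pydantic_ai's RequestUsage; the field name is an
    # assumption for illustration.
    input_tokens: int = 0


def usage_from_voyageai(total_tokens: int) -> RequestUsage:
    # Build usage directly instead of calling RequestUsage.extract(), which
    # only yields data for providers covered by genai-prices.
    return RequestUsage(input_tokens=total_tokens)
```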
Address review comments on PR pydantic#3856:
- Create VoyageAIProvider class following Cohere pattern
- Change truncation default from True to False
- Remove voyageai_output_dtype setting (types require floats)
- Build RequestUsage manually instead of using extract()
- Update embeddings/__init__.py to pass provider to VoyageAI model
- Re-record VCR cassettes with live API
Force-pushed b934d5b to 88ab61b
Hi, with regards to the coverage issue that's blocking CI: The
I would remove the sentence-transformers case from
@ggozad The precedent for a local-model provider that doesn't actually do much is So even though it's not strictly necessary, I'd prefer to keep the
| model_kind = normalize_gateway_provider(model_kind)
| # Handle models that don't need a provider first
If we revert this change, would we get test coverage again?
Yep, you are right, reverted.
| # ALL FIELDS MUST BE `voyageai_` PREFIXED SO YOU CAN MERGE THEM WITH OTHER MODELS.
| voyageai_truncation: bool
Since multiple models now support toggling truncation, let's move this to the `EmbeddingSettings` superclass. We should keep supporting `cohere_truncate` as well, but can prioritize the main `truncate`.
I added `truncate` to `EmbeddingSettings` and kept the Cohere settings.
We don't need this field anymore then, right?
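The precedence being discussed could be sketched as follows. The `TypedDict`s are stand-ins for the real pydantic-ai settings classes, and the helper is hypothetical; only the setting names come from this thread:

```python
from typing import Literal, TypedDict


class EmbeddingSettings(TypedDict, total=False):
    # Shared setting, as proposed in this thread.
    truncate: bool


class CohereEmbeddingSettings(EmbeddingSettings, total=False):
    # Provider-specific setting kept for backwards compatibility;
    # takes priority over the shared `truncate` boolean.
    cohere_truncate: Literal['NONE', 'START', 'END']


def cohere_truncate_value(settings: CohereEmbeddingSettings) -> str:
    # cohere_truncate wins; otherwise map the shared boolean onto
    # Cohere's string values.
    if 'cohere_truncate' in settings:
        return settings['cohere_truncate']
    return 'END' if settings.get('truncate', False) else 'NONE'
```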
| assert api_key is None, 'Cannot provide both `voyageai_client` and `api_key`'
| assert base_url is None, 'Cannot provide both `voyageai_client` and `base_url`'
| assert max_retries == 0, 'Cannot provide both `voyageai_client` and `max_retries`'
| assert timeout is None, 'Cannot provide both `voyageai_client` and `timeout`'
Unless "most users" will need them, I'd prefer to not expose all the arguments on AsyncClient as arguments here: users can just pass their own voyageai_client if they want this level of control.
Done, left only api_key & voyageai_client.
| # Only pass base_url if explicitly set; otherwise use VoyageAI's default
| base_url = base_url or os.getenv('VOYAGE_BASE_URL')
| self._client = AsyncClient(
If this takes a http_client, we should use a cached version like we do in the openai provider etc.
The VoyageAI SDK does not support custom HTTP clients :(
| """The embedding model provider."""
| return self._provider.name
| async def embed(
It's a shame they don't (seem to) support counting tokens :(
Force-pushed 78e615f to 093eaa5
DouweM left a comment
@ggozad Thanks Yiorgis, a few more comments, + I just merged the Google embedding model, so you'll have a few conflicts to resolve
| """
| truncate: bool
| """Whether to truncate inputs that exceed the model's context length.
- We should specify that the default is `False`
- I think it's worth explaining that you can use the `max_input_tokens` and `count_tokens` methods to implement your own (smarter) "truncation"
| Defaults to False. If True, inputs that are too long will be truncated.
| """
| voyageai_input_type: VoyageAIEmbedInputType
Hmm, if it only supports `query` and `document` anyway, I don't think a setting is warranted. If "direct embedding without prefix" is something users would want to do, I think we should make it `'none'` instead of `None` so that the difference is clearer between this field being omitted (where we should use the default input type implied by the `embed_query`/`document` method) and it explicitly being set to none.
The None option, according to their docs, does "raw" embedding. My guess is that it's useful if, say, you want to do clustering or classification. For retrieval one would use document or query.
Will change to none as you suggested.
Hmm thinking about this more, what do you think about changing our EmbedInputType type to accept None as well, plus having Embedder.embed's input_type argument default to None? Then we would not need this custom setting at all anymore. I don't think any of the embeddings APIs require an input type, and if they do we can pick a reasonable default. OpenAI ignores the argument entirely anyway.
Then we would not need a new setting here at all, so even though it's kind of a separate task from this PR, I think it's worth trying it here so we don't introduce the new setting and then immediately deprecate it.
tests/test_embeddings.py (outdated)
| async def test_query_with_cohere_truncate(self, co_api_key: str):
| model = CohereEmbeddingModel('embed-v4.0', provider=CohereProvider(api_key=co_api_key))
| embedder = Embedder(model)
| result = await embedder.embed_query('Hello, world!', settings={'cohere_truncate': 'END'})  # pyright: ignore[reportArgumentType]
Why do we need the pyright ignore? Maybe if we use the CohereEmbeddingSettings constructor we won't need it
I added the series 4 models that were released I think yesterday. The other models I saw from series 1 & 2 are legacy and I would not include them.
# Conflicts:
#   docs/api/embeddings.md
#   tests/test_embeddings.py
…rity when raw embeddings are desired
@DouweM thank you for the thorough review, I think I addressed all your comments and merged the google embeddings changes. I also added the series 4 embedding models that were released yesterday, I think.
docs/embeddings.md (outdated)
| 'voyageai:voyage-3.5',
| settings=VoyageAIEmbeddingSettings(
|     dimensions=512,  # Reduce output dimensions
|     truncate=True,  # Truncate input if it exceeds context length
This is currently in the "VoyageAI-specific settings" section, but neither of these is actually VoyageAI-specific :) So I think we should mention `truncate` in the top-level Settings section (where we already mention `dimensions`), and change this section to be about `voyageai_input_type`, similar to the one about `google_task_type`.
| output_dimension=settings.get('dimensions'),
| input_type=cohere_input_type,
| max_tokens=settings.get('cohere_max_tokens'),
| truncate=settings.get('cohere_truncate', 'NONE'),
Let's specify in the `cohere_truncate` docstring that it overrides the `truncate` boolean.
| def __init__(self, *, voyageai_client: AsyncClient) -> None: ...
| @overload
| def __init__(self, *, api_key: str | None = None, voyageai_client: None = None) -> None: ...
This one should just accept the api key right?
With regards to the EmbedInputType:
I think the None choice in VoyageAI is a bit weird. Typically you would want to embed either for a query or for documents (when the differentiation is available). Raw embeddings are an edge case, as I mentioned, for when you want to do something other than semantic search.
This makes me think that None here is ambiguous, as it does not mean "use default behaviour" (query for most) but rather "use no prefixes in the embedding". So if we used None as the default, should we then introduce a "raw" setting?
Given we have meaningful methods for the common RAG case, i.e. embed_query() and embed_documents() and that different vendors have a mix of possibilities, I would keep input_type as vendor-specific.
But happy to discuss this, let me know what you think.
Pre-Review Checklist
- `make format` and `make typecheck`
Pre-Merge Checklist