Feature Request
Add comprehensive embeddings support across multiple providers, including local open-source models.
Motivation
- Vector embeddings are crucial for RAG, semantic search, and similarity matching
- Different providers excel at different embedding tasks
- Gemini uniquely supports multimodal embeddings (text, image, video)
- Local embeddings keep data on the user's own machine, which suits privacy-sensitive applications
Proposed Implementation
1. Base Embeddings Interface
# ai_proxy_core/embeddings.py
from typing import Any, Dict, List, Union

class EmbeddingsHandler:
    async def create_embeddings(
        self,
        input: Union[str, List[str], bytes],
        model: str = "text-embedding-ada-002",
        input_type: str = "text",  # text, image, video
    ) -> Dict[str, Any]:
        # Route to appropriate provider
        if model.startswith("text-embedding"):
            return await self._openai_embeddings(input, model)
        elif model.startswith("models/embedding"):
            return await self._gemini_embeddings(input, model)
        elif model in LOCAL_MODELS:  # set of local and ollama/ model names, not defined in this proposal
            return await self._local_embeddings(input, model)
        # Fail loudly instead of silently returning None for unknown models
        raise ValueError(f"Unsupported embedding model: {model}")
2. OpenAI Embeddings
async def _openai_embeddings(self, input, model):
    # openai_client is assumed to be an openai.AsyncOpenAI instance
    response = await openai_client.embeddings.create(
        input=input,
        model=model,  # text-embedding-ada-002, text-embedding-3-small, text-embedding-3-large
    )
    return self._format_response(response)
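The `_format_response` helper is called above but never defined in this proposal. A minimal sketch of what it could do, assuming each provider's result has first been reduced to a list of float vectors (for the OpenAI client that would be `[item.embedding for item in response.data]`). The function name, signature, and returned keys are illustrative assumptions, not an existing API:

```python
from typing import Any, Dict, List, Sequence


def format_response(
    vectors: Sequence[Sequence[float]], model: str, provider: str
) -> Dict[str, Any]:
    """Normalize provider output into one provider-agnostic dict (hypothetical shape)."""
    embeddings: List[List[float]] = [[float(x) for x in v] for v in vectors]
    # All vectors from one model should share a single dimension.
    dims = {len(v) for v in embeddings}
    if len(dims) > 1:
        raise ValueError(f"inconsistent embedding dimensions: {sorted(dims)}")
    return {
        "embeddings": embeddings,
        "model": model,
        "provider": provider,
        "dim": dims.pop() if dims else 0,
        "count": len(embeddings),
    }
```

Returning plain lists rather than provider objects keeps the result JSON-serializable, which matters if the handler sits behind an HTTP proxy.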
3. Gemini Multimodal Embeddings
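async def _gemini_embeddings(self, input, model):
    # Gemini supports text, image, AND video embeddings!
    # Assumes the chosen model accepts image and video parts; check the model's documentation
    if isinstance(input, str):
        part = types.Part.from_text(text=input)
    elif self._is_image(input) or self._is_video(input):
        # _mime_type is a hypothetical helper returning e.g. "image/png" or "video/mp4"
        part = types.Part.from_bytes(data=input, mime_type=self._mime_type(input))
    else:
        raise ValueError("Unsupported input for Gemini embeddings")
    # gemini_client is assumed to be a google.genai.Client; .aio exposes its async API
    response = await gemini_client.aio.models.embed_content(
        model=model,  # models/embedding-001
        contents=types.Content(parts=[part]),
    )
    return self._format_response(response)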
4. Local OSS Embeddings
# Support for sentence-transformers, instructor-embeddings, etc.
async def _local_embeddings(self, input, model):
    import asyncio
    if model.startswith("ollama/"):
        # Use Ollama's embedding endpoint; assumed to return plain lists already
        return {"embeddings": await self._ollama_embeddings(input, model)}
    if model.startswith("instructor-"):
        from InstructorEmbedding import INSTRUCTOR
        # The INSTRUCTOR checkpoints are published under the hkunlp/ namespace
        encoder = INSTRUCTOR(f"hkunlp/{model}")
    else:
        # e.g. all-MiniLM-L6-v2, all-mpnet-base-v2
        from sentence_transformers import SentenceTransformer
        encoder = SentenceTransformer(model)
    # encode() is blocking, so run it in a worker thread instead of on the event loop
    embeddings = await asyncio.to_thread(encoder.encode, input)
    return {"embeddings": embeddings.tolist()}
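The `_ollama_embeddings` helper is referenced above but not shown. A minimal sketch using only the standard library, assuming a local Ollama server on its default port and its `/api/embed` endpoint, which takes `model` and `input` and returns an `embeddings` list. The function names and the `OLLAMA_URL` constant are illustrative assumptions:

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple, Union

OLLAMA_URL = "http://localhost:11434/api/embed"  # assumed default local endpoint


def build_ollama_request(
    model: str, input: Union[str, List[str]]
) -> Tuple[str, Dict[str, Any]]:
    """Strip the proposal's 'ollama/' prefix and build the JSON payload."""
    name = model[len("ollama/"):] if model.startswith("ollama/") else model
    return OLLAMA_URL, {"model": name, "input": input}


def ollama_embeddings(model: str, input: Union[str, List[str]]) -> List[List[float]]:
    """Blocking HTTP call; wrap it in asyncio.to_thread() when called from async code."""
    url, payload = build_ollama_request(model, input)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["embeddings"]
```

Because the result is already a list of lists, it needs no `.tolist()` conversion, unlike the NumPy arrays returned by the in-process encoders.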
5. Model Registry
EMBEDDING_MODELS = {
    # OpenAI
    "text-embedding-ada-002": {"dim": 1536, "provider": "openai"},
    "text-embedding-3-small": {"dim": 1536, "provider": "openai"},
    "text-embedding-3-large": {"dim": 3072, "provider": "openai"},
    # Gemini
    "models/embedding-001": {"dim": 768, "provider": "gemini", "multimodal": True},
    # Local OSS
    "all-MiniLM-L6-v2": {"dim": 384, "provider": "local"},
    "all-mpnet-base-v2": {"dim": 768, "provider": "local"},
    "instructor-xl": {"dim": 768, "provider": "local"},
    "instructor-large": {"dim": 768, "provider": "local"},
    # Ollama
    "ollama/mxbai-embed-large": {"dim": 1024, "provider": "ollama"},
    "ollama/nomic-embed-text": {"dim": 768, "provider": "ollama"},
}
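A registry like this could also drive routing and validation, instead of the string-prefix checks in the base interface. A minimal sketch under that assumption; `resolve_model` and `check_dimension` are hypothetical helpers, and the registry here is a trimmed copy so the example runs on its own:

```python
from typing import Any, Dict, Sequence

# Trimmed copy of the registry, so this example is self-contained.
EMBEDDING_MODELS: Dict[str, Dict[str, Any]] = {
    "text-embedding-3-small": {"dim": 1536, "provider": "openai"},
    "models/embedding-001": {"dim": 768, "provider": "gemini"},
    "all-MiniLM-L6-v2": {"dim": 384, "provider": "local"},
    "ollama/nomic-embed-text": {"dim": 768, "provider": "ollama"},
}


def resolve_model(model: str) -> Dict[str, Any]:
    """Look up a model's registry entry, with a helpful error for unknown names."""
    try:
        return EMBEDDING_MODELS[model]
    except KeyError:
        known = ", ".join(sorted(EMBEDDING_MODELS))
        raise ValueError(f"Unknown embedding model {model!r}; known models: {known}") from None


def check_dimension(model: str, vector: Sequence[float]) -> None:
    """Raise if a returned vector does not match the registered dimension."""
    expected = resolve_model(model)["dim"]
    if len(vector) != expected:
        raise ValueError(f"{model}: expected {expected} dimensions, got {len(vector)}")
```

Checking dimensions at the boundary catches a mismatched model before vectors are written to an index that was built for a different size.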
Unique Features
Gemini Video Embeddings
# Extract embeddings from video content!
video_embedding = await handler.create_embeddings(
    input=video_bytes,
    model="models/embedding-001",
    input_type="video",
)
# Use for video similarity search, content matching, etc.
Batch Processing
# Efficient batch embedding generation
embeddings = await handler.create_embeddings(
    input=["text1", "text2", "text3"],
    model="text-embedding-3-small",
)
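Providers typically cap how many inputs one request may carry, so large input lists would need to be split before calling `create_embeddings`. A minimal sketch of that splitting step; the `batched` helper and the batch size used below are assumptions, and the real limit depends on the provider:

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Example: five texts split into requests of at most two inputs each.
chunks = list(batched(["a", "b", "c", "d", "e"], batch_size=2))
```

Because order is preserved, the per-batch results can be concatenated to line up with the original input list.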
Hybrid Search Support
# Combine different embedding models for hybrid search
text_emb = await handler.create_embeddings(text, model="text-embedding-3-large")
image_emb = await handler.create_embeddings(image, model="models/embedding-001")
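Vectors produced by different models live in different spaces (and often have different dimensions), so they cannot be compared with each other directly. One common approach is to score the query against each document within each space separately and then combine the scores. A minimal sketch of that idea; `hybrid_score` and its weighting scheme are illustrative assumptions, not part of the proposal:

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors from the SAME embedding model."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_score(text_sim: float, image_sim: float, text_weight: float = 0.5) -> float:
    """Weighted blend of per-model similarity scores (weights sum to 1)."""
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    return text_weight * text_sim + (1.0 - text_weight) * image_sim
```

In practice the weight would be tuned per application, since similarity scores from different models are not calibrated against each other.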
Configuration
# Embedding-specific settings
embedding_config:
  cache_embeddings: true
  normalize_vectors: true
  default_model: "text-embedding-3-small"
  local_model_path: "~/.cache/embeddings"
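The `cache_embeddings` and `normalize_vectors` settings are not specified further in this proposal. A minimal sketch of one way they could behave, assuming L2 normalization and an in-memory cache keyed by a hash of the model name and input text; `EmbeddingCache` and `normalize` are hypothetical names:

```python
import hashlib
import math
from typing import Callable, Dict, List


def normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit L2 length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


class EmbeddingCache:
    """In-memory cache; a real implementation might persist to disk instead."""

    def __init__(self) -> None:
        self._store: Dict[str, List[float]] = {}
        self.hits = 0

    @staticmethod
    def _key(model: str, text: str) -> str:
        # Include the model name: the same text embeds differently per model.
        return hashlib.sha256(f"{model}\x00{text}".encode("utf-8")).hexdigest()

    def get_or_compute(
        self, model: str, text: str, compute: Callable[[str], List[float]]
    ) -> List[float]:
        key = self._key(model, text)
        if key in self._store:
            self.hits += 1
        else:
            self._store[key] = normalize(compute(text))
        return self._store[key]
```

With unit-length vectors, cosine similarity reduces to a plain dot product, which is the usual reason to normalize before indexing.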
Benefits
- Unified API for all embedding providers
- Multimodal support via Gemini (text, image, video)
- Privacy options with local models
- Cost optimization by choosing appropriate models
- Dimension flexibility for different use cases
Use Cases
- RAG pipelines - Generate embeddings for document chunks
- Semantic search - Find similar content across modalities
- Recommendation systems - Compute similarity scores
- Clustering - Group similar items
- Video search - Search videos by visual content (Gemini)
Provider Comparison
| Provider | Models | Dimensions | Multimodal | Cost | Speed |
|---|---|---|---|---|---|
| OpenAI | 3 | 1536-3072 | ❌ | $$ | Fast |
| Gemini | 1 | 768 | ✅ (video!) | $ | Fast |
| Local OSS | Many | 384-1024 | ❌ | Free | Varies |
| Ollama | Several | 768-1024 | ❌ | Free | Fast |
Note: Anthropic does not currently offer an embeddings API.