
Add /v1/embeddings Support to Batches API #3145

@mattf

Description


🚀 Describe the new functionality needed

Extend the Llama Stack Batches API to support the /v1/embeddings endpoint, enabling efficient batch processing of embedding requests alongside the existing /v1/chat/completions support.
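A batch input file for embeddings would presumably follow the same per-line JSONL request envelope already used for /v1/chat/completions batches. The sketch below illustrates that shape; the file name, custom_id scheme, and model id are placeholders, not part of any existing API.

```python
# Sketch of a JSONL batch input file targeting /v1/embeddings, assuming the same
# per-line request envelope (custom_id / method / url / body) used today for
# /v1/chat/completions batches. Model id and file name are placeholders.
import json

documents = ["first document text", "second document text"]

with open("embeddings_batch.jsonl", "w") as f:
    for i, text in enumerate(documents):
        request = {
            "custom_id": f"doc-{i}",
            "method": "POST",
            "url": "/v1/embeddings",          # the endpoint this issue asks Batches to accept
            "body": {
                "model": "all-MiniLM-L6-v2",  # placeholder embedding model id
                "input": text,
            },
        }
        f.write(json.dumps(request) + "\n")
```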

💡 Why is this needed? What if we don't build it?

The current Batches API implementation only supports /v1/chat/completions requests, which limits batch processing to chat workloads. However, many use cases require computing embeddings over large datasets, for example (a submission sketch follows the list):

  1. Document Indexing: Processing thousands of documents for vector search systems
  2. Similarity Analysis: Computing embeddings for large text corpora
  3. RAG Preprocessing: Batch embedding generation for retrieval-augmented generation pipelines
  4. Semantic Clustering: Large-scale text clustering and categorization
  5. Content Analysis: Bulk processing of user-generated content for recommendations
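
A document-indexing pipeline (use case 1) could then submit the JSONL file above through the existing batch workflow. This is a sketch using the OpenAI-compatible Python client; the base URL and API key are placeholders, and endpoint="/v1/embeddings" is exactly the value the Batches API would need to start accepting.

```python
# Sketch of submitting the JSONL file above through the existing batch workflow,
# via an OpenAI-compatible client pointed at a Llama Stack server. base_url and
# api_key are placeholders; endpoint="/v1/embeddings" is what this issue proposes
# the Batches API should accept.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

batch_file = client.files.create(
    file=open("embeddings_batch.jsonl", "rb"),
    purpose="batch",
)

batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/embeddings",   # today only /v1/chat/completions is accepted
    completion_window="24h",
)

print(batch.id, batch.status)   # poll until completed, then fetch the output file
```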

Other thoughts

No response

Metadata

Labels

enhancement (New feature or request)
