Open
Labels
enhancement: New feature or request
Description
🚀 Describe the new functionality needed
Extend the Llama Stack Batches API to support the /v1/embeddings endpoint, enabling efficient batch processing of embedding requests alongside the existing /v1/chat/completions support.
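To make the request concrete, here is a minimal sketch of what a batch input file for embeddings might look like. It assumes the Batches API accepts OpenAI-style JSONL input, where each line carries `custom_id`, `method`, `url`, and `body`; this issue does not confirm that format for embeddings. The helper `build_embedding_batch_lines` and the model name `example-embedding-model` are hypothetical and used only for illustration.

```python
import json


def build_embedding_batch_lines(texts, model, id_prefix="doc"):
    """Return one JSONL line per text, each describing a /v1/embeddings request.

    Assumption: the batch input uses the OpenAI-style per-line request shape
    (custom_id, method, url, body). This helper is hypothetical.
    """
    lines = []
    for i, text in enumerate(texts):
        request = {
            # custom_id lets the caller match each result back to its input
            "custom_id": f"{id_prefix}-{i}",
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": model, "input": text},
        }
        lines.append(json.dumps(request))
    return lines


# Example: two documents, one request line each
batch_lines = build_embedding_batch_lines(
    ["first document", "second document"],
    model="example-embedding-model",  # placeholder model name
)
print("\n".join(batch_lines))
```

Under this assumption, the resulting lines would be written to a `.jsonl` file and submitted the same way as a chat-completions batch, with only the `url` and `body` differing.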
💡 Why is this needed? What if we don't build it?
The current Batches API implementation only supports /v1/chat/completions requests, so batch processing is limited to chat completions. However, many use cases require computing embeddings over large datasets, such as:
- Document Indexing: Processing thousands of documents for vector search systems
- Similarity Analysis: Computing embeddings for large text corpora
- RAG Preprocessing: Batch embedding generation for retrieval-augmented generation pipelines
- Semantic Clustering: Large-scale text clustering and categorization
- Content Analysis: Bulk processing of user-generated content for recommendations
Other thoughts
No response