@jairad26 jairad26 commented Oct 14, 2025

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • This PR adds the hosted SPLADE embedding function to both the JS and Python clients, so users can compute sparse vectors for a given document or query with a Chroma API key and the Chroma hosted embedding service.
  • New functionality
    • ...
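
For readers unfamiliar with sparse embeddings, here is a minimal sketch of what a sparse vector looks like and how a query is scored against a document. The `SparseVector` type and `sparse_dot` helper are illustrative names only; they are not the types or functions this PR adds, and the weights are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SparseVector:
    """Hypothetical container: only non-zero dimensions are stored."""
    indices: List[int]   # vocabulary positions that carry a non-zero weight
    values: List[float]  # weight for the position at the same offset in `indices`


def sparse_dot(a: SparseVector, b: SparseVector) -> float:
    """Dot product over the dimensions that both vectors share."""
    weights = dict(zip(a.indices, a.values))
    # Dimensions missing from `a` contribute 0.0 to the sum.
    return sum(weights.get(i, 0.0) * v for i, v in zip(b.indices, b.values))


doc = SparseVector(indices=[3, 17, 42], values=[0.5, 1.0, 2.0])
query = SparseVector(indices=[17, 42, 99], values=[2.0, 0.5, 1.0])

# Shared dimensions are 17 and 42: 1.0 * 2.0 + 2.0 * 0.5 = 3.0
score = sparse_dot(doc, query)
```

Storing only the non-zero dimensions is what makes SPLADE-style vectors cheap to hold even though the underlying vocabulary is large.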

Test plan

How are these changes tested?
Both embedding functions were tested manually and work as intended.

  • [x] Tests pass locally with pytest for Python, yarn test for JS, cargo test for Rust

Migration plan

Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?

Observability plan

What is the plan to instrument and monitor this change?

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?

jairad26 commented Oct 14, 2025

@github-actions
Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

@jairad26 jairad26 force-pushed the jai/hosted-splade-ef branch from bd0d7b2 to 3d3426a Compare October 14, 2025 18:46
@jairad26 jairad26 marked this pull request as ready for review October 14, 2025 18:52
@propel-code-bot
propel-code-bot bot commented Oct 14, 2025

Add Hosted Splade Embedding Functionality for Python and JS Clients

This pull request introduces a new hosted sparse embedding function for the Splade model family, enabling users to compute Splade-based sparse vectors using the Chroma hosted embedding service via both Python and JavaScript clients. It adds fully functional implementations, API schema definitions, and integration hooks for the new embedding function, ensuring unified support, configuration, validation, and documentation across both client ecosystems. Substantial updates also include dependency and package management, registration and retrieval logic for sparse embedding functions, and comprehensive test coverage for core functionality and error conditions.

Key Changes

• Introduced new ChromaCloudSpladeEmbeddingFunction class for both Python (chromadb/utils/embedding_functions/chroma_cloud_splade_embedding_function.py) and TypeScript (clients/new-js/packages/ai-embeddings/chroma-cloud-splade/src/index.ts), supporting hosted sparse embedding via the Splade model.
• Added JSON schema for chroma-cloud-splade embedding function configurations in schemas/embedding_functions/chroma-cloud-splade.json.
• Integrated registration, validation, and retrieval for sparse embedding functions in core client registry (clients/new-js/packages/chromadb/src/embedding-function.ts).
• Implemented robust error handling, resource management, and configuration update validation in both client implementations.
• Added and updated test suites (index.test.ts) for core and edge cases (including API key validation, config, and embedding correctness) in JS client.
• Enabled use of hosted Splade embedding in JS through new package (clients/new-js/packages/ai-embeddings/chroma-cloud-splade), with build/test/config scripts, README, and integration in meta modules.
• Updated dependency graphs (pnpm-lock.yaml, package.json) to include the new package and schema connections; exposed functionality via all meta-package.
• Extended schema utilities (clients/new-js/packages/ai-embeddings/common/src/schema-utils.ts) to support Splade.
• Added documentation (README) and configuration for package publication and local builds/tests.
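
The registration, validation, and retrieval flow mentioned above can be sketched roughly as follows. This is an assumption-laden illustration in Python, not the registry code in `embedding-function.ts`: the function names, the `"toy-sparse"` key, and the required `model` config field are all hypothetical.

```python
from typing import Callable, Dict, List

SparseFn = Callable[[List[str]], List[dict]]
Factory = Callable[[dict], SparseFn]

# Hypothetical registry: maps a name to a factory that builds the function from config.
_SPARSE_REGISTRY: Dict[str, Factory] = {}


def register_sparse_embedding_function(name: str, factory: Factory) -> None:
    """Register a factory under `name`, refusing duplicate names."""
    if name in _SPARSE_REGISTRY:
        raise ValueError(f"sparse embedding function {name!r} is already registered")
    _SPARSE_REGISTRY[name] = factory


def get_sparse_embedding_function(name: str, config: dict) -> SparseFn:
    """Look up `name` and build the function from its stored config."""
    try:
        factory = _SPARSE_REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown sparse embedding function {name!r}") from None
    return factory(config)


def build_toy(config: dict) -> SparseFn:
    # Validation step: reject configs missing the (assumed) required key.
    if "model" not in config:
        raise ValueError("config must include 'model'")

    def embed(texts: List[str]) -> List[dict]:
        # Toy output: one non-zero dimension per text, weighted by text length.
        return [{"indices": [0], "values": [float(len(t))]} for t in texts]

    return embed


register_sparse_embedding_function("toy-sparse", build_toy)
fn = get_sparse_embedding_function("toy-sparse", {"model": "toy"})
result = fn(["abc"])
```

Keeping construction behind a named factory is one way to let a stored collection configuration be turned back into a working embedding function later.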

Affected Areas

• chromadb/utils/embedding_functions/chroma_cloud_splade_embedding_function.py (Python implementation)
• clients/new-js/packages/ai-embeddings/chroma-cloud-splade/ (JS implementation, config, tests, docs)
• clients/new-js/packages/chromadb/src/embedding-function.ts (core registry for embedding functions)
• schemas/embedding_functions/chroma-cloud-splade.json
• clients/new-js/packages/ai-embeddings/common/src/schema-utils.ts
• clients/new-js/packages/ai-embeddings/all/ and related aggregators
• Package management: pnpm-lock.yaml, various package.json files

This summary was automatically generated by @propel-code-bot

@jairad26 jairad26 force-pushed the jai/hosted-splade-ef branch from 3d3426a to 53968a0 Compare October 14, 2025 19:47
@jairad26 jairad26 force-pushed the jai/hosted-splade-ef branch from 20bcadf to 6515339 Compare October 16, 2025 20:33
}

try:
    import httpx

[BestPractice]

Redundant httpx import inside method. The httpx library is already imported and validated in the constructor (line 32). The import on line 85 inside the __call__ method is unnecessary and could impact performance with repeated calls.

Suggested change
- import httpx
+ response = self._session.post(self._api_url, json=payload, timeout=60)

Note: Reusing the existing httpx.Client instance (self._session) is more efficient as it leverages connection pooling and avoids redundant imports.

Committable suggestion

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

File: chromadb/utils/embedding_functions/chroma_cloud_splade_embedding_function.py
Line: 85
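
To make the suggestion concrete, here is a minimal sketch of the pattern being recommended: check for the dependency once in the constructor, keep a single client, and reuse it on every call. `HostedSparseEmbedder`, the injectable `session` argument, and the `{"texts": ...}` payload shape are assumptions for illustration; only `self._session`, `self._api_url`, and `timeout=60` come from the suggestion itself.

```python
import importlib
from typing import Any, Dict, List, Optional


class HostedSparseEmbedder:
    """Hypothetical stand-in for the class under review."""

    def __init__(self, api_url: str, session: Optional[Any] = None) -> None:
        self._api_url = api_url
        if session is None:
            # Import once here so a missing dependency fails at construction time.
            try:
                httpx = importlib.import_module("httpx")
            except ImportError as exc:
                raise ValueError("httpx is required: pip install httpx") from exc
            session = httpx.Client()
        # One client for the lifetime of the object, so connections can be pooled.
        self._session = session

    def __call__(self, texts: List[str]) -> Any:
        payload: Dict[str, Any] = {"texts": texts}  # payload shape is assumed
        # No import needed here: reuse the client created in __init__.
        return self._session.post(self._api_url, json=payload, timeout=60)
```

Accepting an optional `session` is a design choice of this sketch, not of the PR; it lets a test pass in a fake client without touching the network.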

@philipithomas philipithomas left a comment

LGTM!

@jairad26 jairad26 merged commit 170673c into main Oct 16, 2025
60 checks passed
2 participants