[ENH] Add local support for schema #5714

jairad26 · 2025-10-22T03:08:33Z

Description of changes

Summarize the changes made by this PR.

Improvements & Bug fixes
- This PR adds support for schema in the SQLite sysdb, with correct reconciliation across schema and legacy metadata and support for configuration updates. It also adds support for passing schema via bindings, to allow for local Chroma support, and updates CLI usage to allow copying of schema.
New functionality
- ...

Test plan

How are these changes tested?

Expanded the schema e2e tests to ensure bindings and single node all work as intended.

[x] Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Migration plan

Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?

Observability plan

What is the plan to instrument and monitor this change?

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?

jairad26 · 2025-10-22T03:08:56Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

github-actions · 2025-10-22T03:08:57Z

rust/frontend/src/get_collection_with_segments_provider.rs

rust/types/src/collection_schema.rs

rust/sqlite/migrations/sysdb/00010-collection-schema.sqlite.sql

propel-code-bot · 2025-10-23T22:05:05Z

rust/sysdb/src/sqlite.rs

+                let schema = match first_row.get::<Option<&str>, _>(7) {
+                    Some(json_str) if !json_str.trim().is_empty() && json_str.trim() != "null" => {


[BestPractice]

Error handling inconsistency: The schema deserialization logic handles empty strings and "null" values differently across the codebase. In the SQLite implementation, it checks for both conditions, but other parts of the code may not handle these edge cases consistently.

match first_row.get::<Option<&str>, _>(7) {
    Some(json_str) if !json_str.trim().is_empty() && json_str.trim() != "null" => {
        // This logic should be centralized in a helper function
        // to ensure consistency across all schema deserialization points
    }
    // ...
}

Consider creating a centralized deserialize_schema_from_db helper function to ensure consistent handling of these edge cases.

File: rust/sysdb/src/sqlite.rs Line: 844
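One way the suggested centralization could look is sketched below. This is only an illustration under stated assumptions: `normalize_schema_json` is a hypothetical name, the `deserialize_schema_from_db` signature is a guess at what the review comment has in mind, and the `parse` closure stands in for whatever JSON deserializer the codebase actually uses.

```rust
/// Hypothetical helper: normalizes a raw schema column value read from the database.
/// Returns `None` for SQL NULL, for empty or whitespace-only strings, and for the
/// JSON literal `null`; otherwise returns the trimmed text.
fn normalize_schema_json(raw: Option<&str>) -> Option<&str> {
    let trimmed = raw?.trim();
    if trimmed.is_empty() || trimmed == "null" {
        None
    } else {
        Some(trimmed)
    }
}

/// Hypothetical wrapper using the helper name proposed in the review.
/// `parse` is an assumption: it stands in for the real JSON deserializer.
fn deserialize_schema_from_db<T, E>(
    raw: Option<&str>,
    parse: impl FnOnce(&str) -> Result<T, E>,
) -> Result<Option<T>, E> {
    // Option<Result<T, E>> becomes Result<Option<T>, E>.
    normalize_schema_json(raw).map(parse).transpose()
}

fn main() {
    // Stand-in parser for the demo: it only measures the length of the JSON text.
    let parse = |s: &str| -> Result<usize, String> { Ok(s.len()) };

    // NULL, empty, and "null" all collapse to "no schema".
    assert_eq!(deserialize_schema_from_db(None, parse), Ok(None));
    assert_eq!(deserialize_schema_from_db(Some("  "), parse), Ok(None));
    assert_eq!(deserialize_schema_from_db(Some("null"), parse), Ok(None));

    // Anything else is handed to the parser after trimming.
    assert_eq!(deserialize_schema_from_db(Some(" {} "), parse), Ok(Some(2)));
}
```

With a helper like this, every call site that reads the schema column would treat the three "absent" encodings the same way, which is the consistency the comment asks for.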

propel-code-bot · 2025-10-24T06:43:56Z

rust/sqlite/src/db.rs

+        // Check if database has more applied migrations than available source migrations
+        if applied_migrations.len() > source_migrations.len() {
+            return Ok(vec![]);


[BestPractice]

Potential null database migration issue: The code checks if applied_migrations.len() > source_migrations.len() and returns empty migrations, but this could mask real migration problems. If the database has more applied migrations than source migrations, this suggests a version mismatch or corrupted migration state that should be explicitly handled.

Suggested change

-        // Check if database has more applied migrations than available source migrations
-        if applied_migrations.len() > source_migrations.len() {
-            return Ok(vec![]);
+        // Check if database has more applied migrations than available source migrations
+        if applied_migrations.len() > source_migrations.len() {
+            return Err(SqliteError::MigrationVersionMismatch(
+                format!(
+                    "Database has {} applied migrations but only {} source migrations available",
+                    applied_migrations.len(),
+                    source_migrations.len()
+                )
+            ));
+        }

File: rust/sqlite/src/db.rs Line: 159
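The explicit-error approach from the suggestion can be sketched in isolation as follows. The `MigrationError` type and `pending_migration_count` function are hypothetical names made up for this example; whether `SqliteError` has a `MigrationVersionMismatch` variant is not confirmed anywhere in this thread.

```rust
use std::fmt;

/// Hypothetical error type for this sketch; the real `SqliteError` may differ.
#[derive(Debug, PartialEq)]
enum MigrationError {
    VersionMismatch { applied: usize, available: usize },
}

impl fmt::Display for MigrationError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            MigrationError::VersionMismatch { applied, available } => write!(
                f,
                "Database has {} applied migrations but only {} source migrations available",
                applied, available
            ),
        }
    }
}

/// Returns how many source migrations still need to be applied, or an error
/// when the database is ahead of the available sources.
fn pending_migration_count(applied: usize, available: usize) -> Result<usize, MigrationError> {
    if applied > available {
        return Err(MigrationError::VersionMismatch { applied, available });
    }
    Ok(available - applied)
}

fn main() {
    // Normal case: two migrations are still pending.
    assert_eq!(pending_migration_count(3, 5), Ok(2));

    // Database is ahead of the binary: surfaced as an error instead of "nothing to do".
    let err = pending_migration_count(6, 5).unwrap_err();
    println!("{}", err);
}
```

Whether failing is preferable to returning an empty list probably depends on whether an older binary is expected to open a database written by a newer one; the sketch only shows the mechanics of the stricter option.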

jairad26 · 2025-10-24T23:11:14Z

rust/segment/src/local_hnsw.rs

-            .get_hnsw_config_with_legacy_fallback(segment)?
+            .schema
+            .as_ref()
+            .map(|schema| schema.get_internal_hnsw_config_with_legacy_fallback(segment))


@sanketkedia is it fine to not reconcile on the writer? i believe it should come through frontend, so it should reconcile

Looking at the code, I think we need this here. The reconcile of schema with config happens in the handler for BackfillMessage. And then we need to reconcile this with legacy metadata here so this seems correct and necessary

Ideally, I'd have liked all the three reconciles to happen at one place but that's not what it is now even with collection config

i meant reconcile schema and config. yes the legacy fallback is needed everywhere
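For readers following the diff above: mapping a fallible accessor over an optional schema yields an `Option<Result<..>>`, which `Option::transpose` turns into a `Result<Option<..>>` so that `?` can still be used. The sketch below shows that pattern only. `Schema`, `HnswConfig`, the `ef_search` field, the default of 100, and both function names are illustrative assumptions, not the types or values in `local_hnsw.rs`.

```rust
/// Hypothetical resolved config for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct HnswConfig {
    ef_search: usize,
}

/// Hypothetical schema that may or may not carry an explicit value.
struct Schema {
    ef_search: Option<usize>,
}

impl Schema {
    /// Stand-in for a "config with legacy fallback" accessor: prefer the schema
    /// value, else parse the legacy metadata value, else use a default.
    fn hnsw_config_with_legacy_fallback(
        &self,
        legacy_ef_search: Option<&str>,
    ) -> Result<HnswConfig, String> {
        if let Some(ef_search) = self.ef_search {
            return Ok(HnswConfig { ef_search });
        }
        match legacy_ef_search {
            Some(raw) => raw
                .parse::<usize>()
                .map(|ef_search| HnswConfig { ef_search })
                .map_err(|e| format!("invalid legacy ef_search {:?}: {}", raw, e)),
            // Default chosen arbitrarily for the example.
            None => Ok(HnswConfig { ef_search: 100 }),
        }
    }
}

/// No schema means no config; a schema means a config or an error.
fn resolve_hnsw_config(
    schema: Option<&Schema>,
    legacy_ef_search: Option<&str>,
) -> Result<Option<HnswConfig>, String> {
    schema
        .map(|s| s.hnsw_config_with_legacy_fallback(legacy_ef_search))
        .transpose()
}

fn main() {
    let empty = Schema { ef_search: None };
    // The legacy value is used only because the schema has no explicit value.
    let resolved = resolve_hnsw_config(Some(&empty), Some("10"));
    assert_eq!(resolved, Ok(Some(HnswConfig { ef_search: 10 })));
    // Without a schema there is nothing to resolve.
    assert_eq!(resolve_hnsw_config(None, Some("10")), Ok(None));
}
```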

jairad26

use validate_schema from validators.rs in bindings.rs

jairad26 · 2025-10-24T23:17:21Z

nvm

chromadb/api/collection_configuration.py

rust/sysdb/src/sqlite.rs

sanketkedia · 2025-10-27T18:29:29Z

rust/types/src/collection_configuration.rs

                    // This is for backwards compatibility so that users who migrate to distributed
                    // from local don't break their code.
                    KnnIndex::Spann => {
                        let internal_config = if let Some(space) = hnsw.space {


shouldn't you reconcile with legacy metadata here before getting the space?

In general, I feel like we should remove the blanket reconcile at the top and reconcile here in various places for readability

sanketkedia · 2025-10-27T20:54:00Z

In general, I feel like the reconciliation logic is all over the place. But that's from before (collection config) so ok for now. But ideally once you've read from sysdb, you should assume that it has schema set and properly reconciled with both collection config and legacy metadata and just use it downstream
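The "reconcile once at the read boundary" idea can be sketched roughly like this. Everything here is hypothetical: the `RawCollection` and `ReconciledCollection` types, the precedence order (schema, then collection config, then legacy metadata), and the `L2` default are assumptions for illustration and may not match how Chroma actually resolves these. The `hnsw:space` key is used as the legacy metadata key.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Space {
    L2,
    Cosine,
    Ip,
}

/// Hypothetical raw row as read from the sysdb, before reconciliation.
struct RawCollection {
    schema_space: Option<Space>,
    config_space: Option<Space>,
    legacy_metadata: HashMap<String, String>,
}

/// Hypothetical reconciled view: downstream code reads `space` directly
/// and never looks at config or legacy metadata again.
#[derive(Debug, PartialEq)]
struct ReconciledCollection {
    space: Space,
}

fn parse_space(raw: &str) -> Option<Space> {
    match raw {
        "l2" => Some(Space::L2),
        "cosine" => Some(Space::Cosine),
        "ip" => Some(Space::Ip),
        _ => None,
    }
}

/// Single reconciliation point. The precedence order is an assumption.
fn reconcile(raw: &RawCollection) -> ReconciledCollection {
    let space = raw
        .schema_space
        .or(raw.config_space)
        .or_else(|| {
            raw.legacy_metadata
                .get("hnsw:space")
                .and_then(|s| parse_space(s))
        })
        .unwrap_or(Space::L2);
    ReconciledCollection { space }
}

fn main() {
    let mut legacy_metadata = HashMap::new();
    legacy_metadata.insert("hnsw:space".to_string(), "cosine".to_string());

    // Only legacy metadata is set, so it decides the space.
    let raw = RawCollection {
        schema_space: None,
        config_space: None,
        legacy_metadata,
    };
    assert_eq!(reconcile(&raw), ReconciledCollection { space: Space::Cosine });
}
```

The point of the shape is that the three-way merge lives in one function, so call sites such as the `KnnIndex::Spann` branch would not need their own fallback logic.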


This was referenced Oct 22, 2025

[ENH] Embed query strings in search api #5599

Merged

[ENH] Add Schema to js client #5621

Merged

[ENH] Python client cleanup: export hosted ef from utils, fix sparse auto-embed #5710

Merged

jairad26 force-pushed the jai/local-schema-support branch from 26fbb95 to 50dc87a Compare October 22, 2025 03:28

blacksmith-sh bot deleted a comment from jairad26 Oct 22, 2025

jairad26 force-pushed the jai/local-schema-support branch from 50dc87a to 4f90e30 Compare October 22, 2025 04:20

jairad26 force-pushed the jai/embed-search-query branch from d88801a to a1642ad Compare October 22, 2025 16:48

jairad26 force-pushed the jai/local-schema-support branch from 4f90e30 to 4170cd0 Compare October 22, 2025 16:48

blacksmith-sh bot deleted a comment from jairad26 Oct 22, 2025

jairad26 force-pushed the jai/embed-search-query branch from a1642ad to d9975c0 Compare October 22, 2025 17:13

jairad26 force-pushed the jai/local-schema-support branch 2 times, most recently from cc49da4 to 82c400d Compare October 22, 2025 17:15

jairad26 force-pushed the jai/embed-search-query branch from d9975c0 to 73d6c8c Compare October 22, 2025 17:15

jairad26 force-pushed the jai/local-schema-support branch from 82c400d to 5a6e468 Compare October 22, 2025 17:35

jairad26 force-pushed the jai/embed-search-query branch from 73d6c8c to f60a76e Compare October 22, 2025 17:35

jairad26 force-pushed the jai/local-schema-support branch from 5a6e468 to 8e145e7 Compare October 22, 2025 18:12

blacksmith-sh bot deleted a comment from jairad26 Oct 22, 2025

jairad26 force-pushed the jai/local-schema-support branch 3 times, most recently from 45ca931 to 3e6652e Compare October 22, 2025 23:43

jairad26 force-pushed the jai/embed-search-query branch from cb029e5 to ebae17b Compare October 22, 2025 23:43

blacksmith-sh bot deleted a comment from jairad26 Oct 23, 2025

jairad26 force-pushed the jai/embed-search-query branch from ebae17b to 9bc109b Compare October 23, 2025 01:02

jairad26 force-pushed the jai/local-schema-support branch from 3e6652e to bdf764b Compare October 23, 2025 01:02

jairad26 mentioned this pull request Oct 23, 2025

[BUG] Default create path with no config or schema does not populate default ef in schema #5726

Merged

jairad26 force-pushed the jai/local-schema-support branch from bdf764b to 2def8f3 Compare October 23, 2025 01:05

jairad26 force-pushed the jai/embed-search-query branch from 9bc109b to d22f13f Compare October 23, 2025 01:05

jairad26 force-pushed the jai/local-schema-support branch from 2def8f3 to 1695d4a Compare October 23, 2025 02:00

propel-code-bot bot reviewed Oct 23, 2025

rust/frontend/src/get_collection_with_segments_provider.rs (resolved)

propel-code-bot bot reviewed Oct 23, 2025

rust/types/src/collection_schema.rs (resolved)

propel-code-bot bot reviewed Oct 23, 2025

rust/sqlite/migrations/sysdb/00010-collection-schema.sqlite.sql (resolved)

propel-code-bot bot reviewed Oct 23, 2025

jairad26 force-pushed the jai/local-schema-support branch 7 times, most recently from 17e078b to 32d4b95 Compare October 24, 2025 06:37

propel-code-bot bot reviewed Oct 24, 2025

jairad26 force-pushed the jai/local-schema-support branch 3 times, most recently from 8ec4278 to bee55b4 Compare October 24, 2025 17:09

jairad26 mentioned this pull request Oct 24, 2025

[ENH] Export schema and search types from chromadb.api #5736

Merged

jairad26 commented Oct 24, 2025

jairad26 force-pushed the jai/local-schema-support branch from bee55b4 to 296af1b Compare October 24, 2025 23:19

propel-code-bot bot reviewed Oct 24, 2025

chromadb/api/collection_configuration.py (resolved)

jairad26 force-pushed the jai/local-schema-support branch from 296af1b to dd3f346 Compare October 27, 2025 17:44

propel-code-bot bot reviewed Oct 27, 2025

rust/sysdb/src/sqlite.rs (resolved)

sanketkedia reviewed Oct 27, 2025

[ENH] Add local support for schema

df8c8c8

jairad26 force-pushed the jai/local-schema-support branch from dd3f346 to df8c8c8 Compare October 27, 2025 19:47

sanketkedia approved these changes Oct 27, 2025

jairad26 merged commit 998da94 into main Oct 27, 2025
120 of 122 checks passed

github-actions bot commented Oct 22, 2025

Reviewer Checklist

Testing, Bugs, Errors, Logs, Documentation

System Compatibility

Quality

3 participants
