@jairad26 jairad26 commented Oct 22, 2025

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • This PR adds schema support to the SQLite sysdb, correctly reconciling schema and legacy metadata and supporting configuration updates. It also adds support for passing a schema via the bindings, to allow for local Chroma support, and updates CLI usage to allow copying of schema.
  • New functionality
    • ...

Test plan

How are these changes tested?

Expanded the schema e2e tests to ensure that the bindings and single node both work as intended.

  • [x] Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Migration plan

Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?

Observability plan

What is the plan to instrument and monitor this change?

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?

jairad26 commented Oct 22, 2025

@github-actions

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

@jairad26 jairad26 force-pushed the jai/local-schema-support branch from 26fbb95 to 50dc87a Compare October 22, 2025 03:28
@blacksmith-sh blacksmith-sh bot deleted a comment from jairad26 Oct 22, 2025
@jairad26 jairad26 force-pushed the jai/local-schema-support branch from 50dc87a to 4f90e30 Compare October 22, 2025 04:20
@jairad26 jairad26 force-pushed the jai/embed-search-query branch from d88801a to a1642ad Compare October 22, 2025 16:48
@jairad26 jairad26 force-pushed the jai/local-schema-support branch from 4f90e30 to 4170cd0 Compare October 22, 2025 16:48
@blacksmith-sh blacksmith-sh bot deleted a comment from jairad26 Oct 22, 2025
@jairad26 jairad26 force-pushed the jai/embed-search-query branch from a1642ad to d9975c0 Compare October 22, 2025 17:13
@jairad26 jairad26 force-pushed the jai/local-schema-support branch 2 times, most recently from cc49da4 to 82c400d Compare October 22, 2025 17:15
@jairad26 jairad26 force-pushed the jai/embed-search-query branch from d9975c0 to 73d6c8c Compare October 22, 2025 17:15
@jairad26 jairad26 force-pushed the jai/local-schema-support branch from 82c400d to 5a6e468 Compare October 22, 2025 17:35
@jairad26 jairad26 force-pushed the jai/embed-search-query branch from 73d6c8c to f60a76e Compare October 22, 2025 17:35
@jairad26 jairad26 force-pushed the jai/local-schema-support branch from 5a6e468 to 8e145e7 Compare October 22, 2025 18:12
@blacksmith-sh blacksmith-sh bot deleted a comment from jairad26 Oct 22, 2025
@jairad26 jairad26 force-pushed the jai/local-schema-support branch 3 times, most recently from 45ca931 to 3e6652e Compare October 22, 2025 23:43
@jairad26 jairad26 force-pushed the jai/embed-search-query branch from cb029e5 to ebae17b Compare October 22, 2025 23:43
@blacksmith-sh blacksmith-sh bot deleted a comment from jairad26 Oct 23, 2025
@jairad26 jairad26 force-pushed the jai/embed-search-query branch from ebae17b to 9bc109b Compare October 23, 2025 01:02
@jairad26 jairad26 force-pushed the jai/local-schema-support branch from 3e6652e to bdf764b Compare October 23, 2025 01:02
@jairad26 jairad26 force-pushed the jai/local-schema-support branch from bdf764b to 2def8f3 Compare October 23, 2025 01:05
@jairad26 jairad26 force-pushed the jai/embed-search-query branch from 9bc109b to d22f13f Compare October 23, 2025 01:05
@jairad26 jairad26 force-pushed the jai/local-schema-support branch from 2def8f3 to 1695d4a Compare October 23, 2025 02:00
Comment on lines +843 to +844
let schema = match first_row.get::<Option<&str>, _>(7) {
Some(json_str) if !json_str.trim().is_empty() && json_str.trim() != "null" => {
Contributor

[**BestPractice**]

Error handling inconsistency: The schema deserialization logic handles empty strings and "null" values differently across the codebase. In the SQLite implementation, it checks for both conditions, but other parts of the code may not handle these edge cases consistently.

```rust
match first_row.get::<Option<&str>, _>(7) {
    Some(json_str) if !json_str.trim().is_empty() && json_str.trim() != "null" => {
        // This logic should be centralized in a helper function
        // to ensure consistency across all schema deserialization points
    }
    // ...
}
```

Consider creating a centralized `deserialize_schema_from_db` helper function to ensure consistent handling of these edge cases.

File: rust/sysdb/src/sqlite.rs
Line: 844
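
A minimal sketch of the centralized helper the comment above asks for, using only the standard library. The name `deserialize_schema_from_db` comes from the review; `non_empty_schema_json` and the `parse` closure parameter are hypothetical, introduced here so that the sketch does not depend on any particular JSON library or schema type.

```rust
/// Returns the trimmed payload of a nullable schema column, or `None` when
/// the column is SQL NULL, blank, or the JSON literal `null`.
/// (Hypothetical helper name, not taken from the PR.)
fn non_empty_schema_json(raw: Option<&str>) -> Option<&str> {
    raw.map(str::trim)
        .filter(|s| !s.is_empty() && *s != "null")
}

/// Applies `parse` only when the column holds a real payload, so every call
/// site treats NULL, "" and "null" the same way.
/// `parse` is an assumption: in practice it would wrap the JSON deserializer.
fn deserialize_schema_from_db<T, E>(
    raw: Option<&str>,
    parse: impl FnOnce(&str) -> Result<T, E>,
) -> Result<Option<T>, E> {
    // Option<Result<T, E>> -> Result<Option<T>, E>
    non_empty_schema_json(raw).map(parse).transpose()
}
```

With a helper of this shape, the match arm in the reviewed lines would reduce to a single call, and any other code path that reads the schema column could reuse the same edge-case handling.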

@jairad26 jairad26 force-pushed the jai/local-schema-support branch 7 times, most recently from 17e078b to 32d4b95 Compare October 24, 2025 06:37
Comment on lines 157 to 159
// Check if database has more applied migrations than available source migrations
if applied_migrations.len() > source_migrations.len() {
return Ok(vec![]);
Contributor

[**BestPractice**]

Potential null database migration issue: The code checks `if applied_migrations.len() > source_migrations.len()` and returns empty migrations, but this could mask real migration problems. If the database has more applied migrations than source migrations, this suggests a version mismatch or corrupted migration state that should be explicitly handled.

<details>
<summary>Suggested Change</summary>

```suggestion
// Check if database has more applied migrations than available source migrations
if applied_migrations.len() > source_migrations.len() {
    return Err(SqliteError::MigrationVersionMismatch(
        format!(
            "Database has {} applied migrations but only {} source migrations available", 
            applied_migrations.len(), 
            source_migrations.len()
        )
    ));
}
```

</details>

File: rust/sqlite/src/db.rs
Line: 159
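
To make the trade-off in this suggestion concrete, here is a self-contained sketch of the "fail loudly" variant. `Migration`, `MigrationError`, and `unapplied_migrations` are hypothetical stand-ins, not the types in `rust/sqlite/src/db.rs`, and the sketch assumes both lists are sorted by version with the applied list being a prefix of the source list.

```rust
use std::fmt;

/// Hypothetical stand-in for a migration record.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Migration {
    version: u32,
}

/// Hypothetical error type mirroring the variant proposed in the review.
#[derive(Debug, PartialEq, Eq)]
enum MigrationError {
    VersionMismatch { applied: usize, available: usize },
}

impl fmt::Display for MigrationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MigrationError::VersionMismatch { applied, available } => write!(
                f,
                "Database has {} applied migrations but only {} source migrations available",
                applied, available
            ),
        }
    }
}

/// Returns the source migrations that still need to run, or an error when
/// the database is ahead of the available sources.
fn unapplied_migrations(
    applied: &[Migration],
    source: &[Migration],
) -> Result<Vec<Migration>, MigrationError> {
    if applied.len() > source.len() {
        return Err(MigrationError::VersionMismatch {
            applied: applied.len(),
            available: source.len(),
        });
    }
    // Assumption: `applied` is a prefix of `source`, so the tail is pending.
    Ok(source[applied.len()..].to_vec())
}
```

Returning an error surfaces a database that is newer than the running binary, whereas the reviewed lines return an empty list and let such a database open without complaint. Which behavior is preferable depends on whether opening a newer database with an older binary is meant to be supported.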

@jairad26 jairad26 force-pushed the jai/local-schema-support branch 3 times, most recently from 8ec4278 to bee55b4 Compare October 24, 2025 17:09
.get_hnsw_config_with_legacy_fallback(segment)?
.schema
.as_ref()
.map(|schema| schema.get_internal_hnsw_config_with_legacy_fallback(segment))
Contributor Author

@sanketkedia is it fine to not reconcile on the writer? i believe it should come through frontend, so it should reconcile

Contributor

Looking at the code, I think we need this here. The schema is reconciled with the config in the handler for BackfillMessage, and we then need to reconcile it with legacy metadata here, so this seems correct and necessary.

Contributor

Ideally, I'd have liked all three reconciles to happen in one place, but that's not how it works today, even with collection config.

Contributor Author

I meant reconciling schema and config. Yes, the legacy fallback is needed everywhere.

@jairad26 jairad26 left a comment

use validate_schema from validators.rs in bindings.rs

@jairad26
Contributor Author

nvm

@jairad26 jairad26 force-pushed the jai/local-schema-support branch from bee55b4 to 296af1b Compare October 24, 2025 23:19
@jairad26 jairad26 force-pushed the jai/local-schema-support branch from 296af1b to dd3f346 Compare October 27, 2025 17:44
// This is for backwards compatibility so that users who migrate to distributed
// from local don't break their code.
KnnIndex::Spann => {
let internal_config = if let Some(space) = hnsw.space {
Contributor

shouldn't you reconcile with legacy metadata here before getting the space?

Contributor

In general, I feel like we should remove the blanket reconcile at the top and reconcile here in various places for readability

Contributor Author

added

@jairad26 jairad26 force-pushed the jai/local-schema-support branch from dd3f346 to df8c8c8 Compare October 27, 2025 19:47
@sanketkedia
Contributor

In general, I feel like the reconciliation logic is all over the place, but that predates this PR (collection config), so it's OK for now. Ideally, once you've read from sysdb, you should be able to assume that the schema is set and properly reconciled with both collection config and legacy metadata, and just use it downstream.
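
One way to picture the "reconcile once at the sysdb boundary" idea from this comment is a type that can only be produced by the reconcile step, so downstream code never sees an unreconciled schema. This is a rough sketch, not the PR's implementation: every type and function name is hypothetical, the precedence order (schema, then collection config, then legacy metadata) is an assumption, and the `"l2"` fallback is a placeholder default.

```rust
/// Hypothetical, heavily simplified index settings.
#[derive(Debug, Clone, PartialEq)]
struct HnswSettings {
    space: String,
}

/// What a sysdb row might carry before reconciliation (hypothetical shape).
struct StoredCollection {
    schema: Option<HnswSettings>,
    config: Option<HnswSettings>,
    legacy_metadata_space: Option<String>,
}

/// Only `reconcile` builds this type, so holding one means the schema is set.
struct ReconciledCollection {
    schema: HnswSettings,
}

/// Assumed precedence: explicit schema, then collection config, then legacy
/// metadata, then a placeholder default.
fn reconcile(stored: StoredCollection) -> ReconciledCollection {
    let schema = stored
        .schema
        .or(stored.config)
        .or_else(|| {
            stored
                .legacy_metadata_space
                .map(|space| HnswSettings { space })
        })
        .unwrap_or_else(|| HnswSettings {
            space: "l2".to_string(), // placeholder default, an assumption
        });
    ReconciledCollection { schema }
}
```

If the sysdb read path returned something like `ReconciledCollection`, call sites such as the writer and the `KnnIndex::Spann` branch discussed above could read `schema` directly instead of each applying its own legacy fallback.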

@jairad26 jairad26 merged commit 998da94 into main Oct 27, 2025
120 of 122 checks passed
sanketkedia pushed a commit that referenced this pull request Oct 29, 2025