fix: orphan vector purge — correct table name #37

Merged
thebtf merged 1 commit into main from fix/orphan-vectors-table-name
Mar 21, 2026
Conversation

@thebtf
Owner

@thebtf thebtf commented Mar 21, 2026

Migration 040 targeted the wrong table (observation_vectors); the actual table is vectors. Migration 041 runs DELETE FROM vectors WHERE sqlite_id NOT IN (SELECT id FROM observations) to purge ~229K orphans.

Summary by CodeRabbit

  • Chores
    • Added a database migration that cleans up and optimizes data storage and logs the result of the operation.

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical data integrity issue by implementing a new database migration. It corrects a previous attempt to purge orphaned vector entries by ensuring the correct table (vectors) is targeted for deletion, thereby removing unlinked data and maintaining database consistency.

Highlights

  • Database Migration Correction: Introduced Migration 041 to correctly purge orphan vectors from the vectors table, addressing an error in a previous migration (040) that targeted an incorrect table (observation_vectors).

@coderabbitai

coderabbitai bot commented Mar 21, 2026

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

Adds a new database migration (041_purge_orphan_vectors) that deletes orphaned rows from the vectors table, that is, rows with no corresponding record in the observations table. On success the migration logs the number of deleted rows; on error it logs a warning and continues.

Changes

Cohort / File(s) | Summary
Database Migration: internal/db/gorm/migrations.go | Adds a new migration, 041_purge_orphan_vectors, that deletes orphaned rows from the vectors table when no corresponding observation exists. An error is treated as non-critical: the migration returns nil, and the rollback is empty.
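
The non-critical error handling described above can be sketched in plain Go. This is a minimal illustration, not the code merged in this PR: the execer interface, purgeOrphanVectors, fakeDB, and errNoTable are hypothetical names introduced here so the example runs without a database, whereas the real migration receives a *gorm.DB and calls tx.Exec.

```go
package main

import (
	"errors"
	"fmt"
)

// execer is a hypothetical stand-in for the database handle used by the
// real migration (which receives a *gorm.DB and calls tx.Exec).
type execer interface {
	Exec(query string) (rowsAffected int64, err error)
}

// errNoTable is a hypothetical error used to simulate a failed purge.
var errNoTable = errors.New("no such table: vectors")

// purgeOrphanVectors runs the purge and treats a failure as non-critical:
// it logs a warning and returns nil so that later migrations still run.
func purgeOrphanVectors(db execer, logf func(format string, args ...any)) error {
	const query = `DELETE FROM vectors WHERE sqlite_id NOT IN (SELECT id FROM observations)`
	n, err := db.Exec(query)
	if err != nil {
		logf("warning: orphan vector purge failed, continuing: %v", err)
		return nil
	}
	logf("purged %d orphan vectors", n)
	return nil
}

// fakeDB is a test double that returns canned results instead of running SQL.
type fakeDB struct {
	rows int64
	err  error
}

func (f fakeDB) Exec(string) (int64, error) { return f.rows, f.err }

func main() {
	logf := func(format string, args ...any) { fmt.Printf(format+"\n", args...) }
	_ = purgeOrphanVectors(fakeDB{rows: 3}, logf)      // success path
	_ = purgeOrphanVectors(fakeDB{err: errNoTable}, logf) // failure path, still returns nil
}
```

One consequence of returning nil on failure is that the migration runner presumably records the migration as applied either way, so a failed purge is only visible in the logs.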

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 Hop-hop, the database is clean!
A magic migration hurries along,
Orphaned vectors fly away,
And in the logs a line rings out:
"Rows deleted, not a trace of data left!"

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title 'fix: orphan vector purge — correct table name' directly describes the main change: correcting the table name reference in the orphan vector purge migration from an incorrect name to 'vectors'.
Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.3)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions



@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new database migration to purge orphan vectors from the vectors table. The previous attempt in migration 040 targeted an incorrect table. The new migration 041 correctly targets the vectors table and resolves the issue. My feedback focuses on improving the performance and correctness of the SQL query used for the purge operation.

{
	ID: "041_purge_orphan_vectors",
	Migrate: func(tx *gorm.DB) error {
		result := tx.Exec(`DELETE FROM vectors WHERE sqlite_id NOT IN (SELECT id FROM observations)`)

medium

For better performance and to correctly handle NULL values in sqlite_id, it's recommended to use NOT EXISTS instead of NOT IN.

The NOT IN operator can be inefficient on large tables. More importantly, if vectors.sqlite_id is NULL, the NOT IN condition evaluates to UNKNOWN, and the row won't be deleted. An entry with a NULL sqlite_id is an orphan and should likely be purged.

The NOT EXISTS clause is generally more performant and will correctly identify and include rows where sqlite_id is NULL for deletion.

Suggested change
- result := tx.Exec(`DELETE FROM vectors WHERE sqlite_id NOT IN (SELECT id FROM observations)`)
+ result := tx.Exec(`DELETE FROM vectors WHERE NOT EXISTS (SELECT 1 FROM observations WHERE observations.id = vectors.sqlite_id)`)
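
The NULL behaviour behind this suggestion can be checked without a database. The sketch below models SQL's three-valued logic in plain Go; it does not run any query, and the names tri, eq, notIn, notExists, and ptr are hypothetical helpers introduced only for this illustration. A nil pointer stands in for NULL, and a DELETE is assumed to remove a row only when its WHERE clause evaluates to true.

```go
package main

import "fmt"

// tri models SQL's three-valued logic: false, true, or unknown (NULL).
type tri int

const (
	triFalse tri = iota
	triTrue
	triUnknown
)

// ptr returns a pointer to n; a nil *int64 stands in for SQL NULL.
func ptr(n int64) *int64 { return &n }

// eq mimics SQL's = operator: NULL on either side yields unknown.
func eq(a, b *int64) tri {
	if a == nil || b == nil {
		return triUnknown
	}
	if *a == *b {
		return triTrue
	}
	return triFalse
}

// notIn evaluates `v NOT IN (set)`: false if any element matches,
// unknown if no element matches but some comparison was unknown, else true.
func notIn(v *int64, set []*int64) tri {
	sawUnknown := false
	for _, s := range set {
		switch eq(v, s) {
		case triTrue:
			return triFalse
		case triUnknown:
			sawUnknown = true
		}
	}
	if sawUnknown {
		return triUnknown
	}
	return triTrue
}

// notExists evaluates `NOT EXISTS (SELECT 1 FROM set WHERE s = v)`.
// The subquery keeps only rows whose comparison is true, so the result
// is always a plain boolean.
func notExists(v *int64, set []*int64) bool {
	for _, s := range set {
		if eq(v, s) == triTrue {
			return false
		}
	}
	return true
}

func main() {
	observations := []*int64{ptr(1), ptr(2)}
	vectors := []*int64{ptr(1), ptr(7), nil}
	for _, v := range vectors {
		label := "NULL"
		if v != nil {
			label = fmt.Sprint(*v)
		}
		fmt.Printf("sqlite_id=%s  NOT IN deletes: %t  NOT EXISTS deletes: %t\n",
			label, notIn(v, observations) == triTrue, notExists(v, observations))
	}
}
```

In this model a vector with sqlite_id 1 is kept by both predicates, one with sqlite_id 7 is deleted by both, and one with a NULL sqlite_id is deleted only by NOT EXISTS. The same model shows a second hazard of NOT IN: if the subquery itself returned a NULL id, every non-matching row would evaluate to unknown and nothing would be deleted.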

@thebtf thebtf merged commit 602fbe3 into main Mar 21, 2026
1 of 2 checks passed
@thebtf thebtf deleted the fix/orphan-vectors-table-name branch March 21, 2026 01:04