Merged
16 changes: 16 additions & 0 deletions internal/db/gorm/migrations.go
@@ -1399,6 +1399,22 @@ func runMigrations(db *gorm.DB, embeddingDims int) error {
				return nil
			},
		},
		// Migration 041: Purge orphan vectors — correct table name (vectors, not observation_vectors).
		{
			ID: "041_purge_orphan_vectors",
			Migrate: func(tx *gorm.DB) error {
				result := tx.Exec(`DELETE FROM vectors WHERE sqlite_id NOT IN (SELECT id FROM observations)`)

medium

For better performance and correct handling of NULL values in sqlite_id, consider using NOT EXISTS instead of NOT IN.

NOT IN can be inefficient on large tables. More importantly, if vectors.sqlite_id is NULL, the NOT IN condition evaluates to UNKNOWN rather than TRUE, so the row is not deleted. A row with a NULL sqlite_id references no observation, so it is an orphan and should likely be purged.

With NOT EXISTS, the correlated subquery returns no rows when sqlite_id is NULL, so the condition is TRUE and those rows are deleted along with the other orphans. NOT EXISTS is also generally the more performant of the two.

Suggested change
- result := tx.Exec(`DELETE FROM vectors WHERE sqlite_id NOT IN (SELECT id FROM observations)`)
+ result := tx.Exec(`DELETE FROM vectors WHERE NOT EXISTS (SELECT 1 FROM observations WHERE observations.id = vectors.sqlite_id)`)
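
To make the NULL behavior described in this comment concrete, here is a minimal sketch that models SQL's three-valued logic in plain Go. It does not run SQLite or touch the real schema. The names `Tri`, `notIn`, `notExists`, and `countDeleted` are hypothetical helpers introduced for illustration only, a nil pointer stands in for a NULL `sqlite_id`, and `observations.id` is assumed to contain no NULLs.

```go
package main

import "fmt"

// Tri models SQL's three-valued logic: FALSE, TRUE, or UNKNOWN (NULL).
type Tri int

const (
	False Tri = iota
	True
	Unknown
)

// notIn models `x NOT IN (SELECT id FROM observations)`.
// A nil x stands for a NULL sqlite_id; ids is assumed to hold no NULLs.
func notIn(x *int64, ids []int64) Tri {
	if len(ids) == 0 {
		return True // NOT IN over an empty set is TRUE, even for NULL
	}
	if x == nil {
		return Unknown // NULL compared with any value is UNKNOWN
	}
	for _, id := range ids {
		if id == *x {
			return False
		}
	}
	return True
}

// notExists models
// `NOT EXISTS (SELECT 1 FROM observations WHERE observations.id = x)`.
// EXISTS only asks whether the subquery produced a row, so the result
// is never UNKNOWN.
func notExists(x *int64, ids []int64) Tri {
	if x == nil {
		return True // `id = NULL` is never TRUE, so the subquery is empty
	}
	for _, id := range ids {
		if id == *x {
			return False
		}
	}
	return True
}

// countDeleted mirrors DELETE ... WHERE: a row is removed only when the
// predicate is TRUE; FALSE and UNKNOWN both keep the row.
func countDeleted(vectors []*int64, ids []int64, pred func(*int64, []int64) Tri) int {
	n := 0
	for _, v := range vectors {
		if pred(v, ids) == True {
			n++
		}
	}
	return n
}

func main() {
	one, nine := int64(1), int64(9)
	vectors := []*int64{&one, &nine, nil} // sqlite_id values: 1, 9, NULL
	observations := []int64{1, 2}         // observations.id values

	fmt.Println("NOT IN deletes:", countDeleted(vectors, observations, notIn))
	fmt.Println("NOT EXISTS deletes:", countDeleted(vectors, observations, notExists))
	// Prints:
	// NOT IN deletes: 1
	// NOT EXISTS deletes: 2
}
```

In this model, both predicates delete the row pointing at the missing observation 9, but only NOT EXISTS also deletes the row whose sqlite_id is NULL.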

				if result.Error != nil {
					log.Warn().Err(result.Error).Msg("migration 041: orphan vector purge failed (non-fatal)")
					return nil
				}
				log.Info().Int64("orphan_vectors_deleted", result.RowsAffected).Msg("migration 041: orphan vector purge complete")
				return nil
			},
			Rollback: func(tx *gorm.DB) error {
				return nil
			},
		},
	})
	if err := m.Migrate(); err != nil {
		return fmt.Errorf("run gormigrate migrations: %w", err)