fix: migration 042 — purge 111K low-quality patterns#38
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request implements two new database migrations. The primary migration performs a crucial one-time cleanup to remove a large volume of low-quality patterns from the database, improving data quality and reducing noise. Additionally, a corrective migration is included to ensure the proper purging of orphan vectors by targeting the correct database table. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
WalkthroughДобавлена новая миграция GORM Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.11.3)Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces two database migrations for data cleanup. Migration 041 purges orphan vectors, and migration 042 purges low-quality patterns with a frequency less than 5. The changes are straightforward and align with the goal of cleaning up legacy data. I've suggested a minor performance improvement for the SQL query in migration 041.
| { | ||
| ID: "041_purge_orphan_vectors", | ||
| Migrate: func(tx *gorm.DB) error { | ||
| result := tx.Exec(`DELETE FROM vectors WHERE sqlite_id NOT IN (SELECT id FROM observations)`) |
There was a problem hiding this comment.
For improved performance and to align with SQL best practices, consider using NOT EXISTS instead of NOT IN. NOT EXISTS is often handled more efficiently by query planners and is safer with subqueries that could potentially return NULL values.
| result := tx.Exec(`DELETE FROM vectors WHERE sqlite_id NOT IN (SELECT id FROM observations)`) | |
| result := tx.Exec("DELETE FROM vectors WHERE NOT EXISTS (SELECT 1 FROM observations WHERE observations.id = vectors.sqlite_id)") |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/db/gorm/migrations.go`:
- Line 1406: Замените вызываемый запрос в migrations.go (в месте где
используется tx.Exec и переменная result) — вместо необратимого DELETE без
фильтрации выполните запрос, который затрагивает только вектора типа
observation: используйте DELETE FROM vectors WHERE doc_type = 'observation' AND
NOT EXISTS (SELECT 1 FROM observations WHERE observations.id =
vectors.sqlite_id); т.е. уберите NOT IN и добавьте предусловие по
doc_type='observation', сохранив вызов через tx.Exec и проверку
result.Error/RowsAffected как ранее.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: fd13fade-61b7-4305-9aec-05c424596924
📒 Files selected for processing (2)
internal/db/gorm/migrations.gointernal/worker/handlers_maintenance.go
| { | ||
| ID: "041_purge_orphan_vectors", | ||
| Migrate: func(tx *gorm.DB) error { | ||
| result := tx.Exec(`DELETE FROM vectors WHERE sqlite_id NOT IN (SELECT id FROM observations)`) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Проверка, что таблица vectors реально используется для разных doc_type,
# и что purge нужно ограничивать observation-контекстом.
set -euo pipefail
echo "== Поиск мест, где задается/используется doc_type для vectors =="
rg -n -C2 --type=go '\bdoc_type\b|\bDocType\b|\bvectors\b'
echo
echo "== Поиск SQL-вставок/апдейтов в vectors с sqlite_id =="
rg -n -C2 --type=go 'INSERT INTO vectors|UPDATE vectors|sqlite_id'Repository: thebtf/engram
Length of output: 50369
🏁 Script executed:
# Check the migration 041 code and surrounding context
sed -n '1402,1420p' internal/db/gorm/migrations.goRepository: thebtf/engram
Length of output: 896
🏁 Script executed:
# Verify the vectors table structure from migration 006
sed -n '208,225p' internal/db/gorm/migrations.goRepository: thebtf/engram
Length of output: 684
🏁 Script executed:
# Check if there are other migrations that reference pattern or user_prompt vectors
grep -n "doc_type.*pattern\|doc_type.*user_prompt\|doc_type.*session_summary" internal/db/gorm/migrations.goRepository: thebtf/engram
Length of output: 39
Замените DELETE на фильтрованный запрос с предусловием на doc_type и используйте NOT EXISTS вместо NOT IN.
Текущая миграция удаляет векторы других типов (session_summary, user_prompt, pattern), так как их sqlite_id указывают на ID из других таблиц, а не на observations. Это необратимая потеря данных.
💡 Необходимый фикс
- result := tx.Exec(`DELETE FROM vectors WHERE sqlite_id NOT IN (SELECT id FROM observations)`)
+ result := tx.Exec(`
+ DELETE FROM vectors v
+ WHERE v.doc_type = 'observation'
+ AND v.sqlite_id IS NOT NULL
+ AND NOT EXISTS (
+ SELECT 1
+ FROM observations o
+ WHERE o.id = v.sqlite_id
+ )
+ `)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| result := tx.Exec(`DELETE FROM vectors WHERE sqlite_id NOT IN (SELECT id FROM observations)`) | |
| result := tx.Exec(` | |
| DELETE FROM vectors v | |
| WHERE v.doc_type = 'observation' | |
| AND v.sqlite_id IS NOT NULL | |
| AND NOT EXISTS ( | |
| SELECT 1 | |
| FROM observations o | |
| WHERE o.id = v.sqlite_id | |
| ) | |
| `) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/db/gorm/migrations.go` at line 1406, Замените вызываемый запрос в
migrations.go (в месте где используется tx.Exec и переменная result) — вместо
необратимого DELETE без фильтрации выполните запрос, который затрагивает только
вектора типа observation: используйте DELETE FROM vectors WHERE doc_type =
'observation' AND NOT EXISTS (SELECT 1 FROM observations WHERE observations.id =
vectors.sqlite_id); т.е. уберите NOT IN и добавьте предусловие по
doc_type='observation', сохранив вызов через tx.Exec и проверку
result.Error/RowsAffected как ранее.
…042) Migration 041: DELETE FROM vectors WHERE sqlite_id NOT IN (SELECT id FROM observations) — purges ~229K orphan vectors from correct table name Migration 042: DELETE FROM patterns WHERE frequency < 5 — purges ~111K low-quality patterns accumulated from garbage SDK extraction — matches raised MinFrequency threshold (T019)
acdb385 to
aa2545c
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/db/gorm/migrations.go`:
- Around line 1425-1428: В блоке проверки результата миграции (где проверяется
result.Error и логируется "migration 042: pattern purge failed (non-fatal)") не
подавляйте ошибку — вместо возвращения nil верните саму ошибку; замените текущее
поведение в этой ветке так, чтобы функция возвращала result.Error (и при желании
подняла уровень лога на error), чтобы миграция не считалась успешно выполненной
при падении DELETE.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 871124f6-64e2-486d-89a2-2c65b0ad011c
📒 Files selected for processing (2)
internal/db/gorm/migrations.gointernal/worker/handlers_maintenance.go
✅ Files skipped from review due to trivial changes (1)
- internal/worker/handlers_maintenance.go
| if result.Error != nil { | ||
| log.Warn().Err(result.Error).Msg("migration 042: pattern purge failed (non-fatal)") | ||
| return nil | ||
| } |
There was a problem hiding this comment.
Не скрывайте ошибку миграции на Line 1425—1428.
Если DELETE падает и функция всё равно возвращает nil, миграция может быть отмечена как выполненная без фактической очистки. Для one-time cleanup это ломает детерминизм состояния БД. Верните ошибку, чтобы миграция не считалась успешно применённой.
Предлагаемый фикс
Migrate: func(tx *gorm.DB) error {
result := tx.Exec(`DELETE FROM patterns WHERE frequency < 5`)
if result.Error != nil {
- log.Warn().Err(result.Error).Msg("migration 042: pattern purge failed (non-fatal)")
- return nil
+ return fmt.Errorf("migration 042: purge low-quality patterns: %w", result.Error)
}
log.Info().Int64("patterns_deleted", result.RowsAffected).Msg("migration 042: low-quality pattern purge complete")
return nil
},📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if result.Error != nil { | |
| log.Warn().Err(result.Error).Msg("migration 042: pattern purge failed (non-fatal)") | |
| return nil | |
| } | |
| if result.Error != nil { | |
| return fmt.Errorf("migration 042: purge low-quality patterns: %w", result.Error) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/db/gorm/migrations.go` around lines 1425 - 1428, В блоке проверки
результата миграции (где проверяется result.Error и логируется "migration 042:
pattern purge failed (non-fatal)") не подавляйте ошибку — вместо возвращения nil
верните саму ошибку; замените текущее поведение в этой ветке так, чтобы функция
возвращала result.Error (и при желании подняла уровень лога на error), чтобы
миграция не считалась успешно выполненной при падении DELETE.
Deletes patterns with frequency < 5 (raised threshold in T019). One-time cleanup migration, no dangerous endpoints.
Summary by CodeRabbit
Заметки о выпуске