fix: migration 042 — purge 111K low-quality patterns #38

Merged
thebtf merged 1 commit into main from fix/pattern-purge-migration on Mar 21, 2026

thebtf (Owner) commented Mar 21, 2026

Deletes patterns with frequency < 5 (raised threshold in T019). One-time cleanup migration, no dangerous endpoints.

Summary by CodeRabbit

Release notes

  • Chores
    • Added a database cleanup task that removes low-quality records (reducing volume and improving data quality); the operation logs the number of deleted records and does not abort on errors.
  • Style
    • Minor source formatting (added a trailing blank line/newline).

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements two new database migrations. The primary migration performs a crucial one-time cleanup to remove a large volume of low-quality patterns from the database, improving data quality and reducing noise. Additionally, a corrective migration is included to ensure the proper purging of orphan vectors by targeting the correct database table.

Highlights

  • Database Migration: Introduced migration 042 to purge over 111K low-quality patterns from the database, specifically those with a frequency less than 5, aligning with a previously established minimum frequency threshold.
  • Migration Correction: Added migration 041 to correctly purge orphan vectors by targeting the vectors table, rectifying a previous attempt that might have referenced an incorrect table name.

@coderabbitai

coderabbitai bot commented Mar 21, 2026

Walkthrough

Adds a new GORM migration, 042_purge_low_quality_patterns, which deletes rows with frequency < 5 from the patterns table and logs the number of deleted rows; a delete error is logged as a warning and does not abort the migration run. Also adds a trailing blank line to one file.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Database Migrations: internal/db/gorm/migrations.go | Added migration 042_purge_low_quality_patterns: runs DELETE FROM patterns WHERE frequency < 5 and logs patterns_deleted on success; on error it logs a warning and returns nil. Rollback is a no-op. |
| Code Formatting: internal/worker/handlers_maintenance.go | Added one blank line at the end of the file; no logic changes. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • fix: orphan vector purge — correct table name #37: adds a similar non-fatal migration to internal/db/gorm/migrations.go that cleans up orphaned rows in vectors; closely related in location and approach.

Poem

🐇 I'm a rabbit, hop! I hurry to write a migration,
Blowing stale patterns away on the wind,
Quietly counting in the log who left and who stayed,
The rollback is light as a track in the grass, leave it be,
May the database be clean; I'm glad to bring you joy!

🚥 Pre-merge checks | ✅ 3 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding migration 042 to delete low-quality patterns with frequency < 5. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.3)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces two database migrations for data cleanup. Migration 041 purges orphan vectors, and migration 042 purges low-quality patterns with a frequency less than 5. The changes are straightforward and align with the goal of cleaning up legacy data. I've suggested a minor performance improvement for the SQL query in migration 041.

{
ID: "041_purge_orphan_vectors",
Migrate: func(tx *gorm.DB) error {
result := tx.Exec(`DELETE FROM vectors WHERE sqlite_id NOT IN (SELECT id FROM observations)`)


medium

For improved performance and to align with SQL best practices, consider using NOT EXISTS instead of NOT IN. NOT EXISTS is often handled more efficiently by query planners and is safer with subqueries that could potentially return NULL values.

Suggested change
- result := tx.Exec(`DELETE FROM vectors WHERE sqlite_id NOT IN (SELECT id FROM observations)`)
+ result := tx.Exec("DELETE FROM vectors WHERE NOT EXISTS (SELECT 1 FROM observations WHERE observations.id = vectors.sqlite_id)")
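
The NULL-safety point in this comment can be illustrated without a database. The following is a minimal sketch that models SQL's three-valued logic in plain Go; the `tri` type and the `notIn`, `notExists`, and `id` helpers are hypothetical names introduced only for this illustration. It assumes the subquery could return a NULL, which may not be possible here if observations.id is a primary key.

```go
package main

import "fmt"

// tri models SQL's three-valued logic: FALSE, TRUE, or UNKNOWN (NULL).
type tri int

const (
	sqlFalse tri = iota
	sqlTrue
	sqlUnknown
)

// notIn mimics `x NOT IN (subquery)`. A nil entry stands for a NULL row
// returned by the subquery.
func notIn(x int64, sub []*int64) tri {
	sawNull := false
	for _, v := range sub {
		if v == nil {
			sawNull = true
			continue
		}
		if *v == x {
			return sqlFalse // x IN (...) is TRUE, so NOT IN is FALSE
		}
	}
	if sawNull {
		return sqlUnknown // no match, but a NULL makes the result UNKNOWN
	}
	return sqlTrue
}

// notExists mimics `NOT EXISTS (SELECT 1 FROM sub WHERE sub.id = x)`.
// A comparison with NULL is never TRUE, so NULL rows simply never match.
func notExists(x int64, sub []*int64) tri {
	for _, v := range sub {
		if v != nil && *v == x {
			return sqlFalse
		}
	}
	return sqlTrue
}

// id returns a pointer to n, standing in for a non-NULL id value.
func id(n int64) *int64 { return &n }

func main() {
	// Hypothetical subquery result that contains one NULL.
	ids := []*int64{id(1), id(2), nil}

	// DELETE only removes rows whose WHERE clause evaluates to TRUE.
	fmt.Println(notIn(99, ids) == sqlTrue)     // false: the orphan row is not deleted
	fmt.Println(notExists(99, ids) == sqlTrue) // true: the orphan row is deleted
}
```

With a single NULL in the subquery result, the NOT IN form matches no orphan rows at all, while the NOT EXISTS form behaves as intended.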


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/db/gorm/migrations.go`:
- Line 1406: Replace the query in migrations.go (where tx.Exec and the
result variable are used): instead of an irreversible, unfiltered DELETE,
run a query that touches only vectors of type observation. Use
DELETE FROM vectors WHERE doc_type = 'observation' AND
NOT EXISTS (SELECT 1 FROM observations WHERE observations.id =
vectors.sqlite_id); that is, drop NOT IN and add the
doc_type='observation' precondition, keeping the tx.Exec call and the
result.Error/RowsAffected check as before.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fd13fade-61b7-4305-9aec-05c424596924

📥 Commits

Reviewing files that changed from the base of the PR and between 17025e0 and acdb385.

📒 Files selected for processing (2)
  • internal/db/gorm/migrations.go
  • internal/worker/handlers_maintenance.go

{
ID: "041_purge_orphan_vectors",
Migrate: func(tx *gorm.DB) error {
result := tx.Exec(`DELETE FROM vectors WHERE sqlite_id NOT IN (SELECT id FROM observations)`)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check that the vectors table is actually used for several doc_type values,
# and that the purge must be restricted to the observation context.

set -euo pipefail

echo "== Looking for places where doc_type is set/used for vectors =="
rg -n -C2 --type=go '\bdoc_type\b|\bDocType\b|\bvectors\b'

echo
echo "== Looking for SQL inserts/updates into vectors with sqlite_id =="
rg -n -C2 --type=go 'INSERT INTO vectors|UPDATE vectors|sqlite_id'

Repository: thebtf/engram

Length of output: 50369


🏁 Script executed:

# Check the migration 041 code and surrounding context
sed -n '1402,1420p' internal/db/gorm/migrations.go

Repository: thebtf/engram

Length of output: 896


🏁 Script executed:

# Verify the vectors table structure from migration 006
sed -n '208,225p' internal/db/gorm/migrations.go

Repository: thebtf/engram

Length of output: 684


🏁 Script executed:

# Check if there are other migrations that reference pattern or user_prompt vectors
grep -n "doc_type.*pattern\|doc_type.*user_prompt\|doc_type.*session_summary" internal/db/gorm/migrations.go

Repository: thebtf/engram

Length of output: 39


Replace the DELETE with a filtered query that adds a doc_type precondition, and use NOT EXISTS instead of NOT IN.

The current migration deletes vectors of other types (session_summary, user_prompt, pattern), because their sqlite_id values refer to IDs in other tables, not to observations. This is irreversible data loss.

💡 Required fix
- result := tx.Exec(`DELETE FROM vectors WHERE sqlite_id NOT IN (SELECT id FROM observations)`)
+ result := tx.Exec(`
+   DELETE FROM vectors v
+   WHERE v.doc_type = 'observation'
+     AND v.sqlite_id IS NOT NULL
+     AND NOT EXISTS (
+       SELECT 1
+       FROM observations o
+       WHERE o.id = v.sqlite_id
+     )
+ `)
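
The difference between the two predicates can be sketched with an in-memory model. This is only an illustration under stated assumptions: the `vector` struct, the sample rows, and the `purgeUnscoped` and `purgeScoped` helpers are hypothetical and do not come from the repository; only the doc_type values and the two WHERE clauses are taken from the comment above.

```go
package main

import "fmt"

// vector is a hypothetical, simplified stand-in for a row of the vectors table.
type vector struct {
	DocType  string
	SQLiteID int64
}

// purgeUnscoped mimics the original predicate: every vector whose sqlite_id
// is not an observation ID is deleted, regardless of its doc_type.
func purgeUnscoped(vs []vector, obs map[int64]bool) (kept []vector) {
	for _, v := range vs {
		if obs[v.SQLiteID] {
			kept = append(kept, v)
		}
	}
	return kept
}

// purgeScoped mimics the suggested predicate: only vectors with
// doc_type = 'observation' are candidates for deletion.
func purgeScoped(vs []vector, obs map[int64]bool) (kept []vector) {
	for _, v := range vs {
		if v.DocType == "observation" && !obs[v.SQLiteID] {
			continue // orphan observation vector: delete
		}
		kept = append(kept, v)
	}
	return kept
}

func main() {
	observations := map[int64]bool{1: true, 2: true}
	vectors := []vector{
		{"observation", 1},     // live observation vector
		{"observation", 7},     // orphan: no observation with id 7
		{"pattern", 7},         // refers to a row in another table
		{"user_prompt", 2},     // survives the unscoped purge only by ID collision
		{"session_summary", 9}, // refers to a row in another table
	}

	fmt.Println(len(purgeUnscoped(vectors, observations))) // 2
	fmt.Println(len(purgeScoped(vectors, observations)))   // 4
}
```

In this model the unscoped purge removes three rows, two of which were never observation vectors, while the scoped purge removes only the single orphan.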

…042)

Migration 041: DELETE FROM vectors WHERE sqlite_id NOT IN (SELECT id FROM observations)
  — purges ~229K orphan vectors from correct table name

Migration 042: DELETE FROM patterns WHERE frequency < 5
  — purges ~111K low-quality patterns accumulated from garbage SDK extraction
  — matches raised MinFrequency threshold (T019)
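
The boundary of the `frequency < 5` predicate can be shown with a small sketch. It assumes MinFrequency is an integer equal to 5, which is inferred from the commit message; the real constant's type and location are not shown on this page, and the `pattern` struct and `purge` helper are hypothetical.

```go
package main

import "fmt"

// MinFrequency is assumed to be 5 to match the `frequency < 5` predicate;
// the actual constant in the codebase may be declared differently.
const MinFrequency = 5

// pattern is a hypothetical, simplified row of the patterns table.
type pattern struct {
	Name      string
	Frequency int
}

// purge splits patterns the way `DELETE FROM patterns WHERE frequency < 5`
// would: rows strictly below the threshold go, rows at or above it stay.
func purge(ps []pattern) (kept []pattern, deleted int) {
	for _, p := range ps {
		if p.Frequency < MinFrequency {
			deleted++
			continue
		}
		kept = append(kept, p)
	}
	return kept, deleted
}

func main() {
	kept, deleted := purge([]pattern{{"a", 1}, {"b", 4}, {"c", 5}, {"d", 12}})
	fmt.Println(len(kept), deleted) // 2 2
}
```

A pattern with frequency exactly 5 is kept. The migration itself hardcodes the literal 5 in SQL, so it keeps the same behavior even if the threshold constant changes later.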
@thebtf thebtf force-pushed the fix/pattern-purge-migration branch from acdb385 to aa2545c Compare March 21, 2026 10:34

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/db/gorm/migrations.go`:
- Around line 1425-1428: In the migration result check (where result.Error is
tested and "migration 042: pattern purge failed (non-fatal)" is logged), do not
suppress the error: return the error itself instead of nil. Change this branch
so the function returns result.Error (optionally raising the log level to
error), so the migration is not treated as successfully applied when the
DELETE fails.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 871124f6-64e2-486d-89a2-2c65b0ad011c

📥 Commits

Reviewing files that changed from the base of the PR and between acdb385 and aa2545c.

📒 Files selected for processing (2)
  • internal/db/gorm/migrations.go
  • internal/worker/handlers_maintenance.go
✅ Files skipped from review due to trivial changes (1)
  • internal/worker/handlers_maintenance.go

Comment on lines +1425 to +1428
if result.Error != nil {
log.Warn().Err(result.Error).Msg("migration 042: pattern purge failed (non-fatal)")
return nil
}

⚠️ Potential issue | 🟠 Major

Do not swallow the migration error on lines 1425-1428.

If the DELETE fails and the function still returns nil, the migration may be marked as applied without any cleanup having happened. For a one-time cleanup this breaks the determinism of the database state. Return the error so the migration is not considered successfully applied.

Suggested fix
 		Migrate: func(tx *gorm.DB) error {
 			result := tx.Exec(`DELETE FROM patterns WHERE frequency < 5`)
 			if result.Error != nil {
-				log.Warn().Err(result.Error).Msg("migration 042: pattern purge failed (non-fatal)")
-				return nil
+				return fmt.Errorf("migration 042: purge low-quality patterns: %w", result.Error)
 			}
 			log.Info().Int64("patterns_deleted", result.RowsAffected).Msg("migration 042: low-quality pattern purge complete")
 			return nil
 		},

@thebtf thebtf merged commit af34a17 into main Mar 21, 2026
2 checks passed
@thebtf thebtf deleted the fix/pattern-purge-migration branch March 21, 2026 10:38