feat: composite relevance scoring + radical observation cleanup #39
Migration 041: `DELETE FROM vectors WHERE sqlite_id NOT IN (SELECT id FROM observations)` — purges ~229K orphan vectors from the correct table name.
Migration 042: `DELETE FROM patterns WHERE frequency < 5` — purges ~111K low-quality patterns accumulated from garbage SDK extraction; matches the raised MinFrequency threshold (T019).
Migration 043: deletes ~170 garbage observations matching 45 title patterns (tool mechanics, task transitions, system prompt analysis, iSCSI debug, etc.).
Composite scoring: search results are re-ranked by score = similarity × recencyDecay × typeWeight × importance
- recencyDecay: 0.5^(age_days/7) — halves every 7 days
- typeWeight: decision=1.4, bugfix=1.3, discovery=0.8, change=0.7
- importance: floored at 0.3 to prevent zero-scored observations

Based on Deep Research (Gemini) + claude-mnemonic analysis findings. Sources: CrewAI, Mem0, Letta, Zep scoring formulas.
Walkthrough: Adds a database migration that deletes observations matching specified title patterns, implements a new composite scoring function that combines similarity, recency, type, and importance, and integrates it into the search handler so the scoring is applied to query results.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Handler as Search Handler<br/>(handleSearchByPrompt)
    participant Manager as Search Manager<br/>(ApplyCompositeScoring)
    participant Storage as Observations<br/>(Clustered Results)
    Handler->>Storage: Retrieve clustered observations
    Handler->>Manager: ApplyCompositeScoring(observations, similarityScores)
    Manager->>Manager: For each observation:<br/>- Calculate recency decay<br/>- Apply type weight<br/>- Clamp importance score<br/>- Compute composite score
    Manager->>Manager: Update similarityScores<br/>with composite values
    Manager-->>Handler: Return updated scores
    Handler->>Handler: Sort results by<br/>composite scores
    Handler-->>Handler: Return ranked results
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.3): Error: can't load config: unsupported version of the configuration: "". See https://golangci-lint.run/docs/product/migration-guide for migration instructions.
Summary of Changes: Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the relevance of context injection by addressing two key areas. First, it cleans up historical "garbage" observations that were polluting the system, removing noise from semantic search. Second, it introduces a sophisticated composite scoring mechanism for search results, which dynamically re-ranks observations based on their similarity, recency, type, and importance. These changes are designed to drastically improve agent performance by providing more pertinent information and reducing irrelevant data.
Actionable comments posted: 1
🧹 Nitpick comments (2)
internal/db/gorm/migrations.go (1)
1436-1527: The migration looks correct, but there is room for optimization. The code is logically sound and safe. A few remarks:

The 45 separate DELETE statements could be combined into a single query using `title SIMILAR TO` or several `OR` conditions for better performance. However, since this is a one-off migration, the current approach is acceptable.

The `DELETE FROM observations WHERE title LIKE ?` queries will trigger a full table scan, since there is no index on `title`. If the table is large, this may take some time.

♻️ Optional optimization: combine the patterns
```diff
 ID: "043_radical_observation_cleanup",
 Migrate: func(tx *gorm.DB) error {
-    garbagePatterns := []string{
-        // ... 45 patterns ...
-    }
-
-    var totalDeleted int64
-    for _, pattern := range garbagePatterns {
-        result := tx.Exec("DELETE FROM observations WHERE title LIKE ?", pattern)
-        if result.Error != nil {
-            log.Warn().Err(result.Error).Str("pattern", pattern).Msg("migration 043: delete failed")
-            continue
-        }
-        totalDeleted += result.RowsAffected
-    }
+    // Batch delete with single query for better performance
+    result := tx.Exec(`DELETE FROM observations WHERE
+        title LIKE 'Tool%Query Pattern%' OR
+        title LIKE 'Tool%Search%Pattern%' OR
+        -- ... remaining patterns ...
+        title LIKE 'npm install completion%'`)
+    totalDeleted := result.RowsAffected
+    if result.Error != nil {
+        log.Warn().Err(result.Error).Msg("migration 043: batch delete failed")
+    }
     log.Info().Int64("total_deleted", totalDeleted).Msg("migration 043: radical observation cleanup complete")
     return nil
 },
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/db/gorm/migrations.go` around lines 1436 - 1527, The migration "043_radical_observation_cleanup" issues many separate DELETEs (loop over garbagePatterns) which causes repeated full-table scans; combine them into a single DELETE to improve performance: build one SQL statement (e.g., "DELETE FROM observations WHERE title SIMILAR TO ? " or "DELETE FROM observations WHERE title LIKE ? OR title LIKE ? ...") and execute it once via tx.Exec, reusing the garbagePatterns slice as parameters, then read result.RowsAffected into totalDeleted; update the Migrate function (referencing garbagePatterns, totalDeleted, and the migration ID) to run that single query and keep the same error logging/behavior on failure.

internal/search/manager.go (1)
140-143: The default similarity of 0.5 may mask problems.

If `similarityScores[obs.ID]` returns 0, that can mean either a missing entry or a genuinely zero score. Using 0.5 as the default is neutral, but it may be useful to log such cases for debugging.

🔍 Optional improvement: distinguish a missing entry from a zero score
```diff
 for _, obs := range observations {
-    sim := similarityScores[obs.ID]
-    if sim == 0 {
+    sim, exists := similarityScores[obs.ID]
+    if !exists || sim == 0 {
         sim = 0.5 // default if no similarity score
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/search/manager.go` around lines 140 - 143, The code currently treats similarityScores[obs.ID]==0 as "no score" and overwrites it with 0.5, which hides missing-entry vs true-zero; change the lookup to use the comma-ok form (v, ok := similarityScores[obs.ID]) in the function/method that references similarityScores and obs.ID, and if ok is false log a debug/warning (including obs.ID and any context) so missing entries are visible, otherwise use the actual value v (including if it's zero); remove the unconditional replacement with 0.5 or only apply a documented fallback in the missing-entry branch.
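The comma-ok variant the reviewer asks for can be sketched as a small self-contained helper. The name `similarityFor` is hypothetical, and logging is stubbed with `fmt` to stay runnable (the real code uses zerolog).

```go
package main

import "fmt"

// similarityFor looks up a score with the comma-ok form, so a missing entry
// and a true zero score take different paths.
func similarityFor(id string, scores map[string]float64) float64 {
	sim, ok := scores[id]
	if !ok {
		// Missing entry: log it so gaps are visible, then apply the
		// documented fallback only in this branch.
		fmt.Printf("no similarity score for observation %s, using fallback\n", id)
		return 0.5
	}
	return sim // a genuine zero score is preserved, not overwritten
}

func main() {
	scores := map[string]float64{"a": 0.0, "b": 0.9}
	fmt.Println(similarityFor("a", scores)) // true zero kept
	fmt.Println(similarityFor("c", scores)) // missing entry, fallback used
}
```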
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 98340c28-a028-4ecf-98b0-70ac158737ac
📒 Files selected for processing (3)
internal/db/gorm/migrations.go, internal/search/manager.go, internal/worker/handlers_context.go
```go
typeWeights := map[models.ObservationType]float64{
	"decision":  1.4,
	"bugfix":    1.3,
	"feature":   1.2,
	"pattern":   1.2,
	"discovery": 0.8,
	"change":    0.7,
	"refactor":  0.9,
}
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Search for ObservationType constants in models
rg -n "ObsType\w*\s*=|ObservationType\s*=" --type=go
```

Repository: thebtf/engram
Length of output: 1054

🏁 Script executed:

```shell
cd internal/search && cat -n manager.go | sed -n '125,145p'
```

Repository: thebtf/engram
Length of output: 769

🏁 Script executed:

```shell
cat -n pkg/models/observation.go | head -30
```

Repository: thebtf/engram
Length of output: 1012
Use type constants instead of string literals for type safety.

The typeWeights map should use constants from the models package instead of string literals. There is a problem, however: the "pattern" type has no corresponding ObsType* constant in models.ObservationType, although constants are defined for all the other types.

Either:
- Add ObsTypePattern to pkg/models/observation.go
- Or remove "pattern" from this map if it is not a supported observation type

Then replace all string literals with constants:
- "decision" → models.ObsTypeDecision
- "bugfix" → models.ObsTypeBugfix
- "feature" → models.ObsTypeFeature
- "discovery" → models.ObsTypeDiscovery
- "change" → models.ObsTypeChange
- "refactor" → models.ObsTypeRefactor
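A sketch of what the constant-keyed map could look like, assuming `ObsTypePattern` is added. The type and the existing constant names mirror `pkg/models`; the block below redefines them locally so it compiles on its own.

```go
package main

import "fmt"

// ObservationType mirrors the pkg/models type. ObsTypePattern is the
// proposed new constant (hypothetical; not yet in pkg/models/observation.go).
type ObservationType string

const (
	ObsTypeDecision  ObservationType = "decision"
	ObsTypeBugfix    ObservationType = "bugfix"
	ObsTypeFeature   ObservationType = "feature"
	ObsTypePattern   ObservationType = "pattern"
	ObsTypeDiscovery ObservationType = "discovery"
	ObsTypeChange    ObservationType = "change"
	ObsTypeRefactor  ObservationType = "refactor"
)

// typeWeights keyed by constants instead of raw string literals, so a typo
// in a type name becomes a compile-time error rather than a silent miss.
var typeWeights = map[ObservationType]float64{
	ObsTypeDecision:  1.4,
	ObsTypeBugfix:    1.3,
	ObsTypeFeature:   1.2,
	ObsTypePattern:   1.2,
	ObsTypeDiscovery: 0.8,
	ObsTypeChange:    0.7,
	ObsTypeRefactor:  0.9,
}

func main() {
	fmt.Println(typeWeights[ObsTypeDecision]) // 1.4
}
```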
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/search/manager.go` around lines 129 - 137, The map typeWeights in
internal/search/manager.go currently uses string literals for keys and includes
"pattern" which has no corresponding models constant; either add a new constant
models.ObsTypePattern in pkg/models/observation.go (matching the ObservationType
type and existing ObsType* pattern) or remove the "pattern" entry from the map
if that observation type is not supported, and then replace all remaining string
keys with the corresponding constants (models.ObsTypeDecision,
models.ObsTypeBugfix, models.ObsTypeFeature, models.ObsTypeDiscovery,
models.ObsTypeChange, models.ObsTypeRefactor) to ensure type-safety.
Code Review
This pull request introduces two significant improvements. First, it adds a database migration to clean up a large number of irrelevant observations, which should improve search quality. Second, it implements a new composite relevance scoring model for search results, factoring in recency, type, and importance, which will likely lead to more relevant context injection. My review focuses on performance aspects of these changes.
```go
for _, pattern := range garbagePatterns {
	result := tx.Exec("DELETE FROM observations WHERE title LIKE ?", pattern)
	if result.Error != nil {
		log.Warn().Err(result.Error).Str("pattern", pattern).Msg("migration 043: delete failed")
		continue
	}
	totalDeleted += result.RowsAffected
}
```
This loop executes a separate DELETE query for each of the ~45 patterns. This results in many database round trips and can be inefficient, especially on a large observations table. Consider combining these into a single DELETE statement using OR conditions to improve performance. For example: DELETE FROM observations WHERE title LIKE ? OR title LIKE ? .... This would make the operation atomic and significantly faster.
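A minimal sketch of assembling that single statement from the existing patterns slice. The helper name `buildBatchDelete` is hypothetical; in the migration, its result would still be passed to `tx.Exec`.

```go
package main

import (
	"fmt"
	"strings"
)

// buildBatchDelete assembles one parameterized DELETE covering every pattern,
// so the observations table is scanned once instead of once per pattern.
func buildBatchDelete(patterns []string) (string, []interface{}) {
	conds := make([]string, len(patterns))
	args := make([]interface{}, len(patterns))
	for i, p := range patterns {
		conds[i] = "title LIKE ?"
		args[i] = p
	}
	return "DELETE FROM observations WHERE " + strings.Join(conds, " OR "), args
}

func main() {
	q, args := buildBatchDelete([]string{"Tool%Query Pattern%", "npm install completion%"})
	fmt.Println(q)         // DELETE FROM observations WHERE title LIKE ? OR title LIKE ?
	fmt.Println(len(args)) // 2
}
```

This keeps the statement parameterized, so the patterns slice is reused as-is and nothing is interpolated into the SQL text.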
Summary
Two changes to fix ~85% irrelevant context injection:
1. Migration 043: Radical observation cleanup
Deletes observations matching 45 garbage title patterns:
2. Composite relevance scoring
Re-ranks search results using multi-signal scoring:
- recencyDecay: 0.5^(age_days/7) — observations halve in score every 7 days

Based on: Gemini Deep Research (36 sources: CrewAI, Mem0, Letta, Zep) + claude-mnemonic codebase analysis.
Test plan
Summary by CodeRabbit
Release notes

Search improvements

Maintenance