
feat: composite relevance scoring + radical observation cleanup#39

Merged
thebtf merged 4 commits into main from feat/retrieval-relevance-scoring
Mar 21, 2026
Conversation


@thebtf thebtf commented Mar 21, 2026

Summary

Two changes to fix the ~85% rate of irrelevant context injection:

1. Migration 043: Radical observation cleanup

Deletes observations matching 45 garbage title patterns:

  • Tool mechanics (ToolSearch Query Pattern ×15)
  • Task status transitions (×20)
  • System prompt analysis (Claude Anti-Sycophancy, etc.)
  • iSCSI debug noise, subtitle timestamps, test data

2. Composite relevance scoring

Re-ranks search results using multi-signal scoring:

score = similarity × recencyDecay × typeWeight × importance
  • recencyDecay: 0.5^(age_days/7) — observations halve in score every 7 days
  • typeWeight: decision=1.4, bugfix=1.3, feature=1.2, discovery=0.8, change=0.7
  • importance: uses stored importance_score (floor 0.3)

Based on: Gemini Deep Research (36 sources: CrewAI, Mem0, Letta, Zep) + claude-mnemonic codebase analysis.

Test plan

  • Build passes
  • Migration 043 deletes garbage on deploy
  • Context injection contains more relevant observations after deploy
  • Old generic observations (>7 days) naturally sink in ranking

Summary by CodeRabbit

Release notes

  • Search improvements

    • Implemented an improved search-result ranking system that accounts for the recency, type, and importance of observations to return more relevant results.
  • Maintenance

    • Cleaned up the database by deleting records matching specific filtering criteria.

thebtf and others added 4 commits March 21, 2026 13:20
…042)

Migration 041: DELETE FROM vectors WHERE sqlite_id NOT IN (SELECT id FROM observations)
  — purges ~229K orphan vectors from correct table name

Migration 042: DELETE FROM patterns WHERE frequency < 5
  — purges ~111K low-quality patterns accumulated from garbage SDK extraction
  — matches raised MinFrequency threshold (T019)
Migration 043: deletes ~170 garbage observations matching 45 title patterns
(tool mechanics, task transitions, system prompt analysis, iSCSI debug, etc.)

Composite scoring: search results re-ranked by:
  score = similarity × recencyDecay × typeWeight × importance
  - recencyDecay: 0.5^(age_days/7) — halves every 7 days
  - typeWeight: decision=1.4, bugfix=1.3, discovery=0.8, change=0.7
  - importance: floor at 0.3 to prevent zero-scored observations

Based on Deep Research (Gemini) + claude-mnemonic analysis findings.
Sources: CrewAI, Mem0, Letta, Zep scoring formulas.

coderabbitai bot commented Mar 21, 2026

Walkthrough

Adds a database migration that deletes observations matching a set of patterns, implements a new composite scoring function combining similarity, recency, type, and importance, and integrates it into the search handler so the scoring is applied to query results.

Changes

Cohort / File(s) Summary
Database migration
internal/db/gorm/migrations.go
Adds a one-off gormigrate migration (ID: "043_radical_observation_cleanup") that deletes observations matching a list of SQL wildcard patterns. Delete failures are logged as warnings, and a total deleted-row count is emitted at the end. The rollback is a no-op.
Composite scoring logic
internal/search/manager.go
Adds an exported ApplyCompositeScoring function that re-ranks observations by combining the similarity score with recency decay (7-day half-life), type weights, and a clamped importance score (minimum 0.3). The resulting composite scores are written back into the similarityScores map.
Search handler integration
internal/worker/handlers_context.go
Removes the re-ranking state-tracking logic; composite scoring is now applied unconditionally to clustered results, replacing the previous conditional similarity sort.
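The handler-side integration amounts to scoring and then sorting by the updated map. A minimal sketch (the `Observation` struct and `rankByScores` name are hypothetical stand-ins, not the actual handlers_context.go code):

```go
package main

import (
	"fmt"
	"sort"
)

// Observation is a pared-down stand-in for the real model.
type Observation struct {
	ID    string
	Title string
}

// rankByScores sorts observations in descending score order, mirroring the
// unconditional re-ranking the handler now performs after composite scoring
// has written its values back into the scores map.
func rankByScores(obs []Observation, scores map[string]float64) {
	sort.SliceStable(obs, func(i, j int) bool {
		return scores[obs[i].ID] > scores[obs[j].ID]
	})
}

func main() {
	obs := []Observation{
		{ID: "a", Title: "old discovery"},
		{ID: "b", Title: "fresh decision"},
	}
	scores := map[string]float64{"a": 0.21, "b": 0.87}
	rankByScores(obs, scores)
	fmt.Println(obs[0].Title) // highest composite score first
}
```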

Sequence Diagram(s)

sequenceDiagram
    participant Handler as Search Handler<br/>(handleSearchByPrompt)
    participant Manager as Search Manager<br/>(ApplyCompositeScoring)
    participant Storage as Observations<br/>(Clustered Results)
    
    Handler->>Storage: Retrieve clustered observations
    Handler->>Manager: ApplyCompositeScoring(observations, similarityScores)
    
    Manager->>Manager: For each observation:<br/>- Calculate recency decay<br/>- Apply type weight<br/>- Clamp importance score<br/>- Compute composite score
    
    Manager->>Manager: Update similarityScores<br/>with composite values
    Manager-->>Handler: Return updated scores
    
    Handler->>Handler: Sort results by<br/>composite scores
    Handler-->>Handler: Return ranked results

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰 Hop-hop! A cleanup and fresh scores,
Composite ranking opens doors,
Recency, weight, importance meet,
And observations rank up neat!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the two main changes: adding a composite relevance-scoring algorithm and deleting unwanted observations via a migration.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.3)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions



@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the relevance of context injection by addressing two key areas. First, it cleans up historical "garbage" observations that were polluting the system, removing noise from semantic search. Second, it introduces a sophisticated composite scoring mechanism for search results, which dynamically re-ranks observations based on their similarity, recency, type, and importance. These changes are designed to drastically improve agent performance by providing more pertinent information and reducing irrelevant data.

Highlights

  • Radical Observation Cleanup: Introduced Migration 043 to delete observations matching 45 garbage title patterns, including tool mechanics, task status transitions, system prompt analysis, iSCSI debug noise, subtitle timestamps, and test data, aiming to fix irrelevant context injection.
  • Composite Relevance Scoring: Implemented a new multi-signal scoring formula (score = similarity × recencyDecay × typeWeight × importance) to re-rank search results, ensuring recent, high-importance decisions are prioritized over old, generic discoveries.

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 1

🧹 Nitpick comments (2)
internal/db/gorm/migrations.go (1)

1436-1527: The migration looks correct, but there is room for optimization.

The code is logically correct and safe. A few notes:

  1. The 45 separate DELETE queries could be combined into a single query using title SIMILAR TO or several OR conditions for better performance. However, since this is a one-off migration, the current approach is acceptable.

  2. DELETE FROM observations WHERE title LIKE ? queries without an index on title will cause a full table scan. If the table is large, this may take a while.

♻️ Optional optimization: combining the patterns
 		ID: "043_radical_observation_cleanup",
 		Migrate: func(tx *gorm.DB) error {
-			garbagePatterns := []string{
-				// ... 45 patterns ...
-			}
-
-			var totalDeleted int64
-			for _, pattern := range garbagePatterns {
-				result := tx.Exec("DELETE FROM observations WHERE title LIKE ?", pattern)
-				if result.Error != nil {
-					log.Warn().Err(result.Error).Str("pattern", pattern).Msg("migration 043: delete failed")
-					continue
-				}
-				totalDeleted += result.RowsAffected
-			}
+			// Batch delete with single query for better performance
+			result := tx.Exec(`DELETE FROM observations WHERE 
+				title LIKE 'Tool%Query Pattern%' OR
+				title LIKE 'Tool%Search%Pattern%' OR
+				-- ... remaining patterns ...
+				title LIKE 'npm install completion%'`)
+			totalDeleted := result.RowsAffected
+			if result.Error != nil {
+				log.Warn().Err(result.Error).Msg("migration 043: batch delete failed")
+			}

 			log.Info().Int64("total_deleted", totalDeleted).Msg("migration 043: radical observation cleanup complete")
 			return nil
 		},
internal/search/manager.go (1)

140-143: The default similarity=0.5 may mask problems.

If similarityScores[obs.ID] returns 0, it can mean either a missing entry or a genuine zero score. Using 0.5 as the default is neutral, but it may be useful to log such cases for debugging.

🔍 Optional improvement: distinguishing a missing entry from a zero score
 	for _, obs := range observations {
-		sim := similarityScores[obs.ID]
-		if sim == 0 {
+		sim, exists := similarityScores[obs.ID]
+		if !exists || sim == 0 {
 			sim = 0.5 // default if no similarity score
 		}

📥 Commits

Reviewing files that changed from the base of the PR and between 57e2e2d and efa5084.

📒 Files selected for processing (3)
  • internal/db/gorm/migrations.go
  • internal/search/manager.go
  • internal/worker/handlers_context.go

Comment on lines +129 to +137
typeWeights := map[models.ObservationType]float64{
	"decision":  1.4,
	"bugfix":    1.3,
	"feature":   1.2,
	"pattern":   1.2,
	"discovery": 0.8,
	"change":    0.7,
	"refactor":  0.9,
}

⚠️ Potential issue | 🟠 Major


Use type constants instead of string literals for type safety.

The typeWeights map should use constants from the models package instead of string literals. However, there is a problem: the "pattern" type has no corresponding ObsType* constant in models.ObservationType, although constants are defined for all the other types.

Either:

  1. Add ObsTypePattern to pkg/models/observation.go
  2. Or remove "pattern" from this map if it is not a supported observation type

Then replace the remaining string literals with constants:

  • "decision" → models.ObsTypeDecision
  • "bugfix" → models.ObsTypeBugfix
  • "feature" → models.ObsTypeFeature
  • "discovery" → models.ObsTypeDiscovery
  • "change" → models.ObsTypeChange
  • "refactor" → models.ObsTypeRefactor

@thebtf thebtf merged commit f91699f into main Mar 21, 2026
2 checks passed
@thebtf thebtf deleted the feat/retrieval-relevance-scoring branch March 21, 2026 11:31
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces two significant improvements. First, it adds a database migration to clean up a large number of irrelevant observations, which should improve search quality. Second, it implements a new composite relevance scoring model for search results, factoring in recency, type, and importance, which will likely lead to more relevant context injection. My review focuses on performance aspects of these changes.

Comment on lines +1512 to +1519
for _, pattern := range garbagePatterns {
result := tx.Exec("DELETE FROM observations WHERE title LIKE ?", pattern)
if result.Error != nil {
log.Warn().Err(result.Error).Str("pattern", pattern).Msg("migration 043: delete failed")
continue
}
totalDeleted += result.RowsAffected
}


medium

This loop executes a separate DELETE query for each of the ~45 patterns. This results in many database round trips and can be inefficient, especially on a large observations table. Consider combining these into a single DELETE statement using OR conditions to improve performance. For example: DELETE FROM observations WHERE title LIKE ? OR title LIKE ? .... This would make the operation atomic and significantly faster.
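The combined statement could be built along these lines (a sketch under stated assumptions: the query builder is a free function so it runs standalone; in the migration itself the result would go through tx.Exec, which is not reproduced here):

```go
package main

import (
	"fmt"
	"strings"
)

// buildBatchDelete collapses N per-pattern DELETEs into a single statement:
//   DELETE FROM observations WHERE title LIKE ? OR title LIKE ? ...
// It returns the SQL and the patterns as bound arguments, ready for Exec.
func buildBatchDelete(patterns []string) (string, []interface{}) {
	conds := make([]string, len(patterns))
	args := make([]interface{}, len(patterns))
	for i, p := range patterns {
		conds[i] = "title LIKE ?"
		args[i] = p
	}
	return "DELETE FROM observations WHERE " + strings.Join(conds, " OR "), args
}

func main() {
	q, args := buildBatchDelete([]string{"Tool%Query Pattern%", "npm install completion%"})
	fmt.Println(q)
	fmt.Println(len(args)) // one bound argument per pattern
}
```

Inside the migration this would be executed once, e.g. `result := tx.Exec(query, args...)`, with result.RowsAffected giving the total deleted count in a single round trip.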

Comment on lines +129 to +137
typeWeights := map[models.ObservationType]float64{
	"decision":  1.4,
	"bugfix":    1.3,
	"feature":   1.2,
	"pattern":   1.2,
	"discovery": 0.8,
	"change":    0.7,
	"refactor":  0.9,
}


medium

The typeWeights map is re-initialized on every call to ApplyCompositeScoring. Since its contents are static, consider defining it as a package-level variable to avoid repeated memory allocations. This can improve performance, as this function is part of the search path.
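Hoisting the map out of the function could look like this (a sketch; `weightFor` is a hypothetical helper, not part of the reviewed code, and the neutral 1.0 default is an assumption):

```go
package main

import "fmt"

// Package-level: allocated once at init time instead of on every call
// to the scoring function, which sits on the hot search path.
var typeWeights = map[string]float64{
	"decision": 1.4, "bugfix": 1.3, "feature": 1.2, "pattern": 1.2,
	"discovery": 0.8, "change": 0.7, "refactor": 0.9,
}

// weightFor looks up a type's weight, falling back to a neutral 1.0
// for types that carry no explicit boost or demotion.
func weightFor(obsType string) float64 {
	if w, ok := typeWeights[obsType]; ok {
		return w
	}
	return 1.0
}

func main() {
	fmt.Println(weightFor("decision")) // boosted
	fmt.Println(weightFor("unknown"))  // neutral fallback
}
```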
