Commit 637fc01

refactor: extract embedder into src/embeddings/ subsystem (ROADMAP 3.10) (#433)
* refactor: domain error hierarchy replacing ad-hoc error handling (ROADMAP 3.8)

  Add structured domain errors (CodegraphError base + 7 subclasses) to replace the mix of process.exit(1), throw new Error, and console.error scattered across library code.

  - New src/errors.js with ParseError, DbError, ConfigError, ResolutionError, EngineError, AnalysisError, BoundaryError
  - Library code throws domain errors instead of calling process.exit(1)
  - CLI top-level catch formats CodegraphError with [CODE] prefix
  - MCP catch returns structured { isError, code } responses
  - CLI commands use parseAsync() so async errors propagate
  - CI gate commands (check, manifesto) use process.exitCode instead of exit
  - All error classes exported from public API (src/index.js)

  Impact: 52 functions changed, 215 affected

* fix: address Greptile review feedback on PR #431

  - Use expect.assertions(4) in db.test.js to prevent silent assertion skips
  - Change snapshot "already exists" error from DbError to ConfigError (it's a missing --force flag, not a database failure)

  Impact: 1 functions changed, 0 affected

* refactor: extract embedder.js into src/embeddings/ subsystem (ROADMAP 3.10)

  Split the monolithic 1,100-line embedder.js into a modular subsystem with clear separation of concerns: models, generator, strategies, stores, and search modules. Uses a pluggable VectorStore JSDoc contract for future ANN backends. Reuses existing db/repository/embeddings.js for search preparation. All 9 consumer import paths updated, old file deleted.

  Impact: 26 functions changed, 16 affected

* fix: address review feedback on embedder extraction

  - Remove dead _cos_sim variable from models.js (greptile)
  - Fix embedding-benchmark.js import path (greptile)
  - Update workflow path filters and cache keys for new directory (greptile)
  - Update stale file references in test comments and CLAUDE.md (greptile)

  Impact: 1 functions changed, 1 affected

* fix: harden prepareSearch with try/catch for DB leak and use getEmbeddingCount

  - Wrap post-open logic in try/catch so DB is closed on unexpected exceptions
  - Switch from hasEmbeddings to getEmbeddingCount for clearer zero-count check

  Impact: 1 functions changed, 0 affected

* fix: guard cosineSim against zero-magnitude vectors returning NaN

  Return 0 instead of NaN when either vector has zero magnitude (e.g. corrupted DB row). In practice embed() stores L2-normalised vectors, but this makes the contract explicit.

  Impact: 1 functions changed, 0 affected

* fix: add @internal JSDoc tags to non-public model helpers

  Mark getModelConfig, promptInstall, and loadTransformers as @internal since they are exported only for sibling module use, not the public barrel.

* fix: unexport initEmbeddingsSchema - only used within generator.js

  Impact: 1 functions changed, 1 affected
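The zero-magnitude guard described in the commit message can be illustrated with a minimal sketch. This is not the project's actual `cosineSim`: the signature, the length check, and the loop are assumptions made for illustration. Only the behaviour the message states (return 0 rather than NaN when either vector has zero magnitude) is taken from the source.

```javascript
// Hypothetical sketch of a cosine similarity with a zero-magnitude guard.
// The real cosineSim in src/embeddings/ may differ in signature and details.
function cosineSim(a, b) {
  if (a.length !== b.length) {
    // Assumption: mismatched dimensions are treated as a caller error.
    throw new RangeError('vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // Without this guard, a zero-magnitude vector gives 0 / 0, which is NaN.
  return denom === 0 ? 0 : dot / denom;
}

console.log(cosineSim(new Float32Array([1, 0]), new Float32Array([1, 0]))); // 1
console.log(cosineSim(new Float32Array([0, 0]), new Float32Array([1, 0]))); // 0
```

Returning 0 keeps a corrupted row from injecting NaN into a ranking, where NaN comparisons would make the sort order unpredictable.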
1 parent 6b0f44d commit 637fc01

28 files changed

Lines changed: 1166 additions & 1116 deletions

.github/workflows/benchmark.yml

Lines changed: 1 addition & 1 deletion
@@ -228,7 +228,7 @@ jobs:
  uses: actions/cache@v5
  with:
  path: ~/.cache/huggingface
- key: hf-models-${{ runner.os }}-${{ hashFiles('src/embedder.js') }}
+ key: hf-models-${{ runner.os }}-${{ hashFiles('src/embeddings/**') }}
  restore-keys: hf-models-${{ runner.os }}-

  - name: Build graph

.github/workflows/embedding-regression.yml

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ on:
  workflow_dispatch:
  pull_request:
  paths:
- - 'src/embedder.js'
+ - 'src/embeddings/**'
  - 'tests/search/**'
  - 'package.json'

CLAUDE.md

Lines changed: 1 addition & 1 deletion
@@ -45,7 +45,7 @@ JS source is plain JavaScript (ES modules) in `src/`. No transpilation step. The
  | `builder.js` | Graph building: file collection, parsing, import resolution, incremental hashing |
  | `parser.js` | tree-sitter WASM wrapper; `LANGUAGE_REGISTRY` + per-language extractors for functions, classes, methods, imports, exports, call sites |
  | `queries.js` | Query functions: symbol search, file deps, impact analysis, diff-impact; `SYMBOL_KINDS` constant defines all node kinds |
- | `embedder.js` | Semantic search with `@huggingface/transformers`; multi-query RRF ranking |
+ | `embeddings/` | Embedding subsystem: model management, vector generation, semantic/keyword/hybrid search, CLI formatting |
  | `db.js` | SQLite schema and operations (`better-sqlite3`) |
  | `mcp.js` | MCP server exposing graph queries to AI agents; single-repo by default, `--multi-repo` to enable cross-repo access |
  | `cycles.js` | Circular dependency detection |

scripts/embedding-benchmark.js

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ const { version, srcDir, cleanup } = await resolveBenchmarkSource();
  const dbPath = path.join(root, '.codegraph', 'graph.db');

  const { buildEmbeddings, MODELS, searchData, disposeModel } = await import(
- srcImport(srcDir, 'embedder.js')
+ srcImport(srcDir, 'embeddings/index.js')
  );

  // Redirect console.log to stderr so only JSON goes to stdout

src/cli/commands/embed.js

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
  import path from 'node:path';
- import { buildEmbeddings, DEFAULT_MODEL, EMBEDDING_STRATEGIES } from '../../embedder.js';
+ import { buildEmbeddings, DEFAULT_MODEL, EMBEDDING_STRATEGIES } from '../../embeddings/index.js';

  export const command = {
  name: 'embed [dir]',

src/cli/commands/models.js

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
- import { DEFAULT_MODEL, MODELS } from '../../embedder.js';
+ import { DEFAULT_MODEL, MODELS } from '../../embeddings/index.js';

  export const command = {
  name: 'models',

src/cli/commands/search.js

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
- import { search } from '../../embedder.js';
+ import { search } from '../../embeddings/index.js';

  export const command = {
  name: 'search <query>',
