feat: add local embeddings via transformers.js (no API key required) #56
Open
futuregerald wants to merge 9 commits into dcostenco:main from
Provides a no-op text provider used when embedding_provider=local and no text API key is configured. Factory routes embeddings to LocalEmbeddingAdapter, not this class.
…t-v1.5) Implements local 768-dim embeddings via @huggingface/transformers pipeline. Features: q8 quantization by default, model ID validation, non-fatal warmup, search_document prefix, text truncation at word boundary, dimension guard.
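The text-preparation steps named in this commit (search_document prefix, truncation at a word boundary, dimension guard) might look roughly like the sketch below. The constant values, function names, and character limit are illustrative assumptions, not taken from the PR's source; only the 768 dimension and the search_document prefix come from the commit message.

```typescript
// Assumed values for illustration; the adapter's real limits may differ.
const DOCUMENT_PREFIX = "search_document: ";
const MAX_CHARS = 8000; // hypothetical character budget
const EXPECTED_DIM = 768; // from the commit message

/** Cut text to at most maxChars without splitting a word when a space is available. */
function truncateAtWordBoundary(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  const slice = text.slice(0, maxChars);
  // If the cut lands exactly between words, the slice is already clean.
  if (/\s/.test(text[maxChars])) return slice.trimEnd();
  const lastSpace = slice.lastIndexOf(" ");
  // No space at all: fall back to a hard cut rather than returning nothing.
  return lastSpace > 0 ? slice.slice(0, lastSpace).trimEnd() : slice;
}

/** Build the string that would be passed to the embedding pipeline. */
function prepareDocument(text: string): string {
  return DOCUMENT_PREFIX + truncateAtWordBoundary(text.trim(), MAX_CHARS);
}

/** Reject vectors of an unexpected size before they reach storage. */
function assertDimension(vector: number[]): void {
  if (vector.length !== EXPECTED_DIM) {
    throw new Error(
      `Expected ${EXPECTED_DIM}-dim embedding, got ${vector.length}`,
    );
  }
}
```

Checking the dimension at the adapter boundary means a model swap that changes the vector size fails loudly instead of writing mismatched vectors.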
Verifies loadPromise resolves non-fatally and generateEmbedding throws when pipeline construction fails (simulates missing/corrupted transformers install).
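The behaviour this test verifies (a warmup promise that never rejects, with the failure surfacing only when an embedding is requested) can be sketched as follows. The class name `LazyEmbedder` and the injected `createEmbedder` factory are hypothetical stand-ins for the adapter and the transformers pipeline constructor; only `loadPromise` and `generateEmbedding` are names from the PR.

```typescript
type Embedder = (text: string) => Promise<number[]>;

class LazyEmbedder {
  private embedder: Embedder | null = null;
  private loadError: unknown = null;
  /** Settles once loading has finished, whether or not it succeeded. */
  readonly loadPromise: Promise<void>;

  constructor(createEmbedder: () => Promise<Embedder>) {
    // Promise.resolve().then(...) also captures synchronous throws from the factory.
    this.loadPromise = Promise.resolve()
      .then(() => createEmbedder())
      .then(
        (embedder) => {
          this.embedder = embedder;
        },
        (err) => {
          // Swallow the error here so startup is not interrupted.
          this.loadError = err;
        },
      );
  }

  async generateEmbedding(text: string): Promise<number[]> {
    await this.loadPromise;
    if (!this.embedder) {
      throw new Error(
        `Local embedding model unavailable: ${String(this.loadError)}`,
      );
    }
    return this.embedder(text);
  }
}
```

With this shape, a missing or corrupted install does not crash the server at startup; callers see the error only on the first embedding request.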
Add embedding_provider=local → LocalEmbeddingAdapter routing. Add text_provider=none → DisabledTextAdapter for local-only setups. Update factory tests: 14 cases covering all providers including local.
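As a rough illustration of the routing described here, the sketch below maps the two settings to adapter names. The `ProviderSettings` shape, the `chooseAdapters` function, and the placeholder label for every other provider are assumptions; the PR only confirms the `local` and `none` cases.

```typescript
// Hypothetical settings shape; only "local" and "none" are values named in the PR.
interface ProviderSettings {
  embedding_provider?: string;
  text_provider?: string;
}

interface AdapterChoice {
  embedding: string;
  text: string;
}

// Placeholder for whatever the factory already did before this PR.
const EXISTING_ROUTING = "existing provider routing";

function chooseAdapters(settings: ProviderSettings): AdapterChoice {
  return {
    // embedding_provider=local routes embeddings to the local adapter.
    embedding:
      settings.embedding_provider === "local"
        ? "LocalEmbeddingAdapter"
        : EXISTING_ROUTING,
    // text_provider=none yields the no-op text stub for local-only setups.
    text:
      settings.text_provider === "none"
        ? "DisabledTextAdapter"
        : EXISTING_ROUTING,
  };
}
```

Keeping the two settings independent is what allows an embedding-only setup: text generation can be disabled while semantic search still works.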
Security/correctness fixes:
- Fix HF_ENDPOINT check: use URL.parse + hostname comparison instead of substring match (prevents huggingface.co.evil.com bypass, CWE-918)
- Add REVISION_PATTERN validation for local_embedding_revision setting (prevents arbitrary git ref injection, CWE-494)
- Add null guard before Array.from on pipeline tensor output (prevents opaque TypeError on incompatible transformers version)

Minor fixes:
- DisabledTextAdapter: remove internal LocalEmbeddingAdapter name from error
- Add @internal JSDoc for loadPromise
- Add comment explaining MODEL_ID_PATTERN + includes("..") dual check
- Fix local-missing-dep.test.ts description to match actual tested behavior

Tests: +6 new cases (HF_ENDPOINT subdomain spoof, trusted subdomain, valid revisions, invalid revision, tensor null guard)
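To make the difference between a substring match and a hostname comparison concrete, here is a minimal sketch of the three validations mentioned above. The regular expressions, the https requirement, and the function names are assumptions chosen for illustration; the PR's actual MODEL_ID_PATTERN and REVISION_PATTERN are not shown in this conversation. The sketch uses `new URL` in a try/catch rather than `URL.parse` so that it also runs on older Node versions.

```typescript
const TRUSTED_HOST = "huggingface.co";

// Hypothetical patterns; the PR's real expressions may be stricter or looser.
const MODEL_ID_PATTERN = /^[A-Za-z0-9][\w.-]*\/[A-Za-z0-9][\w.-]*$/;
const REVISION_PATTERN = /^[A-Za-z0-9][\w.-]{0,99}$/;

/** Compare the parsed hostname, never a substring of the raw URL. */
function isTrustedHfEndpoint(endpoint: string): boolean {
  let url: URL;
  try {
    url = new URL(endpoint);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "https:") return false; // assumption: require https
  const host = url.hostname.toLowerCase();
  return host === TRUSTED_HOST || host.endsWith("." + TRUSTED_HOST);
}

/** Dual check: the pattern limits the character set, includes("..") blocks traversal. */
function isValidModelId(id: string): boolean {
  return MODEL_ID_PATTERN.test(id) && !id.includes("..");
}

/** Allow branch names, tags, and commit hashes without slashes or traversal. */
function isValidRevision(revision: string): boolean {
  return REVISION_PATTERN.test(revision) && !revision.includes("..");
}
```

A substring test such as `endpoint.includes("huggingface.co")` would accept `https://huggingface.co.evil.com`, because the trusted name appears inside an attacker-controlled hostname; the suffix check on the parsed hostname does not. The `includes("..")` check matters because a permissive character class that allows dots would otherwise accept an ID like `a/b..c`.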
Move HF_ENDPOINT and LOCAL_EMBEDDING_MODEL cleanup to beforeEach so a test failure can't leak env vars into the next test. Drops the brittle inline restoration from the three HF_ENDPOINT cases. Co-Authored-By: Claude Opus 4.6 <[email protected]>
The handler code in graphHandlers.ts and ledgerHandlers.ts checked for GOOGLE_API_KEY before calling generateEmbedding(), preventing the local embedding adapter (and any non-Gemini provider) from ever being used. The existing try/catch and .catch() blocks already handle provider failures gracefully, making these guards redundant.
- Remove GOOGLE_API_KEY import and search guard from graphHandlers.ts
- Remove GOOGLE_API_KEY import and 5 embedding guards from ledgerHandlers.ts
- Update dashboard semantic search error message
- Update comments in definitions.ts, factMerger.ts, factory.ts

Co-Authored-By: Claude Opus 4.6 <[email protected]>
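The reasoning in this commit, that a provider-specific key check is redundant once failures are caught, can be illustrated with a small sketch. The `EmbeddingAdapter` interface and the `embedBestEffort` helper are hypothetical; the real handlers in graphHandlers.ts and ledgerHandlers.ts are not shown here.

```typescript
// Hypothetical minimal adapter contract for illustration.
interface EmbeddingAdapter {
  generateEmbedding(text: string): Promise<number[]>;
}

/**
 * Call whichever adapter is configured and degrade gracefully on failure.
 * There is no API-key guard: a provider that cannot run simply throws,
 * and the catch block handles it the same way for every provider.
 */
async function embedBestEffort(
  adapter: EmbeddingAdapter,
  text: string,
): Promise<number[] | null> {
  try {
    return await adapter.generateEmbedding(text);
  } catch (err) {
    console.warn("Embedding failed; continuing without a vector:", err);
    return null;
  }
}
```

Because the failure path is provider-agnostic, a local adapter and a cloud adapter with a missing key end up in the same fallback, which is what makes the removed guards unnecessary.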
…e docs

The Xenova/nomic-embed-text-v1.5 HuggingFace repo returns 401 (removed or made private). Update the default to the official nomic-ai/nomic-embed-text-v1.5, which is publicly accessible.

Also updates README, WEB_SCHOLAR.md, and design docs to reflect:
- Semantic search works offline with embedding_provider=local
- GOOGLE_API_KEY is no longer required for embeddings
- Cloud provider keys are optional, not mandatory for core features

Co-Authored-By: Claude Opus 4.6 <[email protected]>
98413e9 to c124464
dcostenco added a commit that referenced this pull request Apr 17, 2026
…56) Adds LocalEmbeddingAdapter using nomic-embed-text-v1.5 via @huggingface/transformers.
- Local embedding-only provider (embedding_provider=local)
- DisabledTextAdapter stub for zero-API-key setups (text_provider=none)
- Removes hardcoded GOOGLE_API_KEY guards from handlers
- Model ID + revision validation with path traversal protection
- Comprehensive test coverage (59 tests pass)

Co-authored-by: Gerald Onyango <[email protected]>
dcostenco added a commit that referenced this pull request Apr 17, 2026
- LocalEmbeddingAdapter: nomic-embed-text-v1.5 via @huggingface/transformers
- DisabledTextAdapter: text_provider=none for embedding-only setups
- Removed GOOGLE_API_KEY guards from all handlers
- Model ID + revision security validation
- Updated README, CHANGELOG, ROADMAP
- 1622 tests passing across 55 suites
- Co-authored-by: Gerald Onyango (PR #56)
brisbanewebdeveloper pushed a commit to brisbanewebdeveloper/prism-mcp that referenced this pull request Apr 23, 2026
…costenco#56) Adds LocalEmbeddingAdapter using nomic-embed-text-v1.5 via @huggingface/transformers.
- Local embedding-only provider (embedding_provider=local)
- DisabledTextAdapter stub for zero-API-key setups (text_provider=none)
- Removes hardcoded GOOGLE_API_KEY guards from handlers
- Model ID + revision validation with path traversal protection
- Comprehensive test coverage (59 tests pass)

Co-authored-by: Gerald Onyango <[email protected]>
brisbanewebdeveloper pushed a commit to brisbanewebdeveloper/prism-mcp that referenced this pull request Apr 23, 2026
- LocalEmbeddingAdapter: nomic-embed-text-v1.5 via @huggingface/transformers
- DisabledTextAdapter: text_provider=none for embedding-only setups
- Removed GOOGLE_API_KEY guards from all handlers
- Model ID + revision security validation
- Updated README, CHANGELOG, ROADMAP
- 1622 tests passing across 55 suites
- Co-authored-by: Gerald Onyango (PR dcostenco#56)
Summary
Adds local embedding support using @huggingface/transformers so Prism can generate embeddings without requiring a Google API key.
- LocalEmbeddingAdapter using nomic-ai/nomic-embed-text-v1.5 via transformers.js
- DisabledTextAdapter as a stub fallback when no text provider is configured
- GOOGLE_API_KEY guards that blocked local-only usage are removed
- Default model: nomic-ai/nomic-embed-text-v1.5
- @huggingface/transformers is an optional peer dependency; existing setups are unaffected

Motivation
Users who want to run Prism locally without a Google API key were blocked by guards that required the key even when local embeddings could handle it. This change makes Prism work out of the box for local-only setups while preserving Google API support for those who have it configured.
Test plan
- LocalEmbeddingAdapter (including broken-install scenarios)