
Commit a9d9f74

fix(search): point jina-code alias at published HF repo (#1053)
The `jina-code` alias mapped to `Xenova/jina-embeddings-v2-base-code`, which 404s on Hugging Face. Point it at `jinaai/jina-embeddings-v2-base-code`, the published code embedding model, drop the stale "requires HF token" note in the README, and add a regression test for the alias. Fixes #1025.
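A model alias eventually resolves to a Hugging Face repo id, and the weights are fetched from a URL built from that id. This is why an unpublished id fails with a 404 rather than a type or config error. The sketch below assumes the loader uses the standard Hub `resolve` endpoint (`https://huggingface.co/<repo>/resolve/<revision>/<file>`). The helper `hfFileUrl` is hypothetical and is not part of this codebase.

```typescript
// Hypothetical helper: builds the Hub download URL for a single file in a repo.
// Assumes the standard https://huggingface.co/<repo>/resolve/<revision>/<file> layout.
function hfFileUrl(repo: string, file: string, revision = "main"): string {
  return `https://huggingface.co/${repo}/resolve/${revision}/${file}`;
}

// Old alias target (per this commit, the repo 404s) vs. the new, published one.
const before = hfFileUrl("Xenova/jina-embeddings-v2-base-code", "config.json");
const after = hfFileUrl("jinaai/jina-embeddings-v2-base-code", "config.json");
console.log(before);
console.log(after);
```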
1 parent 0d9ecee commit a9d9f74

3 files changed

Lines changed: 6 additions & 2 deletions


README.md

Lines changed: 1 addition & 1 deletion
@@ -428,7 +428,7 @@ A single trailing semicolon is ignored (falls back to single-query mode). The `-
 | `minilm` | all-MiniLM-L6-v2 | 384 | ~23 MB | Apache-2.0 | Fastest, good for quick iteration |
 | `jina-small` | jina-embeddings-v2-small-en | 512 | ~33 MB | Apache-2.0 | Better quality, still small |
 | `jina-base` | jina-embeddings-v2-base-en | 768 | ~137 MB | Apache-2.0 | High quality, 8192 token context |
-| `jina-code` | jina-embeddings-v2-base-code | 768 | ~137 MB | Apache-2.0 | Best for code search, trained on code+text (requires HF token) |
+| `jina-code` | jina-embeddings-v2-base-code | 768 | ~137 MB | Apache-2.0 | Best for code search, trained on code+text |
 | `nomic` | nomic-embed-text-v1 | 768 | ~137 MB | Apache-2.0 | Good quality, 8192 context |
 | `nomic-v1.5` (default) | nomic-embed-text-v1.5 | 768 | ~137 MB | Apache-2.0 | **Improved nomic, Matryoshka dimensions** |
 | `bge-large` | bge-large-en-v1.5 | 1024 | ~335 MB | MIT | Best general retrieval, top MTEB scores |
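The README table and the `MODELS` map describe the same aliases, so they can drift apart. The stale "requires HF token" note is one example. A minimal sketch of a drift check follows. It assumes README rows keep this column order (alias, model, dimensions, ...), and `parseModelRow` is a hypothetical helper. Because the README lists the model without its owner, the check compares the model against the repo id's suffix.

```typescript
// Hypothetical README drift check: parse one row of the model table.
interface ReadmeRow {
  alias: string;
  model: string;
  dim: number;
}

function parseModelRow(row: string): ReadmeRow {
  // Split on pipes and drop the empty cells produced by the leading/trailing "|".
  const cells = row.split("|").map((c) => c.trim()).filter((c) => c.length > 0);
  if (cells.length < 3) throw new Error(`not a model row: ${row}`);
  // First cell looks like "`jina-code`" or "`nomic-v1.5` (default)"; keep the backticked alias.
  const match = cells[0].match(/`([^`]+)`/);
  if (!match) throw new Error(`no alias in: ${cells[0]}`);
  return { alias: match[1], model: cells[1], dim: Number(cells[2]) };
}

const row =
  "| `jina-code` | jina-embeddings-v2-base-code | 768 | ~137 MB | Apache-2.0 | Best for code search, trained on code+text |";
const parsed = parseModelRow(row);
// The README omits the owner, so compare against the repo id's suffix.
const repo = "jinaai/jina-embeddings-v2-base-code";
console.log(parsed.alias, repo.endsWith(`/${parsed.model}`), parsed.dim);
```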

src/domain/search/models.ts

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ export const MODELS: Record<string, ModelConfig> = {
     quantized: false,
   },
   'jina-code': {
-    name: 'Xenova/jina-embeddings-v2-base-code',
+    name: 'jinaai/jina-embeddings-v2-base-code',
     dim: 768,
     contextWindow: 8192,
     desc: 'Code-aware (~137MB). Trained on code+text, best for code search.',
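For context, the sketch below shows how an alias entry like the one above might be looked up. The `ModelConfig` fields are taken from the entry in the diff, but their optionality is an assumption, and `getModel` is a hypothetical helper, not this project's API. Failing loudly on an unknown alias keeps a typo from surfacing later as a confusing download error.

```typescript
// Shape implied by the diff; field names come from the entries shown,
// but making `quantized` optional is an assumption.
interface ModelConfig {
  name: string; // Hugging Face repo id, "owner/repo"
  dim: number; // embedding dimension
  contextWindow: number; // max input tokens
  desc: string;
  quantized?: boolean;
}

const MODELS: Record<string, ModelConfig> = {
  "jina-code": {
    name: "jinaai/jina-embeddings-v2-base-code",
    dim: 768,
    contextWindow: 8192,
    desc: "Code-aware (~137MB). Trained on code+text, best for code search.",
  },
};

// Hypothetical helper: resolve a user-facing alias, failing loudly on typos.
function getModel(alias: string): ModelConfig {
  const config = MODELS[alias];
  if (!config) {
    throw new Error(`Unknown model alias "${alias}". Known: ${Object.keys(MODELS).join(", ")}`);
  }
  return config;
}

console.log(getModel("jina-code").name);
```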

tests/search/embedding-strategy.test.ts

Lines changed: 4 additions & 0 deletions
@@ -143,6 +143,10 @@ describe('MODELS contextWindow', () => {
       expect(config.contextWindow, `${key} missing contextWindow`).toBeGreaterThan(0);
     }
   });
+
+  test('jina-code points to the published code embedding model', () => {
+    expect(MODELS['jina-code'].name).toBe('jinaai/jina-embeddings-v2-base-code');
+  });
 });
 
 describe('buildEmbeddings with structured strategy', () => {
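The regression test above pins a single alias. A table-driven variant could pin every alias to its expected repo id and report all mismatches in one pass. A sketch follows, with the names `aliasMismatches` and `EXPECTED` chosen here for illustration. Like the committed test, it only checks the string. It cannot tell whether the repo is actually published, since `Xenova/jina-embeddings-v2-base-code` was a well-formed id that still 404'd.

```typescript
// Hypothetical table-driven extension of the single-alias regression test.
type Models = Record<string, { name: string }>;

function aliasMismatches(models: Models, expected: Record<string, string>): string[] {
  const problems: string[] = [];
  for (const [alias, repo] of Object.entries(expected)) {
    // Missing aliases are reported rather than throwing.
    const actual: string | undefined = models[alias]?.name;
    if (actual !== repo) {
      problems.push(`${alias}: expected ${repo}, got ${actual ?? "(missing)"}`);
    }
  }
  return problems;
}

const EXPECTED: Record<string, string> = { "jina-code": "jinaai/jina-embeddings-v2-base-code" };

// The pre-fix mapping is reported; the fixed mapping passes.
const oldModels: Models = { "jina-code": { name: "Xenova/jina-embeddings-v2-base-code" } };
const newModels: Models = { "jina-code": { name: "jinaai/jina-embeddings-v2-base-code" } };
console.log(aliasMismatches(oldModels, EXPECTED));
console.log(aliasMismatches(newModels, EXPECTED));
```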
