A PostgreSQL extension that provides tokenizers for full-text search.
The official ghcr.io/tensorchord/vchord_bm25-postgres Docker image comes pre-configured with several complementary extensions:
- pg_tokenizer - This extension
- VectorChord-bm25 - Native BM25 Ranking Index
- VectorChord - Scalable, high-performance, and disk-efficient vector similarity search
- pgvector - Popular vector similarity search
Simply run the Docker container as shown below:
docker run \
--name vectorchord-demo \
-e POSTGRES_PASSWORD=mysecretpassword \
-p 5432:5432 \
-d ghcr.io/tensorchord/vchord_bm25-postgres:pg17-v0.2.0
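The container may take a few seconds before PostgreSQL accepts connections. A minimal sketch of a readiness check follows; it assumes the pg_isready client utility (shipped with the PostgreSQL client tools) is installed on the host, and wait_for_postgres is a hypothetical helper name, not part of the image or the extension.

```shell
# Hypothetical helper: poll until PostgreSQL accepts connections.
# Arguments: host, port, maximum number of attempts (one second apart).
wait_for_postgres() {
  host="${1:-localhost}"
  port="${2:-5432}"
  attempts="${3:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # pg_isready exits with 0 once the server accepts connections.
    if pg_isready -h "$host" -p "$port" -q; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "PostgreSQL at $host:$port not ready after $attempts attempts" >&2
  return 1
}
```

Calling `wait_for_postgres localhost 5432 30` before the first connection attempt avoids a "connection refused" error while the server is still starting.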
Once everything’s set up, you can connect to the database using the psql command line tool. The default username is postgres, and the password is the value passed through POSTGRES_PASSWORD above (mysecretpassword in this example). Here’s how to connect:
psql -h localhost -p 5432 -U postgres
After connecting, run the following SQL to make sure the extension is enabled:
CREATE EXTENSION pg_tokenizer;
Then, don’t forget to add tokenizer_catalog to your search_path:
ALTER SYSTEM SET search_path TO "$user", public, tokenizer_catalog;
SELECT pg_reload_conf();
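To confirm that the setting took effect, one option is to query search_path from a fresh connection. The sketch below assumes the container started earlier is listening on localhost:5432 and that the password is supplied through the PGPASSWORD environment variable; check_search_path is a hypothetical helper name, not something pg_tokenizer provides.

```shell
# Hypothetical helper: verify that tokenizer_catalog is on the search_path.
# Assumes PGPASSWORD is exported so psql does not prompt for a password.
check_search_path() {
  # -A (unaligned) and -t (tuples only) make psql print just the value.
  current="$(psql -h localhost -p 5432 -U postgres -At -c 'SHOW search_path;')"
  case "$current" in
    *tokenizer_catalog*)
      echo "ok: $current"
      ;;
    *)
      echo "tokenizer_catalog missing from search_path: $current" >&2
      return 1
      ;;
  esac
}
```

Running `check_search_path` after `SELECT pg_reload_conf();` should report the updated path. If it does not, check whether search_path is overridden at the role or database level.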
With the extension in place, create a tokenizer and use it to tokenize a sample sentence:
SELECT create_tokenizer('tokenizer1', $$
model = "llmlingua2"
$$);
SELECT tokenize('PostgreSQL is a powerful, open-source object-relational database system. It has over 15 years of active development.', 'tokenizer1');
More examples can be found in docs/03-examples.md.