diff --git a/_partials/_early_access_11_25.md b/_partials/_early_access_11_25.md new file mode 100644 index 0000000000..1ac4003e4a --- /dev/null +++ b/_partials/_early_access_11_25.md @@ -0,0 +1 @@ +Early access: October 2025 diff --git a/_partials/_integration-prereqs-cloud-only.md b/_partials/_integration-prereqs-cloud-only.md index e2be24287e..22e7fd8e3c 100644 --- a/_partials/_integration-prereqs-cloud-only.md +++ b/_partials/_integration-prereqs-cloud-only.md @@ -1,7 +1,7 @@ To follow the steps on this page: -* Create a target [$SERVICE_LONG][create-service] with time-series and analytics enabled. +* Create a target [$SERVICE_LONG][create-service] with the Real-time analytics capability. You need your [connection details][connection-info]. diff --git a/_partials/_integration-prereqs.md b/_partials/_integration-prereqs.md index 2dd9da6482..86fb5409a7 100644 --- a/_partials/_integration-prereqs.md +++ b/_partials/_integration-prereqs.md @@ -1,6 +1,6 @@ To follow the steps on this page: -* Create a target [$SERVICE_LONG][create-service] with time-series and analytics enabled. +* Create a target [$SERVICE_LONG][create-service] with the Real-time analytics capability. You need [your connection details][connection-info]. This procedure also works for [$SELF_LONG][enable-timescaledb]. diff --git a/use-timescale/extensions/index.md b/use-timescale/extensions/index.md index 89e6375287..33817f7d51 100644 --- a/use-timescale/extensions/index.md +++ b/use-timescale/extensions/index.md @@ -16,13 +16,14 @@ The following $PG extensions are installed with each $SERVICE_LONG: ## $COMPANY extensions -| Extension | Description | Enabled by default | -|--------------------------------------------|------------------------------------|-----------------------------------------------------| -| [pgai][pgai] | Helper functions for AI workflows | For [AI-focused][services] $SERVICE_SHORTs | -| [pgvector][pgvector] | Vector similarity search for $PG | For [AI-focused][services] $SERVICE_SHORTs | -| [pgvectorscale][pgvectorscale] | Advanced indexing for vector data | For [AI-focused][services] $SERVICE_SHORTs | -| [timescaledb_toolkit][timescaledb-toolkit] | TimescaleDB Toolkit | For [Real-time analytics][services] $SERVICE_SHORTs | -| [timescaledb][timescaledb] | TimescaleDB | For all $SERVICE_SHORTs | +| Extension | Description | Enabled by default | +|---------------------------------------------|--------------------------------------------|-----------------------------------------------------------------------| +| [pgai][pgai] | Helper functions for AI workflows | For [AI-focused][services] $SERVICE_SHORTs | +| [pg_textsearch][pg_textsearch] | [BM25][bm25-wiki]-based full-text search | Currently early access. For development and staging environments only | +| [pgvector][pgvector] | Vector similarity search for $PG | For [AI-focused][services] $SERVICE_SHORTs | +| [pgvectorscale][pgvectorscale] | Advanced indexing for vector data | For [AI-focused][services] $SERVICE_SHORTs | +| [timescaledb_toolkit][timescaledb-toolkit] | TimescaleDB Toolkit | For [Real-time analytics][services] $SERVICE_SHORTs | +| [timescaledb][timescaledb] | TimescaleDB | For all $SERVICE_SHORTs | ## $PG built-in extensions @@ -138,6 +139,7 @@ The following $PG extensions are installed with each $SERVICE_LONG: [refint]: https://www.postgresql.org/docs/current/contrib-spi.html [seg]: https://www.postgresql.org/docs/current/seg.html [pgcrypto]: /use-timescale/:currentVersion:/extensions/pgcrypto/ +[pg_textsearch]: /use-timescale/:currentVersion:/extensions/pg-textsearch/ [sslinfo]: https://www.postgresql.org/docs/current/sslinfo.html [tablefunc]: https://www.postgresql.org/docs/current/tablefunc.html [tcn]: https://www.postgresql.org/docs/current/tcn.html @@ -153,3 +155,4 @@ The following $PG extensions are installed with each $SERVICE_LONG: [timescale-extensions]: #timescale-extensions [third-party]: #third-party-extensions [services]: /getting-started/:currentVersion:/ +[bm25-wiki]: https://en.wikipedia.org/wiki/Okapi_BM25 \ No newline at end of file diff --git a/use-timescale/extensions/pg-textsearch.md b/use-timescale/extensions/pg-textsearch.md new file mode 100644 index 0000000000..fbf7cb63f0 --- /dev/null +++ b/use-timescale/extensions/pg-textsearch.md @@ -0,0 +1,336 @@ +--- +title: Optimize full text search with BM25 +excerpt: Set up and optimize BM25-based full-text search using the pg_textsearch extension +keywords: [pg_textsearch, BM25, full-text search, text search, ranking, hybrid search] +tags: [search, indexing, performance, BM25] +--- + +import EA1125 from "versionContent/_partials/_early_access_11_25.mdx"; +import IntegrationPrereqs from "versionContent/_partials/_integration-prereqs.mdx"; + +# Optimize full text search with BM25 + +$PG full-text search at scale consistently hits a wall where performance degrades catastrophically. +$COMPANY's [pg_textsearch][pg_textsearch-repo] brings modern [BM25][bm25-wiki]-based full-text search directly into $PG, +with a memtable architecture for efficient indexing and ranking. `pg_textsearch` integrates seamlessly with SQL and +provides better search quality and performance than the $PG built-in full-text search. + +BM25 scores in `pg_textsearch` are returned as negative values, where lower (more negative) numbers indicate better +matches. `pg_textsearch` implements the following: + +* **Corpus-aware ranking**: BM25 uses inverse document frequency to weight rare terms higher +* **Term frequency saturation**: prevents documents with excessive term repetition from dominating results +* **Length normalization**: adjusts scores based on document length relative to corpus average +* **Relative ranking**: focuses on rank order rather than absolute score values + +This page shows you how to install `pg_textsearch`, configure BM25 indexes, and optimize your search capabilities using +the following best practice: + +* **Memory planning**: size your `index_memory_limit` based on corpus vocabulary and document count +* **Language configuration**: choose appropriate text search configurations for your data language +* **Hybrid search**: combine with pgvector or pgvectorscale for applications requiring both semantic and keyword search +* **Query optimization**: use score thresholds to filter low-relevance results +* **Index monitoring**: regularly check index usage and memory consumption + + this preview release is designed for development and staging environments. It is not recommended for use with hypertables. + +## Prerequisites + + + +## Install pg_textsearch + +To install this $PG extension: + + + +1. **Connect to your $SERVICE_LONG** + + In [$CONSOLE][services-portal] open an [SQL editor][in-console-editors]. You can also connect to your $SERVICE_SHORT using [psql][connect-using-psql]. + +1. **Enable the extension on your $SERVICE_LONG** + + - For new services, simply enable the extension: + ```sql + CREATE EXTENSION pg_textsearch; + ``` + + - For existing services, update your instance, then enable the extension: + + The extension may not be available until after your next scheduled maintenance window. To pick up the update + immediately, manually pause and restart your service. + +1. **Verify the installation** + + ```sql + SELECT * FROM pg_extension WHERE extname = 'pg_textsearch'; + ``` + + + +You have installed `pg_textsearch` on $CLOUD_LONG. + +## Create BM25 indexes on your data + +BM25 indexes provide modern relevance ranking that outperforms $PG's built-in ts_rank functions by using corpus +statistics and better algorithmic design. + +To create a BM25 index with pg_textsearch: + + + +1. **Create a table with text content** + + ```sql + CREATE TABLE products ( + id serial PRIMARY KEY, + name text, + description text, + category text, + price numeric + ); + ``` + +1. **Insert sample data** + + ```sql + INSERT INTO products (name, description, category, price) VALUES + ('Mechanical Keyboard', 'Durable mechanical switches with RGB backlighting for gaming and productivity', 'Electronics', 149.99), + ('Ergonomic Mouse', 'Wireless mouse with ergonomic design to reduce wrist strain during long work sessions', 'Electronics', 79.99), + ('Standing Desk', 'Adjustable height desk for better posture and productivity throughout the workday', 'Furniture', 599.99); + ``` + +1. **Create a BM25 index** + + ```sql + CREATE INDEX products_search_idx ON products + USING bm25(description) + WITH (text_config='english'); + ``` + + BM25 supports single-column indexes only. + + + +You have created a BM25 index for full-text search. + +## Optimize search queries for performance + +Use efficient query patterns to leverage BM25 ranking and optimize search performance. + + + +1. **Perform ranked searches using the distance operator** + + ```sql + SELECT name, description, + description <@> to_bm25query('ergonomic work', 'products_search_idx') as score + FROM products + ORDER BY description <@> to_bm25query('ergonomic work', 'products_search_idx') + LIMIT 3; + ``` + +1. **Filter results by score threshold** + + ```sql + SELECT name, + description <@> to_bm25query('wireless', 'products_search_idx') as score + FROM products + WHERE description <@> to_bm25query('wireless', 'products_search_idx') < -2.0; + ``` + +1. **Combine with standard SQL operations** + + ```sql + SELECT category, name, + description <@> to_bm25query('ergonomic', 'products_search_idx') as score + FROM products + WHERE price < 500 + AND description <@> to_bm25query('ergonomic', 'products_search_idx') < -1.0 + ORDER BY description <@> to_bm25query('ergonomic', 'products_search_idx') + LIMIT 5; + ``` + +1. **Verify index usage with EXPLAIN** + + ```sql + EXPLAIN SELECT * FROM products + ORDER BY description <@> to_bm25query('wireless keyboard', 'products_search_idx') + LIMIT 5; + ``` + + + +You have optimized your search queries for BM25 ranking. + +## Build hybrid search with semantic and keyword search + +Combine `pg_textsearch` with `pgvector` or `pgvectorscale` to build powerful hybrid search systems that use both semantic vector search and keyword BM25 search. + + + +1. **Enable the [vectorscale][pg-vectorscale] extension on your $SERVICE_LONG** + ```sql + CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE; + ``` +1. **Create a table with both text content and vector embeddings** + + ```sql + CREATE TABLE articles ( + id serial PRIMARY KEY, + title text, + content text, + embedding vector(1536) -- OpenAI ada-002 embedding dimension + ); + ``` + +1. **Create indexes for both search types** + + ```sql + -- Vector index for semantic search + CREATE INDEX articles_embedding_idx ON articles + USING hnsw (embedding vector_cosine_ops); + + -- Keyword index for BM25 search + CREATE INDEX articles_content_idx ON articles + USING bm25(content) + WITH (text_config='english'); + ``` + +1. **Perform hybrid search using [reciprocal rank fusion][recip-rank-fusion]** + + ```sql + WITH vector_search AS ( + SELECT id, + ROW_NUMBER() OVER (ORDER BY embedding <=> '[0.1, 0.2, 0.3]'::vector) AS rank + FROM articles + ORDER BY embedding <=> '[0.1, 0.2, 0.3]'::vector + LIMIT 20 + ), + keyword_search AS ( + SELECT id, + ROW_NUMBER() OVER (ORDER BY content <@> to_bm25query('query performance', 'articles_content_idx')) AS rank + FROM articles + ORDER BY content <@> to_bm25query('query performance', 'articles_content_idx') + LIMIT 20 + ) + SELECT a.id, + a.title, + COALESCE(1.0 / (60 + v.rank), 0.0) + COALESCE(1.0 / (60 + k.rank), 0.0) AS combined_score + FROM articles a + LEFT JOIN vector_search v ON a.id = v.id + LEFT JOIN keyword_search k ON a.id = k.id + WHERE v.id IS NOT NULL OR k.id IS NOT NULL + ORDER BY combined_score DESC + LIMIT 10; + ``` + +1. **Adjust relative weights for different search types** + + ```sql + WITH vector_search AS ( + SELECT id, + ROW_NUMBER() OVER (ORDER BY embedding <=> '[0.1, 0.2, 0.3]'::vector) AS rank + FROM articles + ORDER BY embedding <=> '[0.1, 0.2, 0.3]'::vector + LIMIT 20 + ), + keyword_search AS ( + SELECT id, + ROW_NUMBER() OVER (ORDER BY content <@> to_bm25query('query performance', 'articles_content_idx')) AS rank + FROM articles + ORDER BY content <@> to_bm25query('query performance', 'articles_content_idx') + LIMIT 20 + ) + SELECT + a.id, + a.title, + 0.7 * COALESCE(1.0 / (60 + v.rank), 0.0) + -- 70% weight to vectors + 0.3 * COALESCE(1.0 / (60 + k.rank), 0.0) -- 30% weight to keywords + AS combined_score + FROM articles a + LEFT JOIN vector_search v ON a.id = v.id + LEFT JOIN keyword_search k ON a.id = k.id + WHERE v.id IS NOT NULL OR k.id IS NOT NULL + ORDER BY combined_score DESC + LIMIT 10; + ``` + + + +You have implemented hybrid search combining semantic and keyword search. + +## Configuration options + +Customize `pg_textsearch` behavior for your specific use case and data characteristics. + + + +1. **Configure the memory limit** + + The size of the memtable depends primarily on the number of distinct terms in your corpus. A corpus with longer + documents or more varied vocabulary requires more memory per document. + ```sql + -- Set memory limit per index (default 64MB) + SET pg_textsearch.index_memory_limit = '128MB'; + ``` + +1. **Configure language-specific text processing** + + ```sql + -- French language configuration + CREATE INDEX products_fr_idx ON products_fr + USING pg_textsearch(description) + WITH (text_config='french'); + + -- Simple tokenization without stemming + CREATE INDEX products_simple_idx ON products + USING pg_textsearch(description) + WITH (text_config='simple'); + ``` + +1. **Tune BM25 parameters** + + ```sql + -- Adjust term frequency saturation (k1) and length normalization (b) + CREATE INDEX products_custom_idx ON products + USING bm25(description) + WITH (text_config='english', k1=1.5, b=0.8); + ``` + + 1. **Monitor index usage and memory consumption** + + - Check index usage statistics + ```sql + SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read + FROM pg_stat_user_indexes + WHERE indexrelid::regclass::text ~ 'bm25'; + ``` + + - View detailed index information + ```sql + SELECT bm25_debug_dump_index('products_search_idx'); + ``` + + + +You have configured `pg_textsearch` for optimal performance. For production applications, consider implementing result +caching and pagination to improve user experience with large result sets. + +## Current limitations + +This preview release focuses on core BM25 functionality. It has the following limitations: + +* **Memory-only storage**: indexes are limited by `pg_textsearch.index_memory_limit` (default 64MB) +* **No phrase queries**: cannot search for exact multi-word phrases yet + +These limitations will be addressed in upcoming releases with disk-based segments and expanded query capabilities. + + +[bm25-wiki]: https://en.wikipedia.org/wiki/Okapi_BM25 +[pg_textsearch-repo]: https://github.com/timescale/tapir +[in-console-editors]: /getting-started/:currentVersion:/run-queries-from-console/ +[services-portal]: https://console.cloud.timescale.com/dashboard/services +[connect-using-psql]: /integrations/:currentVersion:/psql/#connect-to-your-service +[recip-rank-fusion]: https://en.wikipedia.org/wiki/Mean_reciprocal_rank +[pg-vectorscale]: /ai/:currentVersion:/sql-interface-for-pgvector-and-timescale-vector/#installing-the-pgvector-and-pgvectorscale-extensions diff --git a/use-timescale/page-index/page-index.js b/use-timescale/page-index/page-index.js index e35af9c4b1..443988e557 100644 --- a/use-timescale/page-index/page-index.js +++ b/use-timescale/page-index/page-index.js @@ -583,6 +583,11 @@ module.exports = [ href: "extensions", excerpt: "The Postgres extensions installed in each Tiger service", children: [ + { + title: "Optimize full text search with BM25", + href: "pg-textsearch", + excerpt: "Set up and optimize BM25-based full-text search for efficient ranked text searching", + }, { title: "Create a chatbot using pgvector", href: "pgvector",