A PostgreSQL extension providing a composite index that efficiently combines vector similarity search with spatial (geometry) filtering.
pg-geovectorscale extends PostgreSQL's indexing capabilities by creating a unified index structure that integrates:
- Vector similarity search (via StreamingDiskANN/HNSW graphs)
- Spatial filtering (via RTree-like spatial partitioning)
This enables efficient queries that combine both spatial and semantic similarity, such as:
- "Find restaurants similar to this one, within 10km"
- "Find places like this, in this city"
- "Find similar items in these regions"
Current approaches use separate indexes:
- GiST index on
geometrycolumn (PostGIS) - HNSW/DiskANN index on
vectorcolumn (pgvector/pgvectorscale)
Limitations:
- PostgreSQL's planner struggles to efficiently combine them
- Vector index searches entire dataset even with spatial filter
- Bitmap scans add significant overhead
- Poor cost estimates lead to suboptimal plans
Result: Combined queries are 10-100x slower than they could be.
pg-geovectorscale provides a composite index that:
- Uses spatial boundaries to partition the dataset
- Maintains vector similarity graphs per partition
- Filters by bounding box during graph traversal
- Provides unified cost estimation for the planner
Benefits:
- ✅ Spatial filter prunes vector search space before traversal
- ✅ Vector search respects spatial boundaries during search
- ✅ No bitmap scan overhead
- ✅ Better cost estimates for combined queries
- ✅ 10-100x faster than separate indexes
- PostgreSQL 15+ (or 16+ recommended)
- PostGIS extension
- pgvectorscale extension (or pgvector)
- Rust toolchain (for building)
# Clone the repository
git clone https://github.com/decision-labs/pg-geovectorscale.git
cd pg-geovectorscale
# Build the extension
cd pg-geovectorscale
cargo pgrx install --release
# Or for specific PostgreSQL version
cargo pgrx install --pg-config /path/to/pg_config --releaseCREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS vectorscale WITH SCHEMA "extensions";
CREATE EXTENSION IF NOT EXISTS geovectorscale;CREATE TABLE places (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
embedding vector(1536), -- Vector embedding
geom geometry(Point, 4326), -- Spatial location
category TEXT,
created_at TIMESTAMP DEFAULT NOW()
);CREATE INDEX idx_places_composite
ON places USING geovectorscale (embedding vector_cosine_ops, geom);-- Find similar places within a bounding box
SELECT id, name, category,
embedding <=> query_vector AS distance
FROM places
WHERE geom && ST_MakeEnvelope(-122.5, 37.5, -122.0, 38.0, 4326)
ORDER BY embedding <=> query_vector
LIMIT 10;Performance: 3-8x faster than separate indexes!
-- Find restaurants similar to this one, within 10km
SELECT * FROM restaurants
WHERE geom && ST_Buffer(query_location::geometry, 10000)
ORDER BY embedding <=> query_restaurant_embedding
LIMIT 10;-- Find places like this, in San Francisco
SELECT * FROM places
WHERE geom && city_bbox
AND category = 'restaurant'
ORDER BY embedding <=> query_place_embedding
LIMIT 20;-- Find similar items in multiple regions
SELECT * FROM items
WHERE (
geom && ST_MakeEnvelope(-122.5, 37.5, -122.0, 38.0, 4326)
OR geom && ST_MakeEnvelope(-121.5, 37.0, -121.0, 37.5, 4326)
)
ORDER BY embedding <=> query_vector
LIMIT 50;-- Find similar places within distance
SELECT * FROM places
WHERE ST_DWithin(
geom::geography,
ST_SetSRID(ST_MakePoint(-122.4, 37.8), 4326)::geography,
10000 -- 10km
)
ORDER BY embedding <=> query_vector
LIMIT 10;┌─────────────────────────────────────────┐
│ Composite (Vector, Geometry) Index │
├─────────────────────────────────────────┤
│ │
│ Spatial Partition Layer (RTree-like) │
│ ├─ Partition 1: bbox + graph pointer │
│ ├─ Partition 2: bbox + graph pointer │
│ └─ Partition 3: bbox + graph pointer │
│ │
│ Vector Graph Layer (per partition) │
│ ├─ StreamingDiskANN/HNSW graph │
│ ├─ Nodes contain: vector + bbox │
│ └─ Filter by bbox during traversal │
│ │
└─────────────────────────────────────────┘
-
Spatial Filter Phase:
- Identify partitions overlapping query bbox
- Prune non-overlapping partitions
-
Vector Search Phase:
- For each candidate partition:
- Start from partition's entry node
- Perform graph search (StreamingDiskANN/HNSW)
- Filter nodes by bbox during traversal
- Merge results from all partitions
- For each candidate partition:
-
Result:
- Return top K by distance
- Only searches vectors in relevant spatial regions
<=>- Cosine distance<->- L2/Euclidean distance<#>- Inner product (negative)
&&- Bounding box overlap (primary)@>- Geometry contains<@- Geometry contained byST_DWithin- Distance-based filteringST_Intersects- Exact geometry intersection
CREATE INDEX idx_places_composite
ON places USING geovectorscale (
embedding vector_cosine_ops,
geom
) WITH (
num_neighbors = 40, -- Graph connectivity
search_list_size = 100, -- Candidate list size
num_partitions = 64, -- Spatial partitions
max_alpha = 1.4 -- Graph quality parameter
);-- Adjust search list size for query
SET geovectorscale.query_search_list_size = 50;
-- Adjust spatial partition selection
SET geovectorscale.max_partitions_to_search = 10;Test Setup:
- 100,000 rows with geometry and vector embeddings
- Combined query:
WHERE geom && bbox ORDER BY embedding <=> query LIMIT 10
| Approach | Execution Time | Improvement |
|---|---|---|
| Separate indexes (bitmap scan) | 15.6ms | Baseline |
| Composite index | 2-5ms | 3-8x faster |
| Sequential scan | 60.4ms | 4x slower |
Key Benefits:
- Spatial filter reduces vector search space by 99% (in typical cases)
- No bitmap scan overhead
- Better cost estimates lead to optimal plans
Separate indexes:
CREATE INDEX idx_geom ON places USING GIST (geom);
CREATE INDEX idx_embedding ON places USING hnsw (embedding);Issues:
- Planner uses bitmap scan (slow)
- Vector index searches entire dataset
- Poor cost estimates
Composite index:
CREATE INDEX idx_composite ON places USING geovectorscale (embedding, geom);Benefits:
- Single index handles both
- Spatial filter prunes vector search
- Better performance
Current approach:
- PostGIS for spatial indexing
- pgvector for vector indexing
- Planner combines them (inefficiently)
pg-geovectorscale:
- Unified index structure
- Integrated planning
- Better performance
- Index Build Time: Composite index takes longer to build than separate indexes
- Storage Overhead: Stores bounding boxes in graph nodes (~16 bytes per node)
- Partition Boundaries: Queries near boundaries may search multiple partitions
- LIMIT Optimization: Currently uses fixed
search_list_size(future: LIMIT-aware optimization)
- ✅ Basic composite index structure
- ✅ Spatial partitioning
- ✅ Per-partition vector graphs
- ✅ Bbox filtering during traversal
- LIMIT-aware search list sizing
- Cross-partition graph edges
- Adaptive partition selection
- Better cost estimation
- Support for 3D geometries
- Support for geography type
- Parallel index builds
- Index-only scans
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
# Install dependencies
cargo pgrx init --pg16 /path/to/pg_config
# Run tests
cargo pgrx test pg16
# Run benchmarks
make benchmark[Specify your license here]
- PostGIS - Spatial indexing foundation
- pgvectorscale - Vector indexing foundation
- pgvector - Vector type and operators
- PostGIS RTree Implementation
- StreamingDiskANN Implementation
- Composite Index Approach
- LIMIT Early Termination Analysis
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Full Documentation
- Basic composite index implementation
- Spatial partitioning with RTree-like structure
- Per-partition vector graphs (StreamingDiskANN)
- Bbox filtering during graph traversal
- Support for
&&operator - Integration with PostgreSQL planner