 # DDoc Search

-A high-performance, multi-tenant document search API built with Ruby on Rails. This application provides full-text search capabilities powered by Elasticsearch, with support for tenant isolation, rate limiting, caching, and asynchronous document indexing via Kafka.
+A high-performance, multi-tenant document search API built with Ruby on Rails. This application provides full-text search capabilities powered by Weaviate, with support for tenant isolation, rate limiting, caching, and asynchronous document indexing via Kafka.

 ## Quick Start

 ### Running Locally

 1. **Install dependencies**
+
    ```bash
    bundle install
    ```

-2. **Start required services** (Elasticsearch, Redis, Kafka)
+2. **Start required services** (Weaviate, Redis, Kafka)
+
    ```bash
    docker compose -f docker-compose.dev.yml up -d
    ```

-3. **Setup database and Elasticsearch**
+3. **Setup database and Weaviate**
+
    ```bash
    rails db:drop db:create db:migrate
-   rails runner "Document.__elasticsearch__.create_index! force: true"
+   rails runner "Document.ensure_weaviate_schema!"
    ```

 4. **Create a test tenant**
+
    ```bash
    rails runner tmp/create_tenant.rb
    # Save the API key from the output!
@@ -43,9 +47,7 @@ A high-performance, multi-tenant document search API built with Ruby on Rails. T

 ### Ready-to-Use curl Commands

-```bash
-export TEST_API_KEY="df1a5764855153924486beaae96cebef739f3f54f68e28ebdf0338aea5155ee5"
-```
+Using test API key: `aead1b358e37d400e37bd9f6d031fe3a0fab53f6f6e3839b494740b7373658fe`

 **Health Check:**

@@ -57,7 +59,7 @@ curl http://localhost:3000/health | jq '.'

 ```bash
 curl -X POST http://localhost:3000/v1/documents \
-  -H "X-API-Key: $TEST_API_KEY" \
+  -H "X-API-Key: aead1b358e37d400e37bd9f6d031fe3a0fab53f6f6e3839b494740b7373658fe" \
   -H "Content-Type: application/json" \
   -d '{
     "document": {
@@ -68,25 +70,38 @@ curl -X POST http://localhost:3000/v1/documents \
   }' | jq '.'
 ```

+```bash
+curl -X POST http://localhost:3000/v1/documents \
+  -H "X-API-Key: aead1b358e37d400e37bd9f6d031fe3a0fab53f6f6e3839b494740b7373658fe" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "document": {
+      "title": "The Symphony of Earth",
+      "content": "'"$(cat test/fixtures/files/earth.txt | tr '\n' ' ' | sed 's/"/\\"/g')"'",
+      "metadata": {"category": "story", "tags": ["earth", "life", "harmony"]}
+    }
+  }' | jq '.'
+```
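
The added curl command inlines a fixture file with `tr` and `sed`, which escapes double quotes but not backslashes or other control characters. As a hedged alternative, the request body could be built with Ruby's standard `json` library instead. This is a minimal sketch, not part of the change: the helper name `document_payload` and the script name in the comment are hypothetical, and only the JSON shape is taken from the command above.

```ruby
require "json"

# Builds the JSON body for POST /v1/documents from raw text.
# `document_payload` is a hypothetical helper, not part of the project.
def document_payload(title:, content:, metadata: {})
  JSON.generate(
    document: {
      title: title,
      # Mirror the `tr '\n' ' '` step; JSON.generate handles all escaping.
      content: content.tr("\n", " "),
      metadata: metadata
    }
  )
end

# Possible usage from a script (file name is an assumption):
#   puts document_payload(
#     title: "The Symphony of Earth",
#     content: File.read("test/fixtures/files/earth.txt"),
#     metadata: { category: "story", tags: %w[earth life harmony] }
#   )
# and then: ruby build_payload.rb | curl ... -d @-
```

Reading the body from stdin with `-d @-` keeps shell quoting out of the JSON entirely.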
+
 **Retrieve Document:**

 ```bash
 curl http://localhost:3000/v1/documents/1 \
-  -H "X-API-Key: $TEST_API_KEY" | jq '.'
+  -H "X-API-Key: aead1b358e37d400e37bd9f6d031fe3a0fab53f6f6e3839b494740b7373658fe" | jq '.'
 ```

 **Search Documents:**

 ```bash
 curl "http://localhost:3000/v1/search?q=car&page=1&per_page=10" \
-  -H "X-API-Key: $TEST_API_KEY" | jq '.'
+  -H "X-API-Key: aead1b358e37d400e37bd9f6d031fe3a0fab53f6f6e3839b494740b7373658fe" | jq '.'
 ```

 **Delete Document:**

 ```bash
 curl -X DELETE http://localhost:3000/v1/documents/1 \
-  -H "X-API-Key: $TEST_API_KEY" | jq '.'
+  -H "X-API-Key: aead1b358e37d400e37bd9f6d031fe3a0fab53f6f6e3839b494740b7373658fe" | jq '.'
 ```

 ### Test Files Available
@@ -102,10 +117,10 @@ The project includes three test files in `test/fixtures/files/` that you can use
 DDoc Search is designed to handle document storage and search for multiple tenants with the following key features:

 - **Multi-tenant Architecture**: Complete data isolation per tenant with subdomain-based routing
-- **Full-text Search**: Powered by Elasticsearch with custom analyzers and highlighting
+- **Full-text Search**: Powered by Weaviate with BM25 keyword search
 - **Asynchronous Processing**: Kafka-based message queue for document indexing operations
 - **High Performance**: Redis caching, circuit breakers, and rate limiting
-- **Scalable Design**: Horizontal scaling support with configurable shards and replicas
+- **Scalable Design**: Horizontal scaling support with Weaviate's distributed architecture
 - **RESTful API**: Clean JSON API with comprehensive error handling

 ## Architecture
@@ -115,7 +130,7 @@ DDoc Search is designed to handle document storage and search for multiple tenan
 - **Framework**: Ruby on Rails 8.0.3
 - **Ruby Version**: 3.3.0 (3.4.7 for Docker)
 - **Database**: SQLite3 (development/test), with support for multiple databases in production
-- **Search Engine**: Elasticsearch 8.0
+- **Search Engine**: Weaviate 1.26.1
 - **Cache**: Redis 5.0 with connection pooling
 - **Message Queue**: Kafka (via Karafka 2.4)
 - **Background Jobs**: Sidekiq 7.0
@@ -125,7 +140,7 @@ DDoc Search is designed to handle document storage and search for multiple tenan
 ### Key Components

 - **Tenant Middleware**: Request-level tenant identification via API keys
-- **Circuit Breaker**: Prevents cascading failures to Elasticsearch
+- **Circuit Breaker**: Prevents cascading failures to Weaviate
 - **Rate Limiter**: Redis-based sliding window rate limiting per tenant
 - **Document Indexing**: Async Kafka-based indexing with automatic retries
 - **Search Analytics**: Background job processing for usage metrics
@@ -164,7 +179,7 @@ Ensure you have the following installed:
 - Ruby 3.3.0 or higher
 - Bundler 2.x
 - PostgreSQL (if migrating from SQLite)
-- Elasticsearch 8.0+
+- Weaviate 1.26+
 - Redis 5.0+
 - Kafka (Apache Kafka or compatible)

@@ -191,8 +206,8 @@ Ensure you have the following installed:
    # Database
    DATABASE_URL=sqlite3:storage/development.sqlite3

-   # Elasticsearch
-   ELASTICSEARCH_URL=http://localhost:9200
+   # Weaviate
+   WEAVIATE_URL=http://localhost:8080

    # Redis
    REDIS_URL=redis://localhost:6379/0
@@ -213,13 +228,13 @@ Ensure you have the following installed:
    rails db:seed # Optional: creates sample data
    ```

-5. **Configure Elasticsearch**
+5. **Configure Weaviate**

-   Ensure Elasticsearch is running, then create the index:
+   Ensure Weaviate is running, then create the schema:

    ```bash
    rails console
-   > Document.__elasticsearch__.create_index! force: true
+   > Document.ensure_weaviate_schema!
    ```

 6. **Start required services**
@@ -272,7 +287,7 @@ docker build -t ddoc_search .
 docker run -d \
   -p 80:80 \
   -e RAILS_MASTER_KEY=<value from config/master.key> \
-  -e ELASTICSEARCH_URL=http://elasticsearch:9200 \
+  -e WEAVIATE_URL=http://weaviate:8080 \
   -e REDIS_URL=redis://redis:6379/0 \
   -e KAFKA_BROKERS=kafka:9092 \
   --name ddoc_search \
@@ -294,7 +309,7 @@ kamal deploy

 ### Application Configuration

-- [config/initializers/elasticsearch.rb](config/initializers/elasticsearch.rb) - Elasticsearch client configuration
+- [config/initializers/weaviate.rb](config/initializers/weaviate.rb) - Weaviate client configuration
 - [config/initializers/redis.rb](config/initializers/redis.rb) - Redis connection pool setup
 - [config/initializers/karafka.rb](config/initializers/karafka.rb) - Kafka consumer configuration
 - [config/initializers/sidekiq.rb](config/initializers/sidekiq.rb) - Sidekiq background job configuration
@@ -389,10 +404,10 @@ bundle exec brakeman
 ## Performance Features

 - **Caching**: Search results cached for 10 minutes, documents cached for 1 hour
-- **Circuit Breaker**: Automatic fallback to SQL search when Elasticsearch is unavailable
+- **Circuit Breaker**: Automatic fallback to SQL search when Weaviate is unavailable
 - **Rate Limiting**: Configurable per-tenant rate limits with Redis-backed sliding window
 - **Connection Pooling**: Redis connection pooling for efficient resource utilization
-- **Elasticsearch Optimization**: 10 shards, 2 replicas, custom analyzers with snowball stemming
+- **Weaviate BM25 Search**: Efficient keyword-based search with relevance scoring
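
The per-tenant sliding-window rate limit listed above can be illustrated with a small sketch. This is an assumption-laden example, not the project's limiter: it keeps timestamps in process memory rather than Redis so that it runs standalone, and the class name `SlidingWindowLimiter` and its parameters are hypothetical.

```ruby
# Minimal in-memory sliding-window limiter (illustrative only; the README
# describes the real limiter as Redis-backed).
class SlidingWindowLimiter
  def initialize(limit:, window_seconds:)
    @limit = limit
    @window = window_seconds
    # One array of request timestamps per tenant, oldest first.
    @hits = Hash.new { |hash, key| hash[key] = [] }
  end

  # Returns true and records the request if the tenant is under its limit,
  # false otherwise. `now` is injectable to make the behavior testable.
  def allow?(tenant_id, now: Time.now.to_f)
    hits = @hits[tenant_id]
    cutoff = now - @window
    # Drop timestamps that have slid out of the window.
    hits.shift while hits.first && hits.first <= cutoff
    return false if hits.size >= @limit

    hits << now
    true
  end
end
```

Unlike a fixed window, the count here always covers the most recent `window_seconds`, so a burst straddling a window boundary cannot double the effective limit. A Redis-backed version would typically keep the same timestamps in a sorted set per tenant.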

 ## Monitoring

@@ -401,7 +416,7 @@ The application includes:
 - Health check endpoints at `/health` and `/up`
 - Search analytics tracking (query, results count, response time)
 - Lograge for structured logging
-- Circuit breaker metrics for Elasticsearch availability
+- Circuit breaker metrics for Weaviate availability
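
To make the circuit breaker and its metrics more concrete, here is a rough sketch of the pattern: after a number of consecutive failures the breaker opens and routes calls to a fallback (SQL search, per the README) until a cool-down passes. The class name, thresholds, and counters are hypothetical and do not reflect the project's actual implementation.

```ruby
# Minimal circuit breaker with fallback and simple counters (illustrative).
class CircuitBreaker
  attr_reader :failure_count, :fallback_count

  def initialize(failure_threshold: 3, reset_after: 30)
    @failure_threshold = failure_threshold
    @reset_after = reset_after # seconds the breaker stays open
    @failure_count = 0         # consecutive failures of the primary call
    @fallback_count = 0        # how often the fallback was served (a metric)
    @opened_at = nil
  end

  # True while the breaker is open and the cool-down has not elapsed.
  def open?(now = Time.now.to_f)
    !@opened_at.nil? && (now - @opened_at) < @reset_after
  end

  # Runs the block (the primary search); serves `fallback` when the breaker
  # is open or the block raises.
  def call(fallback:, now: Time.now.to_f)
    return run_fallback(fallback) if open?(now)

    begin
      result = yield
    rescue StandardError
      @failure_count += 1
      @opened_at = now if @failure_count >= @failure_threshold
      return run_fallback(fallback)
    end

    # A success closes the breaker and clears the failure streak.
    @failure_count = 0
    @opened_at = nil
    result
  end

  private

  def run_fallback(fallback)
    @fallback_count += 1
    fallback.call
  end
end
```

A caller would wrap the Weaviate query in the block and pass the SQL search as `fallback`; `failure_count` and `fallback_count` are the kind of values an availability metric could report.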

 ## License