
Commit f2d67f3

committed
local-rag-with-lightweight-elasticsearch
1 parent 401d76c commit f2d67f3


8 files changed (+324, -213 lines)

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+📥 Indexing documents...
+
+🔍 Search: 'Can you summarize the performance issues in the API?'
+
+🤖 Asking to model: llama-smoltalk-3.2-1b-instruct
+
+## 💡 Question:
+Can you summarize the performance issues in the API?
+
+## 📝 Answer:
+The primary performance issue in the API is the slow response times of 3 seconds or more from the 1,000+ queries per minute. The search API, in particular, is experiencing performance degradations, with complex Elasticsearch queries causing the issues. A proposed solution is to implement a 15-minute TTL cache with event-based invalidation to improve response times. Additionally, a three-tiered approach involving optimization of bool queries and added calculated index fields is being implemented to improve query performance. Finally, auto-scaling for the infrastructure is set up to scale to 6 instances at 70% CPU.
+
+
+## Stats
+✅ Indexed 5 documents in 250ms
+
+🔍 Search Latency: 57ms
+
+🤖 AI Latency: 21019ms | 5.8 tokens/s
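The log traces a retrieve-then-ask flow: index the documents, search them with the question, then hand the hits to a local model. A minimal sketch of the two pure steps in Python, assuming a hypothetical `content` field, a result size of 3 and an illustrative prompt wording; none of these come from the commit. The search body is ordinary Elasticsearch Query DSL, and the hit structure (`hits.hits[]._source`) is the shape a `_search` response has.

```python
"""Sketch of the search-then-prompt step of a local RAG flow.

Assumptions (not taken from the commit): documents are stored with a
"content" text field, 3 hits are retrieved, and the prompt wording is
illustrative only.
"""


def build_search_body(question: str, size: int = 3) -> dict:
    """Elasticsearch Query DSL: full-text match of the question on "content"."""
    return {"size": size, "query": {"match": {"content": question}}}


def build_prompt(question: str, search_response: dict) -> str:
    """Concatenate the retrieved documents into a context block for the model."""
    hits = search_response["hits"]["hits"]
    context = "\n\n".join(hit["_source"]["content"] for hit in hits)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    question = "Can you summarize the performance issues in the API?"
    # Hand-written stand-in for a _search response; the text is a placeholder.
    sample_response = {"hits": {"hits": [
        {"_source": {"content": "Placeholder document one."}},
        {"_source": {"content": "Placeholder document two."}},
    ]}}
    print(build_search_body(question))
    print(build_prompt(question, sample_response))
```

With the official `elasticsearch` Python client (8.x), the body's keys can be passed as keyword arguments, e.g. `client.search(index=..., **build_search_body(question))`; the index name is left open because the log does not state it.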

supporting-blog-content/local-rag-with-lightweight-elasticsearch/app-logs/qwen3:4b-results.md

Lines changed: 0 additions & 106 deletions
This file was deleted.

supporting-blog-content/local-rag-with-lightweight-elasticsearch/app-logs/results.md

Lines changed: 14 additions & 12 deletions
@@ -2,22 +2,24 @@
 
 🔍 Search: 'Can you summarize the performance issues in the API?'
 
-## 🤖 Asking to model: llama3.2
+🤖 Asking to model: dolphin3.0-qwen2.5-0.5b
 
-### 💡 Question:
+## 💡 Question:
 Can you summarize the performance issues in the API?
-### 📝 Answer:
-According to the transcript, the performance issues in the API are:
+## 📝 Answer:
 
-1. Response times increase from 200ms to 3 seconds when handling 1,000+ queries per minute.
-2. Complex Elasticsearch queries are slow, with an average execution time of 1.2 seconds.
-3. Performance degrades during spikes.
+The performance issues in the Search API deployed on September 16, 2025, include:
 
-These issues are attributed to the lack of caching and a complex Elasticsearch query setup.
+- Degradation in performance at 1,000+ queries per minute, resulting in a 200ms to 3-second response time for complex queries.
+- High response times for queries that do not utilize caching, causing them to take significantly longer than 2 seconds.
+- Inability to scale to handle spikes in query traffic, leading to increased CPU limits.
 
-## App performance metrics:
-✅ Indexed 5 documents in 96ms
+These issues are primarily attributed to the complexity and inefficiency of the Elasticsearch queries, as well as the lack of caching layer. This indicates a need for optimization and addressing these specific performance bottlenecks to ensure the API's scalability and effectiveness for the development team.
 
-🔍 Search Latency: 20ms
+## Stats
 
-🤖 Ollama Latency: 36772ms | 24.7 tokens/s
+✅ Indexed 5 documents in 627ms
+
+🔍 Search Latency: 81ms
+
+🤖 AI Latency: 16044ms | 9.5 tokens/s
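This diff renames the `Ollama Latency` line to `AI Latency` and moves the numbers under a `## Stats` heading. A minimal sketch of how such a block could be produced, assuming tokens/s means completion tokens divided by the wall-clock seconds of the model call (the commit does not show the formula) and that each step is timed with `time.perf_counter`; `tokens_per_second`, `timed_ms` and `format_stats` are hypothetical helpers.

```python
import time


# Assumed definition: throughput = completion tokens / seconds spent in the model call.
def tokens_per_second(completion_tokens: int, latency_ms: float) -> float:
    if latency_ms <= 0:
        return 0.0
    return completion_tokens / (latency_ms / 1000)


def timed_ms(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock milliseconds as an int)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, round((time.perf_counter() - start) * 1000)


def format_stats(doc_count: int, index_ms: int, search_ms: int,
                 ai_ms: int, completion_tokens: int) -> str:
    """Render a "## Stats" block that approximates the layout of the new logs."""
    tps = tokens_per_second(completion_tokens, ai_ms)
    return "\n\n".join([
        "## Stats",
        f"✅ Indexed {doc_count} documents in {index_ms}ms",
        f"🔍 Search Latency: {search_ms}ms",
        f"🤖 AI Latency: {ai_ms}ms | {tps:.1f} tokens/s",
    ])


if __name__ == "__main__":
    # Time a stand-in workload; a real app would wrap the model call instead.
    _, elapsed = timed_ms(sum, range(1_000_000))
    print(f"stand-in call took {elapsed}ms")
    # Illustrative numbers only, not taken from any run in this commit.
    print(format_stats(5, 100, 20, 10000, 95))
```

Using wall-clock time around the whole call means the figure includes prompt processing and any HTTP overhead, so it reads lower than a server-reported generation rate would.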
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+📥 Indexing documents...
+
+🔍 Search: 'Can you summarize the performance issues in the API?'
+
+🤖 Asking to model: smollm2-1.7b-instruct
+
+## 💡 Question:
+
+Can you summarize the performance issues in the API?
+## 📝 Answer:
+
+The summary of the performance issues in the API can be summarized as follows:
+
+- The API works but performance degrades at 1,000+ queries per minute, resulting in response times jumping from 200ms to 3 seconds.
+- The root cause of these issues is the lack of a caching layer in the Elasticsearch queries.
+- The team proposed a few solutions, including a 15-minute TTL cache with event-based invalidation, which would be implemented by Sarah.
+- They also proposed optimizing boolean queries and adding calculated index fields, which would be taken care of by John.
+- To handle the performance spikes, they suggested auto-scaling the infrastructure, with Mike working on this and aiming to scale to 6 instances at 70% CPU by Wednesday.
+- They also proposed implementing Redis cache, which would be done by Sarah.
+- The team discussed the timeline and timeline of the changes and proposed a phased migration approach: complete migration on October 30th, followed by a partial migration on October 15th.
+
+## Stats
+
+✅ Indexed 5 documents in 141ms
+
+🔍 Search Latency: 26ms
+
+🤖 AI Latency: 47561ms | 4.8 tokens/s
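The three runs in this commit report their stats in the same line format, so they can be compared mechanically. A small sketch, assuming the stat lines keep exactly the wording shown in the logs (a line using the older `Ollama Latency` label would not match the `AI Latency` pattern); `parse_stats` and `rank_by_ai_latency` are hypothetical helpers, and the strings are copied from the `## Stats` sections of the three runs.

```python
import re

# Patterns for the stat lines as worded in this commit's logs.
STAT_PATTERNS = {
    "index_ms": re.compile(r"Indexed \d+ documents in (\d+)ms"),
    "search_ms": re.compile(r"Search Latency: (\d+)ms"),
    "ai_ms": re.compile(r"AI Latency: (\d+)ms"),
    "tokens_per_s": re.compile(r"\| ([\d.]+) tokens/s"),
}

# Stat lines copied from the three runs in this commit.
RUNS = {
    "llama-smoltalk-3.2-1b-instruct": "✅ Indexed 5 documents in 250ms\n🔍 Search Latency: 57ms\n🤖 AI Latency: 21019ms | 5.8 tokens/s",
    "dolphin3.0-qwen2.5-0.5b": "✅ Indexed 5 documents in 627ms\n🔍 Search Latency: 81ms\n🤖 AI Latency: 16044ms | 9.5 tokens/s",
    "smollm2-1.7b-instruct": "✅ Indexed 5 documents in 141ms\n🔍 Search Latency: 26ms\n🤖 AI Latency: 47561ms | 4.8 tokens/s",
}


def parse_stats(log_text: str) -> dict:
    """Extract whichever stat values are present; missing ones are skipped."""
    stats = {}
    for key, pattern in STAT_PATTERNS.items():
        match = pattern.search(log_text)
        if match:
            stats[key] = float(match.group(1))
    return stats


def rank_by_ai_latency(runs: dict) -> list:
    """Model names ordered from lowest to highest AI latency."""
    parsed = {model: parse_stats(text) for model, text in runs.items()}
    return sorted(parsed, key=lambda model: parsed[model]["ai_ms"])


if __name__ == "__main__":
    for model in rank_by_ai_latency(RUNS):
        s = parse_stats(RUNS[model])
        print(f"{model:32} {s['ai_ms']:>8.0f} ms {s['tokens_per_s']:>5.1f} tokens/s")
```

Ranked by AI latency, `dolphin3.0-qwen2.5-0.5b` comes first (16044ms, 9.5 tokens/s) and `smollm2-1.7b-instruct` last (47561ms, 4.8 tokens/s); each figure is a single run on one question.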

supporting-blog-content/local-rag-with-lightweight-elasticsearch/app-logs/tinyLlama-results.md

Lines changed: 0 additions & 15 deletions
This file was deleted.
Lines changed: 6 additions & 57 deletions
@@ -1,62 +1,11 @@
 >>> Why Elastic is so cool?
 
-That's a **great question**, but there's a **critical clarification first**: **"Elastic" alone isn't a product** — it's almost always a shorthand for **Elasticsearch** (the core search
-engine) and the **Elastic Stack** (Elasticsearch + Kibana + Logstash + Beats).
+## Raw Response
 
-**Why people say "Elastic is so cool" (and why it's actually *Elasticsearch* that's the star):**
+```json
+{"created":1762881411,"object":"chat.completion","id":"0178b570-4e13-4c1b-9ff4-e2ca5bff1c67","model":"dolphin3.0-qwen2.5-0.5b","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"Elastic is a versatile technology that supports a wide range of applications. Its coolness stems from its ability to manage complex environments and provide a seamless integration with other technologies."}}],"usage":{"prompt_tokens":14,"completion_tokens":35,"total_tokens":49}}
+```
 
-1. **Real-Time Search & Analytics (The "Cool" Part!)**
-Elasticsearch doesn't just *search* — it **indexes, searches, and analyzes data in real time** (millions of events/sec). Unlike traditional databases (which need slow queries or
-batch processing), it gives you instant answers.
-*Example:* Netflix uses it to show personalized movie recommendations *as you browse* — not after you click "Next" or "Save."
+## Answer
 
-2. **Handles "Wild" Data (Unstructured + Structured)**
-Most data today is messy (text, logs, images, JSON, CSV). Elasticsearch **natively understands** this.
-*Example:* A company can search *both* "user feedback in Slack messages" *and* "product prices from a spreadsheet" in one query.
-
-3. **Scalability That Doesn’t Break**
-It’s built to scale **horizontally** (add more servers) without downtime. Handles **petabytes** of data.
-*Example:* Airbnb uses it to power their 10M+ listing search across 200+ countries — *without* slowing down.
-
-4. **The Elastic Stack = Full Power**
-Elasticsearch isn’t alone — it’s part of a **complete suite**:
-- **Logstash**: Ingests data from anywhere (websites, apps, logs).
-- **Kibana**: Visualize data (dashboards, maps, charts).
-- **Beats**: Lightweight data shippers (for apps).
-*This lets you build end-to-end data pipelines:* **Collect → Process → Search → Visualize** in one flow.
-
-5. **No More "Slow Queries" (The Real Pain Point)**
-Traditional SQL databases struggle with:
-- Full-text search (e.g., "show me products with 'sneakers' AND 'black'")
-- Real-time analytics (e.g., "how many users clicked 'checkout' in the last 5 mins?")
-Elasticsearch solves both **with one query**.
-
-6. **Open Source (with Enterprise Support)**
-Free to use — but Elastic also offers enterprise features (security, ML, etc.) for large teams. *This is why it’s so widely adopted.*
-
-### Why It’s "So Cool" in Practice:
-| **Problem** | **Traditional Tool** | **Elasticsearch** |
-|----------------------------|----------------------------|---------------------------------------|
-| Real-time product search | Slow (seconds) | Instant (milliseconds) |
-| Analyze user behavior | Requires complex SQL | Simple queries + real-time dashboards|
-| Handle messy logs | Needs ETL pipelines | Ingests logs *directly* |
-| Scale to 10M+ users | Databases become slow | Scales horizontally effortlessly |
-
-### Real-World Examples:
-- **Netflix**: Uses Elasticsearch for 1B+ users to personalize content.
-- **GitHub**: Uses it to search code repositories (text + code structure).
-- **Healthcare**: Analyzes patient data for real-time alerts (e.g., "risk of sepsis").
-- **Security**: Real-time threat detection (e.g., "suspicious login from Brazil").
-
-### Why People Get Confused:
-- **"Elastic" = Elasticsearch** (the product) → Not a standalone tool.
-- **"The Elastic Stack"** = The full suite (Elasticsearch + Kibana + Logstash + Beats).
-- **Not "Elastic" as in rubber bands** (that’s physics, not tech!).
-
-### The Bottom Line:
-**Elasticsearch is "so cool" because it turns messy, real-time data into instant insights — without slowing down.** It’s the reason companies can build **search, analytics, and
-monitoring** at scale *without* writing complex code or waiting for results.
-
-If you meant **"Elastic"** as in the rubber band (physics), that’s **not cool** 😄 — but in tech? **100% cool**. 😎
-
-*So next time someone says "Elastic is so cool," you know exactly what they mean!* 🔥
+Elastic is a versatile technology that supports a wide range of applications. Its coolness stems from its ability to manage complex environments and provide a seamless integration with other technologies.
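The new log keeps the raw reply and then prints only the assistant text, which matches the OpenAI-style `chat.completion` object shown in the raw response. A sketch of both directions in Python, assuming the app sends a single user message (the request itself is not in the commit) and reads the first choice; `build_chat_request` and `extract_answer` are hypothetical names, and the response string is copied from the log.

```python
import json

# Raw reply copied from the "## Raw Response" section of the log.
RAW_RESPONSE = '{"created":1762881411,"object":"chat.completion","id":"0178b570-4e13-4c1b-9ff4-e2ca5bff1c67","model":"dolphin3.0-qwen2.5-0.5b","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"Elastic is a versatile technology that supports a wide range of applications. Its coolness stems from its ability to manage complex environments and provide a seamless integration with other technologies."}}],"usage":{"prompt_tokens":14,"completion_tokens":35,"total_tokens":49}}'


def build_chat_request(model: str, question: str) -> dict:
    """OpenAI-style chat completion payload with a single user message (assumed shape)."""
    return {"model": model, "messages": [{"role": "user", "content": question}]}


def extract_answer(raw: str) -> tuple[str, dict]:
    """Return the first choice's assistant text and the token usage block."""
    data = json.loads(raw)
    if data.get("object") != "chat.completion":
        raise ValueError(f"unexpected object type: {data.get('object')!r}")
    message = data["choices"][0]["message"]
    return message["content"], data.get("usage", {})


if __name__ == "__main__":
    print(json.dumps(build_chat_request("dolphin3.0-qwen2.5-0.5b", "Why Elastic is so cool?")))
    answer, usage = extract_answer(RAW_RESPONSE)
    print(answer)
    print(f"{usage['completion_tokens']} completion tokens of {usage['total_tokens']} total")
```

The `usage.completion_tokens` value (35 here) is the kind of count a tokens/s figure would be derived from, though the log does not say which count the app divides by.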
