Commit 5bd891d

Align with VecturaKit async init and expand README
- Make LumoKit init async to match VecturaKit’s async initializer
- Update README: add TOC, API overview, and detailed examples (single file, folder batch, parse-only, custom storage, error handling)
- Use full variable name 'fileManager' in examples for clarity
1 parent 09af034 commit 5bd891d

2 files changed

Lines changed: 174 additions & 60 deletions

README.md

Lines changed: 172 additions & 58 deletions
@@ -1,119 +1,233 @@
11
# LumoKit
22

3-
LumoKit is a lightweight Swift library for **Retrieval-Augmented Generation (RAG)** systems. It integrates with **PicoDocs** for document parsing and **VecturaKit** for semantic search and vector storage.
3+
LumoKit is a Swift package for building on-device **Retrieval-Augmented Generation (RAG)** workflows. It combines **PicoDocs** for document ingestion with **VecturaKit** for vector storage and semantic search, giving you an end-to-end pipeline for creating searchable knowledge bases.
44

5-
The name **LumoKit** is derived from the Chinese characters **流** (*liú*) meaning "flow" and **模** (*mó*) meaning "model." It symbolizes the idea of **flowing information through a model**, reflecting data retrieval for a large language model.
5+
The name **Lumo** blends the Mandarin characters **流** (*liú*, “flow”) and **模** (*mó*, “model”), representing the flow of knowledge into machine learning models.
66

7-
## Support
7+
## Learn More
88

9-
Love this project? Check out my books to explore more of AI and iOS development:
9+
Deepen your understanding of AI and iOS development with these books:
1010
- [Exploring AI for iOS Development](https://academy.rudrank.com/product/ai)
1111
- [Exploring AI-Assisted Coding for iOS Development](https://academy.rudrank.com/product/ai-assisted-coding)
1212

13-
Your support helps to keep this project growing!
13+
## Table of Contents
14+
15+
- [Features](#features)
16+
- [API Overview](#api-overview)
17+
- [Architecture](#architecture)
18+
- [Requirements](#requirements)
19+
- [Installation](#installation)
20+
- [Getting Started](#getting-started)
21+
- [1. Configure VecturaKit and initialize LumoKit](#1-configure-vecturakit-and-initialize-lumokit)
22+
- [2. Parse a file and index its contents](#2-parse-a-file-and-index-its-contents)
23+
- [3. Run semantic search queries](#3-run-semantic-search-queries)
24+
- [4. Reset the database when needed](#4-reset-the-database-when-needed)
25+
- [Examples](#examples)
26+
- [Index a single file](#index-a-single-file)
27+
- [Index multiple files in a folder](#index-multiple-files-in-a-folder)
28+
- [Parse without indexing](#parse-without-indexing)
29+
- [Custom storage location](#custom-storage-location)
30+
- [Handling errors](#handling-errors)
31+
- [Error Handling](#error-handling)
32+
- [Tips](#tips)
33+
- [Contributing](#contributing)
34+
- [License](#license)
35+
36+
## Features
37+
38+
- **Document Parsing:** Uses PicoDocs to fetch and convert local files (PDF, Markdown, HTML, and more) into structured text.
39+
- **Chunking Pipeline:** Splits parsed text into configurable segments ideal for retrieval.
40+
- **Semantic Search:** Leverages VecturaKit’s vector database to score and rank relevant passages.
41+
- **Async-First API:** All indexing and search operations are async, ready for Swift concurrency.
42+
- **Database Management:** Reset or re-index data stores without leaving the app.
43+
44+
## API Overview
1445

15-
## Key Features
46+
```swift
47+
public final class LumoKit {
48+
public init(config: VecturaConfig) async throws
49+
50+
public func parseAndIndex(url: URL, chunkSize: Int = 500) async throws
51+
public func parseDocument(from url: URL, chunkSize: Int = 500) async throws -> [String]
52+
public func chunkText(_ text: String, size: Int) throws -> [String]
1653

17-
- **Parse and Chunk Documents**: Use `PicoDocs` to extract content from files and split them into manageable chunks for efficient indexing.
18-
- **Semantic Search**: Perform similarity-based searches using `VecturaKit`'s vector database.
19-
- **Configurable Document Indexing**: Set custom chunk sizes to control how documents are segmented for retrieval.
20-
- **Reset Database**: Quickly reset the vector database to start fresh with new data.
54+
public func semanticSearch(
55+
query: String,
56+
numResults: Int = 5,
57+
threshold: Float = 0.7
58+
) async throws -> [VecturaSearchResult]
59+
60+
public func resetDB() async throws
61+
}
62+
63+
public enum LumoKitError: Error {
64+
case emptyDocument
65+
case invalidChunkSize
66+
}
67+
```
2168

22-
---
69+
## Architecture
70+
71+
```
72+
Source Document ──► PicoDocs parsing ──► LumoKit chunking ──► VecturaKit indexing ──► Semantic search
73+
```
74+
75+
## Requirements
76+
77+
- Swift 6.2+
78+
- iOS 18.0+, macOS 15.0+
2379

2480
## Installation
2581

26-
Add the following dependencies to your `Package.swift` file:
82+
Add LumoKit to your `Package.swift` using Swift Package Manager:
2783

2884
```swift
2985
dependencies: [
30-
.package(url: "https://github.com/rryam/LumoKit.git", from: "0.1.0"),
31-
],
86+
.package(url: "https://github.com/rryam/LumoKit.git", from: "0.1.0")
87+
]
3288
```
3389

34-
Then import the package in your project:
90+
Then attach the dependency to your target:
3591

3692
```swift
37-
import LumoKit
93+
.target(
94+
name: "AppModule",
95+
dependencies: [
96+
.product(name: "LumoKit", package: "LumoKit")
97+
]
98+
)
3899
```
39100

40-
## Usage
41-
42-
1. Initialize LumoKit
101+
## Getting Started
43102

44-
First, set up the configuration for VecturaKit and initialize LumoKit:
103+
### 1. Configure VecturaKit and initialize LumoKit
45104

46105
```swift
47106
import LumoKit
48107
import VecturaKit
49108

50109
let config = VecturaConfig(
51-
name: "my-vector-db",
52-
dimension: 384,
53-
searchOptions: VecturaConfig.SearchOptions(
110+
name: "knowledge-base",
111+
searchOptions: .init(
54112
defaultNumResults: 10,
55113
minThreshold: 0.7
56114
)
57115
)
58116

59-
let lumoKit = try LumoKit(config: config)
117+
let lumoKit = try await LumoKit(config: config)
60118
```
61119

62-
2. Parse and Index Documents
120+
### 2. Parse a file and index its contents
63121

64-
Parse a file and index its content into the vector database:
122+
```swift
123+
let url = URL(fileURLWithPath: "/path/to/document.pdf")
124+
try await lumoKit.parseAndIndex(url: url, chunkSize: 600)
125+
```
126+
127+
### 3. Run semantic search queries
65128

66129
```swift
67-
let fileURL = URL(fileURLWithPath: "/path/to/your/document.pdf")
68-
try await lumoKit.parseAndIndex(url: fileURL, chunkSize: 500)
130+
let results = try await lumoKit.semanticSearch(
131+
query: "Explain vector databases",
132+
numResults: 5,
133+
threshold: 0.65
134+
)
135+
136+
for result in results {
137+
print(result.text)
138+
}
69139
```
70140

71-
3. Perform Semantic Search
141+
### 4. Reset the database when needed
142+
143+
```swift
144+
try await lumoKit.resetDB()
145+
```
72146

73-
Search for relevant documents by querying the indexed database:
147+
## Examples
148+
149+
### Index a single file
74150

75151
```swift
76-
let results = try await lumoKit.semanticSearch(query: "What is Swift?", numResults: 5, threshold: 0.7)
152+
let url = URL(fileURLWithPath: "/path/to/notes.md")
153+
try await lumoKit.parseAndIndex(url: url, chunkSize: 500)
154+
```
77155

78-
for result in results {
79-
print("Document ID: \(result.id)")
80-
print("Text: \(result.text)")
81-
print("Score: \(result.score)")
156+
### Index multiple files in a folder
157+
158+
```swift
159+
let folder = URL(fileURLWithPath: "/path/to/docs")
160+
let fileManager = FileManager.default
161+
let exts: Set<String> = ["pdf", "md", "markdown", "html", "txt"]
162+
163+
if let urls = try? fileManager.contentsOfDirectory(at: folder, includingPropertiesForKeys: nil) {
164+
for url in urls where exts.contains(url.pathExtension.lowercased()) {
165+
try await lumoKit.parseAndIndex(url: url, chunkSize: 600)
166+
}
82167
}
83168
```
84169

85-
## How It Works
86-
- Document Parsing: Leverages PicoDocs to parse various file formats (e.g., PDF, Markdown).
87-
- Chunking: Splits the content into smaller chunks for efficient indexing.
88-
- Vector Storage: Uses VecturaKit to store embeddings and perform similarity searches.
89-
- Semantic Search: Retrieves the most relevant chunks for a given query.
170+
### Parse without indexing
90171

91-
## Example Workflow
172+
```swift
173+
let url = URL(fileURLWithPath: "/path/to/paper.pdf")
174+
let chunks = try await lumoKit.parseDocument(from: url, chunkSize: 400)
175+
print("chunks: \(chunks.count)")
176+
```
177+
178+
### Custom storage location
92179

93180
```swift
94-
let fileURL = URL(fileURLWithPath: "/path/to/document.pdf")
181+
import VecturaKit
95182

96-
// Parse and index document
97-
try await lumoKit.parseAndIndex(url: fileURL, chunkSize: 500)
183+
let supportDir = try FileManager.default.url(
184+
for: .applicationSupportDirectory,
185+
in: .userDomainMask,
186+
appropriateFor: nil,
187+
create: true
188+
)
98189

99-
// Perform semantic search
100-
let query = "Explain the importance of vector databases."
101-
let results = try await lumoKit.semanticSearch(query: query)
190+
let config = VecturaConfig(
191+
name: "kb-shared",
192+
directoryURL: supportDir,
193+
searchOptions: .init(defaultNumResults: 8, minThreshold: 0.7)
194+
)
102195

103-
for result in results {
104-
print("Relevant Text: \(result.text)")
105-
}
196+
let lumoKit = try await LumoKit(config: config)
197+
```
106198

107-
// Reset the database
108-
try await lumoKit.resetDB()
199+
### Handling errors
200+
201+
```swift
202+
do {
203+
_ = try await lumoKit.parseDocument(from: URL(fileURLWithPath: "/empty.pdf"), chunkSize: 0)
204+
} catch LumoKitError.invalidChunkSize {
205+
print("invalid chunk size")
206+
} catch LumoKitError.emptyDocument {
207+
print("no content to parse")
208+
} catch {
209+
print("unexpected error: \(error)")
210+
}
109211
```
110212

213+
## Error Handling
214+
215+
`LumoKitError` reports invalid states:
216+
- `.emptyDocument` – parsing produced no text content.
217+
- `.invalidChunkSize` – chunk size must be greater than zero.
218+
219+
Handle these cases to surface actionable messages to users or diagnostics.
220+
221+
## Tips
222+
223+
- Adjust `chunkSize` depending on the model’s context window; larger chunks improve coherence, smaller chunks improve specificity.
224+
- Provide a custom `directoryURL` in `VecturaConfig` to store the vector database in a shared app container.
225+
- Combine LumoKit with a language model to build a full RAG stack for summaries, answering questions, or chat experiences.
226+
111227
## Contributing
112-
Contributions are welcome! Please fork the repository and submit a pull request with your improvements or suggestions.
113228

114-
## License
115-
LumoKit is licensed under the MIT License. See the LICENSE file for more details.
229+
Contributions are welcome! Open an issue or submit a pull request with improvements.
230+
231+
## License
116232

117-
## Acknowledgments
118-
- PicoDocs: For powerful document parsing.
119-
- VecturaKit: For robust vector database functionality.
233+
LumoKit is available under the MIT license. See the [LICENSE](LICENSE) file for details.
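
The README's Error Handling section says `.invalidChunkSize` means the chunk size must be greater than zero and `.emptyDocument` means parsing produced no text. The commit does not show how LumoKit chunks text, so the following is only a minimal sketch of a chunker that honors that contract. It assumes fixed-size, character-based chunks; `ChunkingError` and `chunkByCharacters` are hypothetical names, not LumoKit API.

```swift
import Foundation

// Hypothetical stand-in that mirrors the two cases listed in the README.
// This is not LumoKit's own error type.
enum ChunkingError: Error, Equatable {
    case emptyDocument
    case invalidChunkSize
}

/// Splits `text` into consecutive pieces of at most `size` characters.
/// Assumption: plain character-count chunking; LumoKit's real strategy may differ.
func chunkByCharacters(_ text: String, size: Int) throws -> [String] {
    // A non-positive size would never advance the cursor, so reject it up front.
    guard size > 0 else { throw ChunkingError.invalidChunkSize }

    let trimmed = text.trimmingCharacters(in: .whitespacesAndNewlines)
    guard !trimmed.isEmpty else { throw ChunkingError.emptyDocument }

    var chunks: [String] = []
    var start = trimmed.startIndex
    while start < trimmed.endIndex {
        // Advance by `size` characters, clamping to the end of the string.
        let end = trimmed.index(start, offsetBy: size, limitedBy: trimmed.endIndex)
            ?? trimmed.endIndex
        chunks.append(String(trimmed[start..<end]))
        start = end
    }
    return chunks
}
```

Checking the size before touching the text means a call such as the README's `chunkSize: 0` example would fail with the size error even when the document is also empty, which matches the order of the `catch` clauses shown there.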

Sources/LumoKit/LumoKit.swift

Lines changed: 2 additions & 2 deletions
@@ -5,8 +5,8 @@ import Foundation
55
public final class LumoKit {
66
private let vectura: VecturaKit
77

8-
public init(config: VecturaConfig) throws {
9-
self.vectura = try VecturaKit(config: config)
8+
public init(config: VecturaConfig) async throws {
9+
self.vectura = try await VecturaKit(config: config)
1010
}
1111

1212
/// Parse and index a document from a given file URL
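
Making `init` async is a source-breaking change for callers: every `try LumoKit(config:)` has to become `try await LumoKit(config:)` inside an async context. The sketch below illustrates why the change propagates, using hypothetical stand-in types (`Store` in the role of VecturaKit, `Kit` in the role of LumoKit) so that it compiles without either package. The real initializers may do more work than shown here.

```swift
// Hypothetical stand-ins; none of these types come from VecturaKit or LumoKit.
struct StoreConfig: Sendable {
    let name: String
}

final class Store: Sendable {
    let name: String

    // An async throwing initializer, like the one the commit message
    // attributes to VecturaKit.
    init(config: StoreConfig) async throws {
        // Real code might load a model or open files here; the sketch only yields.
        await Task.yield()
        self.name = config.name
    }
}

final class Kit: Sendable {
    private let store: Store

    // Because `Store.init` is async, a wrapper that builds a `Store`
    // in its own initializer must be async as well.
    init(config: StoreConfig) async throws {
        self.store = try await Store(config: config)
    }

    var storeName: String { store.name }
}

// Call sites that used a synchronous `try Kit(config:)` now need an async context.
func makeKit() async throws -> Kit {
    try await Kit(config: StoreConfig(name: "knowledge-base"))
}
```

From synchronous code, one option is to wrap the call in a `Task { ... }` and handle the thrown error inside it.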
