|
1 | 1 | # LumoKit |
2 | 2 |
|
3 | | -LumoKit is a lightweight Swift library for **Retrieval-Augmented Generation (RAG)** systems. It integrates with **PicoDocs** for document parsing and **VecturaKit** for semantic search and vector storage. |
| 3 | +LumoKit is a Swift package for building on-device **Retrieval-Augmented Generation (RAG)** workflows. It combines **PicoDocs** for document ingestion with **VecturaKit** for vector storage and semantic search, giving you an end-to-end pipeline for creating searchable knowledge bases. |
4 | 4 |
|
5 | | -The name **LumoKit** is derived from the Chinese characters **流** (*liú*) meaning "flow" and **模** (*mó*) meaning "model." It symbolizes the idea of **flowing information through a model**, reflecting data retrieval for a large language model. |
| 5 | +The name **Lumo** blends the Mandarin characters **流** (*liú*, “flow”) and **模** (*mó*, “model”), representing the flow of knowledge into machine learning models. |
6 | 6 |
|
7 | | -## Support |
| 7 | +## Learn More |
8 | 8 |
|
9 | | -Love this project? Check out my books to explore more of AI and iOS development: |
| 9 | +Deepen your understanding of AI and iOS development with these books: |
10 | 10 | - [Exploring AI for iOS Development](https://academy.rudrank.com/product/ai) |
11 | 11 | - [Exploring AI-Assisted Coding for iOS Development](https://academy.rudrank.com/product/ai-assisted-coding) |
12 | 12 |
|
13 | | -Your support helps to keep this project growing! |
| 13 | +## Table of Contents |
| 14 | + |
| 15 | +- [Features](#features) |
| 16 | +- [API Overview](#api-overview) |
| 17 | +- [Architecture](#architecture) |
| 18 | +- [Requirements](#requirements) |
| 19 | +- [Installation](#installation) |
| 20 | +- [Getting Started](#getting-started) |
| 21 | + - [1. Configure VecturaKit and initialize LumoKit](#1-configure-vecturakit-and-initialize-lumokit) |
| 22 | + - [2. Parse a file and index its contents](#2-parse-a-file-and-index-its-contents) |
| 23 | + - [3. Run semantic search queries](#3-run-semantic-search-queries) |
| 24 | + - [4. Reset the database when needed](#4-reset-the-database-when-needed) |
| 25 | +- [Examples](#examples) |
| 26 | + - [Index a single file](#index-a-single-file) |
| 27 | + - [Index multiple files in a folder](#index-multiple-files-in-a-folder) |
| 28 | + - [Parse without indexing](#parse-without-indexing) |
| 29 | + - [Custom storage location](#custom-storage-location) |
| 30 | + - [Handling errors](#handling-errors) |
| 31 | +- [Error Handling](#error-handling) |
| 32 | +- [Tips](#tips) |
| 33 | +- [Contributing](#contributing) |
| 34 | +- [License](#license) |
| 35 | + |
| 36 | +## Features |
| 37 | + |
| 38 | +- **Document Parsing:** Uses PicoDocs to fetch and convert local files (PDF, Markdown, HTML, and more) into structured text. |
| 39 | +- **Chunking Pipeline:** Splits parsed text into configurable segments ideal for retrieval. |
| 40 | +- **Semantic Search:** Leverages VecturaKit’s vector database to score and rank relevant passages. |
| 41 | +- **Async-First API:** All indexing and search operations are async, ready for Swift concurrency. |
| 42 | +- **Database Management:** Reset or re-index data stores without leaving the app. |
| 43 | + |
| 44 | +## API Overview |
14 | 45 |
|
15 | | -## Key Features |
| 46 | +```swift |
| 47 | +public final class LumoKit { |
| 48 | + public init(config: VecturaConfig) async throws |
| 49 | + |
| 50 | + public func parseAndIndex(url: URL, chunkSize: Int = 500) async throws |
| 51 | + public func parseDocument(from url: URL, chunkSize: Int = 500) async throws -> [String] |
| 52 | + public func chunkText(_ text: String, size: Int) throws -> [String] |
16 | 53 |
|
17 | | -- **Parse and Chunk Documents**: Use `PicoDocs` to extract content from files and split them into manageable chunks for efficient indexing. |
18 | | -- **Semantic Search**: Perform similarity-based searches using `VecturaKit`'s vector database. |
19 | | -- **Configurable Document Indexing**: Set custom chunk sizes to control how documents are segmented for retrieval. |
20 | | -- **Reset Database**: Quickly reset the vector database to start fresh with new data. |
| 54 | + public func semanticSearch( |
| 55 | + query: String, |
| 56 | + numResults: Int = 5, |
| 57 | + threshold: Float = 0.7 |
| 58 | + ) async throws -> [VecturaSearchResult] |
| 59 | + |
| 60 | + public func resetDB() async throws |
| 61 | +} |
| 62 | + |
| 63 | +public enum LumoKitError: Error { |
| 64 | + case emptyDocument |
| 65 | + case invalidChunkSize |
| 66 | +} |
| 67 | +``` |
21 | 68 |
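
The overview lists `chunkText(_:size:)` and an `invalidChunkSize` error but does not show how the text is split. The sketch below is one plausible reading, assuming fixed-size character chunks; `chunkByCharacters` and `ChunkingError` are hypothetical names for illustration, not LumoKit's API, and the library's real strategy may differ (for example, splitting on sentences or words).

```swift
/// Hypothetical stand-in for the `invalidChunkSize` case listed in the overview.
enum ChunkingError: Error {
    case invalidChunkSize
}

/// Splits `text` into consecutive pieces of at most `size` characters.
/// Assumption: plain character-count chunking with no overlap.
func chunkByCharacters(_ text: String, size: Int) throws -> [String] {
    // A non-positive size can never make progress, so reject it up front.
    guard size > 0 else { throw ChunkingError.invalidChunkSize }

    var chunks: [String] = []
    var start = text.startIndex
    while start < text.endIndex {
        // Advance by `size` characters, clamping to the end of the string.
        let end = text.index(start, offsetBy: size, limitedBy: text.endIndex) ?? text.endIndex
        chunks.append(String(text[start..<end]))
        start = end
    }
    return chunks
}
```

With this approach an empty string yields an empty array, so a caller that wants `emptyDocument` semantics would need to check for that case separately.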
|
22 | | ---- |
| 69 | +## Architecture |
| 70 | + |
| 71 | +``` |
| 72 | +Source Document ──► PicoDocs parsing ──► LumoKit chunking ──► VecturaKit indexing ──► Semantic search |
| 73 | +``` |
| 74 | + |
| 75 | +## Requirements |
| 76 | + |
| 77 | +- Swift 6.2+ |
| 78 | +- iOS 18.0+, macOS 15.0+ |
23 | 79 |
|
24 | 80 | ## Installation |
25 | 81 |
|
26 | | -Add the following dependencies to your `Package.swift` file: |
| 82 | +Add LumoKit to your `Package.swift` using Swift Package Manager: |
27 | 83 |
|
28 | 84 | ```swift |
29 | 85 | dependencies: [ |
30 | | - .package(url: "https://github.com/rryam/LumoKit.git", from: "0.1.0"), |
31 | | -], |
| 86 | + .package(url: "https://github.com/rryam/LumoKit.git", from: "0.1.0") |
| 87 | +] |
32 | 88 | ``` |
33 | 89 |
|
34 | | -Then import the package in your project: |
| 90 | +Then attach the dependency to your target: |
35 | 91 |
|
36 | 92 | ```swift |
37 | | -import LumoKit |
| 93 | +.target( |
| 94 | + name: "AppModule", |
| 95 | + dependencies: [ |
| 96 | + .product(name: "LumoKit", package: "LumoKit") |
| 97 | + ] |
| 98 | +) |
38 | 99 | ``` |
39 | 100 |
|
40 | | -## Usage |
41 | | - |
42 | | -1. Initialize LumoKit |
| 101 | +## Getting Started |
43 | 102 |
|
44 | | -First, set up the configuration for VecturaKit and initialize LumoKit: |
| 103 | +### 1. Configure VecturaKit and initialize LumoKit |
45 | 104 |
|
46 | 105 | ```swift |
47 | 106 | import LumoKit |
48 | 107 | import VecturaKit |
49 | 108 |
|
50 | 109 | let config = VecturaConfig( |
51 | | - name: "my-vector-db", |
52 | | - dimension: 384, |
53 | | - searchOptions: VecturaConfig.SearchOptions( |
| 110 | + name: "knowledge-base", |
| 111 | + searchOptions: .init( |
54 | 112 | defaultNumResults: 10, |
55 | 113 | minThreshold: 0.7 |
56 | 114 | ) |
57 | 115 | ) |
58 | 116 |
|
59 | | -let lumoKit = try LumoKit(config: config) |
| 117 | +let lumoKit = try await LumoKit(config: config) |
60 | 118 | ``` |
61 | 119 |
|
62 | | -2. Parse and Index Documents |
| 120 | +### 2. Parse a file and index its contents |
63 | 121 |
|
64 | | -Parse a file and index its content into the vector database: |
| 122 | +```swift |
| 123 | +let url = URL(fileURLWithPath: "/path/to/document.pdf") |
| 124 | +try await lumoKit.parseAndIndex(url: url, chunkSize: 600) |
| 125 | +``` |
| 126 | + |
| 127 | +### 3. Run semantic search queries |
65 | 128 |
|
66 | 129 | ```swift |
67 | | -let fileURL = URL(fileURLWithPath: "/path/to/your/document.pdf") |
68 | | -try await lumoKit.parseAndIndex(url: fileURL, chunkSize: 500) |
| 130 | +let results = try await lumoKit.semanticSearch( |
| 131 | + query: "Explain vector databases", |
| 132 | + numResults: 5, |
| 133 | + threshold: 0.65 |
| 134 | +) |
| 135 | + |
| 136 | +for result in results { |
| 137 | + print(result.text) |
| 138 | +} |
69 | 139 | ``` |
70 | 140 |
|
71 | | -3. Perform Semantic Search |
| 141 | +### 4. Reset the database when needed |
| 142 | + |
| 143 | +```swift |
| 144 | +try await lumoKit.resetDB() |
| 145 | +``` |
72 | 146 |
|
73 | | -Search for relevant documents by querying the indexed database: |
| 147 | +## Examples |
| 148 | + |
| 149 | +### Index a single file |
74 | 150 |
|
75 | 151 | ```swift |
76 | | -let results = try await lumoKit.semanticSearch(query: "What is Swift?", numResults: 5, threshold: 0.7) |
| 152 | +let url = URL(fileURLWithPath: "/path/to/notes.md") |
| 153 | +try await lumoKit.parseAndIndex(url: url, chunkSize: 500) |
| 154 | +``` |
77 | 155 |
|
78 | | -for result in results { |
79 | | - print("Document ID: \(result.id)") |
80 | | - print("Text: \(result.text)") |
81 | | - print("Score: \(result.score)") |
| 156 | +### Index multiple files in a folder |
| 157 | + |
| 158 | +```swift |
| 159 | +let folder = URL(fileURLWithPath: "/path/to/docs") |
| 160 | +let fileManager = FileManager.default |
| 161 | +let exts: Set<String> = ["pdf", "md", "markdown", "html", "txt"] |
| 162 | + |
| 163 | +if let urls = try? fileManager.contentsOfDirectory(at: folder, includingPropertiesForKeys: nil) { |
| 164 | + for url in urls where exts.contains(url.pathExtension.lowercased()) { |
| 165 | + try await lumoKit.parseAndIndex(url: url, chunkSize: 600) |
| 166 | + } |
82 | 167 | } |
83 | 168 | ``` |
84 | 169 |
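
The extension check in the folder example can be pulled into a small helper so it can be tested without touching LumoKit or the file system. This is a minimal sketch; `indexableFiles(in:allowed:)` is a hypothetical name and not part of the library.

```swift
import Foundation

/// Keeps only the URLs whose path extension appears in `allowed`.
/// The comparison is case-insensitive, so "Notes.MD" matches "md".
func indexableFiles(in urls: [URL], allowed: Set<String>) -> [URL] {
    urls.filter { allowed.contains($0.pathExtension.lowercased()) }
}
```

Note that the folder example reads the directory with `try?`, so an unreadable folder is skipped silently; switching to a plain `try` surfaces that failure to the caller.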
|
85 | | -## How It Works |
86 | | -- Document Parsing: Leverages PicoDocs to parse various file formats (e.g., PDF, Markdown). |
87 | | -- Chunking: Splits the content into smaller chunks for efficient indexing. |
88 | | -- Vector Storage: Uses VecturaKit to store embeddings and perform similarity searches. |
89 | | -- Semantic Search: Retrieves the most relevant chunks for a given query. |
| 170 | +### Parse without indexing |
90 | 171 |
|
91 | | -## Example Workflow |
| 172 | +```swift |
| 173 | +let url = URL(fileURLWithPath: "/path/to/paper.pdf") |
| 174 | +let chunks = try await lumoKit.parseDocument(from: url, chunkSize: 400) |
| 175 | +print("chunks: \(chunks.count)") |
| 176 | +``` |
| 177 | + |
| 178 | +### Custom storage location |
92 | 179 |
|
93 | 180 | ```swift |
94 | | -let fileURL = URL(fileURLWithPath: "/path/to/document.pdf") |
| 181 | +import VecturaKit |
95 | 182 |
|
96 | | -// Parse and index document |
97 | | -try await lumoKit.parseAndIndex(url: fileURL, chunkSize: 500) |
| 183 | +let supportDir = try FileManager.default.url( |
| 184 | + for: .applicationSupportDirectory, |
| 185 | + in: .userDomainMask, |
| 186 | + appropriateFor: nil, |
| 187 | + create: true |
| 188 | +) |
98 | 189 |
|
99 | | -// Perform semantic search |
100 | | -let query = "Explain the importance of vector databases." |
101 | | -let results = try await lumoKit.semanticSearch(query: query) |
| 190 | +let config = VecturaConfig( |
| 191 | + name: "kb-shared", |
| 192 | + directoryURL: supportDir, |
| 193 | + searchOptions: .init(defaultNumResults: 8, minThreshold: 0.7) |
| 194 | +) |
102 | 195 |
|
103 | | -for result in results { |
104 | | - print("Relevant Text: \(result.text)") |
105 | | -} |
| 196 | +let lumoKit = try await LumoKit(config: config) |
| 197 | +``` |
106 | 198 |
|
107 | | -// Reset the database |
108 | | -try await lumoKit.resetDB() |
| 199 | +### Handling errors |
| 200 | + |
| 201 | +```swift |
| 202 | +do { |
| 203 | + _ = try await lumoKit.parseDocument(from: URL(fileURLWithPath: "/empty.pdf"), chunkSize: 0) |
| 204 | +} catch LumoKitError.invalidChunkSize { |
| 205 | + print("invalid chunk size") |
| 206 | +} catch LumoKitError.emptyDocument { |
| 207 | + print("no content to parse") |
| 208 | +} catch { |
| 209 | + print("unexpected error: \(error)") |
| 210 | +} |
109 | 211 | ``` |
110 | 212 |
|
| 213 | +## Error Handling |
| 214 | + |
| 215 | +`LumoKitError` reports invalid states: |
| 216 | +- `.emptyDocument` – parsing produced no text content. |
| 217 | +- `.invalidChunkSize` – chunk size must be greater than zero. |
| 218 | + |
| 219 | +Handle these cases to surface actionable messages to users or diagnostics. |
| 220 | + |
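
One way to surface actionable messages is to map each case to user-facing text in a single place. The sketch below declares a local mirror of the two `LumoKitError` cases so that it compiles on its own; `userMessage(for:)` and the message wording are assumptions for illustration.

```swift
/// Local mirror of the two cases listed above, so the sketch is self-contained.
/// Remove this declaration when importing LumoKit.
enum LumoKitError: Error {
    case emptyDocument
    case invalidChunkSize
}

/// Translates an error into a short message suitable for display.
func userMessage(for error: Error) -> String {
    // Anything that is not a LumoKitError falls through to a generic message.
    guard let known = error as? LumoKitError else {
        return "Unexpected error: \(error)"
    }
    switch known {
    case .emptyDocument:
        return "The document contains no readable text."
    case .invalidChunkSize:
        return "Chunk size must be greater than zero."
    }
}
```

Because the `switch` is exhaustive over the enum, the compiler flags this function if new cases are added later.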
| 221 | +## Tips |
| 222 | + |
| 223 | +- Adjust `chunkSize` depending on the model’s context window; larger chunks improve coherence, smaller chunks improve specificity. |
| 224 | +- Provide a custom `directoryURL` in `VecturaConfig` to store the vector database in a shared app container. |
| 225 | +- Combine LumoKit with a language model to build a full RAG stack for summaries, answering questions, or chat experiences. |
| 226 | + |
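
To illustrate the last tip, retrieved passages can be assembled into a prompt before being handed to a language model. This is a minimal sketch that works on plain strings (for example, the `text` of each search result); `buildPrompt`, the prompt wording, and the character budget are assumptions, not part of LumoKit.

```swift
import Foundation

/// Builds a prompt that places numbered passages ahead of the question.
/// Passages are added in order until the next one would exceed `maxCharacters`.
func buildPrompt(question: String, passages: [String], maxCharacters: Int = 4000) -> String {
    var context = ""
    for (index, passage) in passages.enumerated() {
        let entry = "[\(index + 1)] \(passage)\n"
        // Stop before exceeding the budget so the context stays within bounds.
        if context.count + entry.count > maxCharacters { break }
        context += entry
    }
    return """
    Answer the question using only the context below.

    Context:
    \(context)
    Question: \(question)
    """
}
```

The budget is counted in characters for simplicity; a production setup would more likely count tokens for the specific model in use.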
111 | 227 | ## Contributing |
112 | | -Contributions are welcome! Please fork the repository and submit a pull request with your improvements or suggestions. |
113 | 228 |
|
114 | | -## License |
115 | | -LumoKit is licensed under the MIT License. See the LICENSE file for more details. |
| 229 | +Contributions are welcome! Open an issue or submit a pull request with improvements. |
| 230 | + |
| 231 | +## License |
116 | 232 |
|
117 | | -## Acknowledgments |
118 | | -- PicoDocs: For powerful document parsing. |
119 | | -- VecturaKit: For robust vector database functionality. |
| 233 | +LumoKit is available under the MIT license. See the [LICENSE](LICENSE) file for details. |