Align with VecturaKit async init and expand README

rudrankriyam · rudrankriyam · commit 5bd891dfbf84 · 2025-10-14T01:16:47.000+05:30
- Make LumoKit init async to match VecturaKit’s async initializer
- Update README: add TOC, API overview, and detailed examples (single file, folder batch, parse-only, custom storage, error handling)
- Use full variable name 'fileManager' in examples for clarity
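
Because the initializer is now `async throws`, synchronous owners can no longer build the instance inline. The sketch below shows one way to adapt a call site, under stated assumptions: `Store` is a hypothetical stand-in for a type with an async throwing initializer (it is not the LumoKit API), and `StoreProvider` is an illustrative helper that does not exist in this package.

```swift
import Foundation

// Hypothetical stand-in for a type whose initializer is `async throws`.
// It only simulates setup work; it is not the LumoKit implementation.
final class Store: Sendable {
    let name: String

    init(name: String) async throws {
        // Simulate asynchronous setup (for example, loading a model).
        try await Task.sleep(nanoseconds: 1_000_000)
        self.name = name
    }
}

// Illustrative helper: creates the store once, on first use, and shares it.
// Keeping the in-flight Task (rather than the finished value) means that
// concurrent callers await the same initialization instead of starting two.
// Note: a failed initialization is cached too, so callers see the same error.
actor StoreProvider {
    private var pending: Task<Store, Error>?

    func store() async throws -> Store {
        if let pending {
            return try await pending.value
        }
        let task = Task { try await Store(name: "knowledge-base") }
        pending = task
        return try await task.value
    }
}
```

Callers in an async context then write `let store = try await provider.store()`. Code that already runs inside an async function can skip the helper and call the async initializer directly with `try await`.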
diff --git a/README.md b/README.md
@@ -1,119 +1,233 @@
 # LumoKit
 
-LumoKit is a lightweight Swift library for **Retrieval-Augmented Generation (RAG)** systems. It integrates with **PicoDocs** for document parsing and **VecturaKit** for semantic search and vector storage.
+LumoKit is a Swift package for building on-device **Retrieval-Augmented Generation (RAG)** workflows. It combines **PicoDocs** for document ingestion with **VecturaKit** for vector storage and semantic search, giving you an end-to-end pipeline for creating searchable knowledge bases.
 
-The name **LumoKit** is derived from the Chinese characters **流** (*liú*) meaning "flow" and **模** (*mó*) meaning "model." It symbolizes the idea of **flowing information through a model**, reflecting data retrieval for a large language model.
+The name **Lumo** blends the Mandarin characters **流** (*liú*, “flow”) and **模** (*mó*, “model”), representing the flow of knowledge into machine learning models.
 
-## Support
+## Learn More
 
-Love this project? Check out my books to explore more of AI and iOS development:
+Deepen your understanding of AI and iOS development with these books:
 - [Exploring AI for iOS Development](https://academy.rudrank.com/product/ai)
 - [Exploring AI-Assisted Coding for iOS Development](https://academy.rudrank.com/product/ai-assisted-coding)
 
-Your support helps to keep this project growing!
+## Table of Contents
+
+- [Features](#features)
+- [API Overview](#api-overview)
+- [Architecture](#architecture)
+- [Requirements](#requirements)
+- [Installation](#installation)
+- [Getting Started](#getting-started)
+  - [1. Configure VecturaKit and initialize LumoKit](#1-configure-vecturakit-and-initialize-lumokit)
+  - [2. Parse a file and index its contents](#2-parse-a-file-and-index-its-contents)
+  - [3. Run semantic search queries](#3-run-semantic-search-queries)
+  - [4. Reset the database when needed](#4-reset-the-database-when-needed)
+- [Examples](#examples)
+  - [Index a single file](#index-a-single-file)
+  - [Index multiple files in a folder](#index-multiple-files-in-a-folder)
+  - [Parse without indexing](#parse-without-indexing)
+  - [Custom storage location](#custom-storage-location)
+  - [Handling errors](#handling-errors)
+- [Error Handling](#error-handling)
+- [Tips](#tips)
+- [Contributing](#contributing)
+- [License](#license)
+
+## Features
+
+- **Document Parsing:** Uses PicoDocs to fetch and convert local files (PDF, Markdown, HTML, and more) into structured text.
+- **Chunking Pipeline:** Splits parsed text into configurable segments ideal for retrieval.
+- **Semantic Search:** Leverages VecturaKit’s vector database to score and rank relevant passages.
+- **Async-First API:** All indexing and search operations are async, ready for Swift concurrency.
+- **Database Management:** Reset or re-index data stores without leaving the app.
+
+## API Overview
 
-## Key Features
+```swift
+public final class LumoKit {
+    public init(config: VecturaConfig) async throws
+
+    public func parseAndIndex(url: URL, chunkSize: Int = 500) async throws
+    public func parseDocument(from url: URL, chunkSize: Int = 500) async throws -> [String]
+    public func chunkText(_ text: String, size: Int) throws -> [String]
 
-- **Parse and Chunk Documents**: Use `PicoDocs` to extract content from files and split them into manageable chunks for efficient indexing.
-- **Semantic Search**: Perform similarity-based searches using `VecturaKit`'s vector database.
-- **Configurable Document Indexing**: Set custom chunk sizes to control how documents are segmented for retrieval.
-- **Reset Database**: Quickly reset the vector database to start fresh with new data.
+    public func semanticSearch(
+        query: String,
+        numResults: Int = 5,
+        threshold: Float = 0.7
+    ) async throws -> [VecturaSearchResult]
+
+    public func resetDB() async throws
+}
+
+public enum LumoKitError: Error {
+    case emptyDocument
+    case invalidChunkSize
+}
+```
 
----
+## Architecture
+
+```
+Source Document ──► PicoDocs parsing ──► LumoKit chunking ──► VecturaKit indexing ──► Semantic search
+```
+
+## Requirements
+
+- Swift 6.2+
+- iOS 18.0+, macOS 15.0+
 
 ## Installation
 
-Add the following dependencies to your `Package.swift` file:
+Add LumoKit to your `Package.swift` using Swift Package Manager:
 
 ```swift
 dependencies: [
-    .package(url: "https://github.com/rryam/LumoKit.git", from: "0.1.0"),
-],
+    .package(url: "https://github.com/rryam/LumoKit.git", from: "0.1.0")
+]
 ```
 
-Then import the package in your project:
+Then attach the dependency to your target:
 
 ```swift
-import LumoKit
+.target(
+    name: "AppModule",
+    dependencies: [
+        .product(name: "LumoKit", package: "LumoKit")
+    ]
+)
 ```
 
-## Usage
-
-1. Initialize LumoKit
+## Getting Started
 
-First, set up the configuration for VecturaKit and initialize LumoKit:
+### 1. Configure VecturaKit and initialize LumoKit
 
 ```swift
 import LumoKit
 import VecturaKit
 
 let config = VecturaConfig(
-    name: "my-vector-db",
-    dimension: 384,
-    searchOptions: VecturaConfig.SearchOptions(
+    name: "knowledge-base",
+    searchOptions: .init(
         defaultNumResults: 10,
         minThreshold: 0.7
     )
 )
 
-let lumoKit = try LumoKit(config: config)
+let lumoKit = try await LumoKit(config: config)
 ```
 
-2. Parse and Index Documents
+### 2. Parse a file and index its contents
 
-Parse a file and index its content into the vector database:
+```swift
+let url = URL(fileURLWithPath: "/path/to/document.pdf")
+try await lumoKit.parseAndIndex(url: url, chunkSize: 600)
+```
+
+### 3. Run semantic search queries
 
 ```swift
-let fileURL = URL(fileURLWithPath: "/path/to/your/document.pdf")
-try await lumoKit.parseAndIndex(url: fileURL, chunkSize: 500)
+let results = try await lumoKit.semanticSearch(
+    query: "Explain vector databases",
+    numResults: 5,
+    threshold: 0.65
+)
+
+for result in results {
+    print(result.text)
+}
 ```
 
-3. Perform Semantic Search
+### 4. Reset the database when needed
+
+```swift
+try await lumoKit.resetDB()
+```
 
-Search for relevant documents by querying the indexed database:
+## Examples
+
+### Index a single file
 
 ```swift
-let results = try await lumoKit.semanticSearch(query: "What is Swift?", numResults: 5, threshold: 0.7)
+let url = URL(fileURLWithPath: "/path/to/notes.md")
+try await lumoKit.parseAndIndex(url: url, chunkSize: 500)
+```
 
-for result in results {
-    print("Document ID: \(result.id)")
-    print("Text: \(result.text)")
-    print("Score: \(result.score)")
+### Index multiple files in a folder
+
+```swift
+let folder = URL(fileURLWithPath: "/path/to/docs")
+let fileManager = FileManager.default
+let exts: Set<String> = ["pdf", "md", "markdown", "html", "txt"]
+
+if let urls = try? fileManager.contentsOfDirectory(at: folder, includingPropertiesForKeys: nil) {
+    for url in urls where exts.contains(url.pathExtension.lowercased()) {
+        try await lumoKit.parseAndIndex(url: url, chunkSize: 600)
+    }
 }
 ```
 
-## How It Works
-- Document Parsing: Leverages PicoDocs to parse various file formats (e.g., PDF, Markdown).
-- Chunking: Splits the content into smaller chunks for efficient indexing.
-- Vector Storage: Uses VecturaKit to store embeddings and perform similarity searches.
-- Semantic Search: Retrieves the most relevant chunks for a given query.
+### Parse without indexing
 
-## Example Workflow
+```swift
+let url = URL(fileURLWithPath: "/path/to/paper.pdf")
+let chunks = try await lumoKit.parseDocument(from: url, chunkSize: 400)
+print("chunks: \(chunks.count)")
+```
+
+### Custom storage location
 
 ```swift
-let fileURL = URL(fileURLWithPath: "/path/to/document.pdf")
+import VecturaKit
 
-// Parse and index document
-try await lumoKit.parseAndIndex(url: fileURL, chunkSize: 500)
+let supportDir = try FileManager.default.url(
+    for: .applicationSupportDirectory,
+    in: .userDomainMask,
+    appropriateFor: nil,
+    create: true
+)
 
-// Perform semantic search
-let query = "Explain the importance of vector databases."
-let results = try await lumoKit.semanticSearch(query: query)
+let config = VecturaConfig(
+    name: "kb-shared",
+    directoryURL: supportDir,
+    searchOptions: .init(defaultNumResults: 8, minThreshold: 0.7)
+)
 
-for result in results {
-    print("Relevant Text: \(result.text)")
-}
+let lumoKit = try await LumoKit(config: config)
+```
 
-// Reset the database
-try await lumoKit.resetDB()
+### Handling errors
+
+```swift
+do {
+    _ = try await lumoKit.parseDocument(from: URL(fileURLWithPath: "/empty.pdf"), chunkSize: 0)
+} catch LumoKitError.invalidChunkSize {
+    print("invalid chunk size")
+} catch LumoKitError.emptyDocument {
+    print("no content to parse")
+} catch {
+    print("unexpected error: \(error)")
+}
 ```
 
+## Error Handling
+
+`LumoKitError` reports invalid states:
+- `.emptyDocument` – parsing produced no text content.
+- `.invalidChunkSize` – chunk size must be greater than zero.
+
+Handle these cases to surface actionable messages to users or diagnostics.
+
+## Tips
+
+- Adjust `chunkSize` depending on the model’s context window; larger chunks improve coherence, smaller chunks improve specificity.
+- Provide a custom `directoryURL` in `VecturaConfig` to store the vector database in a shared app container.
+- Combine LumoKit with a language model to build a full RAG stack for summaries, answering questions, or chat experiences.
+
 ## Contributing
-Contributions are welcome! Please fork the repository and submit a pull request with your improvements or suggestions.
 
-## License 
-LumoKit is licensed under the MIT License. See the LICENSE file for more details.
+Contributions are welcome! Open an issue or submit a pull request with improvements.
+
+## License
 
-## Acknowledgments
-- PicoDocs: For powerful document parsing.
-- VecturaKit: For robust vector database functionality.
+LumoKit is available under the MIT license. See the [LICENSE](LICENSE) file for details.
diff --git a/Sources/LumoKit/LumoKit.swift b/Sources/LumoKit/LumoKit.swift
@@ -5,8 +5,8 @@ import Foundation
 public final class LumoKit {
     private let vectura: VecturaKit
 
-    public init(config: VecturaConfig) throws {
-        self.vectura = try VecturaKit(config: config)
+    public init(config: VecturaConfig) async throws {
+        self.vectura = try await VecturaKit(config: config)
     }
 
     /// Parse and index a document from a given file URL
