Add comprehensive MkDocs documentation with modern UI #298

base: main

Commits: 3ce2c8d, 642aa06, e9b8349, 0282c9f, 387aa46, de1cd91

@@ -0,0 +1,51 @@

```yaml
name: Deploy Documentation

on:
  push:
    branches:
      - main
    paths:
      - 'docs/**'
      - 'mkdocs.yml'
      - '.github/workflows/docs.yml'
      - 'requirements-docs.txt'
  workflow_dispatch:

permissions:
  contents: write

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Cache pip dependencies
        uses: actions/cache@v3
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('requirements-docs.txt') }}
          restore-keys: |
            ${{ runner.os }}-pip-

      - name: Install dependencies
        run: |
          pip install --upgrade pip
          pip install -r requirements-docs.txt

      - name: Configure Git
        run: |
          git config --global user.name "github-actions[bot]"
          git config --global user.email "github-actions[bot]@users.noreply.github.com"

      - name: Deploy to GitHub Pages
        run: |
          mkdocs gh-deploy --force --clean --verbose
```
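
The `paths` filter means a push to `main` only starts this workflow when at least one changed file matches one of the listed patterns. Runs started manually through `workflow_dispatch` are not filtered. The Python sketch below approximates that check, assuming GitHub's documented glob rules for these patterns (`*` does not cross `/`, `**` does) and ignoring the other glob features GitHub supports. `glob_to_regex` and `triggers_deploy` are hypothetical helpers, not part of the workflow.

```python
import re

# Patterns copied from the workflow's on.push.paths filter.
DOCS_PATHS = (
    "docs/**",
    "mkdocs.yml",
    ".github/workflows/docs.yml",
    "requirements-docs.txt",
)


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified GitHub Actions path glob into a compiled regex.

    Only '**' (any characters, including '/') and '*' (any characters
    except '/') are handled; every other character is matched literally.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    # re.match anchors at the start; \Z anchors at the end -> full-path match.
    return re.compile("".join(parts) + r"\Z")


def triggers_deploy(changed_files: list[str], patterns: tuple[str, ...] = DOCS_PATHS) -> bool:
    """Return True if any changed file matches any of the path filter patterns."""
    regexes = [glob_to_regex(p) for p in patterns]
    return any(rx.match(f) for f in changed_files for rx in regexes)


print(triggers_deploy(["docs/crawlers/rss.md"]))     # True
print(triggers_deploy(["crawlers/rss_crawler.py"]))  # False
```

In practice this means a commit that only touches crawler code leaves the published documentation untouched until a docs-related file changes or the workflow is dispatched manually.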

@@ -0,0 +1,70 @@

# Vectara Ingest Documentation

This directory contains the documentation for vectara-ingest, built with MkDocs and the Material for MkDocs theme.

## Building Locally

To build and preview the documentation locally:

```bash
# Install documentation dependencies
pip install -r requirements-docs.txt

# Serve documentation locally
mkdocs serve
```

Then open http://127.0.0.1:8000 in your browser.

## Building Static Site

```bash
mkdocs build
```

The static site will be generated in the `site/` directory.

## Deployment

Documentation is automatically deployed to GitHub Pages when changes to the documentation sources (`docs/**`, `mkdocs.yml`, `requirements-docs.txt`, or the workflow file itself) are pushed to the `main` branch. The workflow can also be started manually via `workflow_dispatch`.

The deployment is handled by `.github/workflows/docs.yml`, which publishes the site with `mkdocs gh-deploy`.

## Documentation Structure

```
docs/
├── index.md                  # Home page
├── installation.md           # Installation guide
├── getting-started.md        # Quick start tutorial
├── configuration.md          # Configuration reference
├── secrets-management.md     # Secrets and API keys
├── crawlers/                 # Crawler documentation
│   ├── index.md              # Crawlers overview
│   ├── website.md            # Website crawler
│   ├── rss.md                # RSS crawler
│   └── ...                   # Other crawlers
├── features/                 # Feature documentation
│   ├── document-processing.md
│   ├── table-extraction.md
│   └── ...
├── deployment/               # Deployment guides
│   ├── docker.md
│   ├── render.md
│   └── ...
├── advanced/                 # Advanced topics
│   ├── custom-crawler.md
│   ├── saml-auth.md
│   └── ...
└── contributing.md           # Contributing guide
```

## Contributing to Documentation

1. Edit Markdown files in the `docs/` directory
2. Preview changes with `mkdocs serve`
3. Commit and push; deployment runs once the changes reach `main`

## Navigation

Navigation is configured in `mkdocs.yml` under the `nav:` section.
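
In `mkdocs.yml`, `nav:` is a nested list whose entries are either a bare page path or a single-key mapping from a title to a page path or to a further list. The sketch below assumes the `nav` value has already been loaded into Python (for example with a YAML parser) and walks that structure to report pages referenced in the nav that are missing from `docs/`, plus Markdown files that no nav entry references. `nav_pages` and `check_nav` are hypothetical helpers; MkDocs itself also reports some of these problems during a build.

```python
from pathlib import Path
from typing import Any


def nav_pages(nav: Any) -> list[str]:
    """Collect every local page path referenced in an MkDocs nav structure."""
    pages: list[str] = []
    if isinstance(nav, str):
        # A bare entry such as "index.md"; external links are skipped.
        if "://" not in nav:
            pages.append(nav)
    elif isinstance(nav, list):
        for item in nav:
            pages.extend(nav_pages(item))
    elif isinstance(nav, dict):
        # A titled entry: {"Title": "page.md"} or {"Section": [...]}.
        for value in nav.values():
            pages.extend(nav_pages(value))
    return pages


def check_nav(nav: Any, docs_dir: Path) -> tuple[list[str], list[str]]:
    """Return (pages in nav but missing on disk, .md files on disk but not in nav)."""
    referenced = set(nav_pages(nav))
    on_disk = {p.relative_to(docs_dir).as_posix() for p in docs_dir.rglob("*.md")}
    return sorted(referenced - on_disk), sorted(on_disk - referenced)
```

For example, `check_nav(config["nav"], Path("docs"))` on a loaded `mkdocs.yml` would flag placeholder pages that were deleted from `docs/` but are still listed in the nav.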

@@ -0,0 +1,9 @@

# Documentation Page

*This page is under construction. Check back soon for detailed documentation.*

## Resources

- [Home](../index.md)
- [Getting Started](../getting-started.md)
- [Configuration Reference](../configuration.md)

@@ -0,0 +1,9 @@

# Documentation Page

*This page is under construction. Check back soon for detailed documentation.*

## Resources

- [Home](../index.md)
- [Getting Started](../getting-started.md)
- [Configuration Reference](../configuration.md)

Collaborator

Let's please remove all pages that are under construction, or add real content to them. Chunking can actually be useful: you can document using chunking directly with the platform, or using Docling chunking or Unstructured chunking.

@@ -0,0 +1,9 @@

# Documentation Page

*This page is under construction. Check back soon for detailed documentation.*

## Resources

- [Home](../index.md)
- [Getting Started](../getting-started.md)
- [Configuration Reference](../configuration.md)

Collaborator

Remove if not used.

@@ -0,0 +1,9 @@

# Documentation Page

*This page is under construction. Check back soon for detailed documentation.*

## Resources

- [Home](../index.md)
- [Getting Started](../getting-started.md)
- [Configuration Reference](../configuration.md)

Collaborator

Why is this a separate page? Should be part of "chunking", no?

@@ -0,0 +1,9 @@

# Documentation Page

*This page is under construction. Check back soon for detailed documentation.*

## Resources

- [Home](../index.md)
- [Getting Started](../getting-started.md)
- [Configuration Reference](../configuration.md)

Collaborator

Please add proper content here. This one should be documented IMO.

@@ -0,0 +1,142 @@

# Custom Crawler Setup Guide

This guide explains how to use custom or private crawler files with vectara-ingest without committing them to the repository.

## Overview

The `crawler_file` configuration option allows you to specify a custom crawler file that will be automatically copied to the `crawlers/` directory before the Docker image is built. This is useful for:

- Private crawlers that should not be committed to version control
- Organization-specific crawlers
- Testing new crawlers without modifying the repository

## Setup Instructions

### Step 1: Create Your Custom Crawler

Create your custom crawler Python file anywhere on your system. Like the built-in crawlers, it should define a subclass of `Crawler` (from `core.crawler`) that implements a `crawl()` method.

Example: `/home/myuser/my_custom_crawler.py`

```python
# Base class shared by all vectara-ingest crawlers
from core.crawler import Crawler


# Class name follows the file name: my_custom_crawler.py -> MyCustomCrawler
class MyCustomCrawler(Crawler):
    def __init__(self, cfg, endpoint, corpus_key, api_key):
        super().__init__(cfg, endpoint, corpus_key, api_key)
        # Your initialization code

    def crawl(self):
        # Your crawling logic
        pass
```

### Step 2: Add crawler_file to Your Configuration

In your configuration YAML file, add the `crawler_file` parameter under the `vectara` section:

```yaml
vectara:
  corpus_key: my_corpus
  reindex: true
  create_corpus: false

  # Path to your custom crawler file
  crawler_file: /home/myuser/my_custom_crawler.py

crawling:
  # The crawler_type should match your crawler's naming convention
  # For my_custom_crawler.py, use: my_custom
  crawler_type: my_custom

my_custom_crawler:
  # Your crawler-specific configuration
  # ...
```

### Step 3: Run the Ingest Script

Run the ingest script as usual:

```bash
sh run.sh config/my-config.yaml default
```

The `run.sh` script will:
1. Read the `crawler_file` path from your configuration
2. Verify the file exists
3. Copy it to the `crawlers/` directory
4. Build the Docker image with your custom crawler included
5. Run the crawler
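
Steps 1 to 3 can be sketched in Python as follows. `run.sh` is a shell script whose implementation is not shown here, so this is an illustration under stated assumptions, not its actual logic: the YAML has already been parsed into a dict, relative paths resolve against the current working directory, and the file keeps its name when copied. `stage_custom_crawler` is a hypothetical helper; the Docker build and run (steps 4 and 5) are out of scope.

```python
import shutil
from pathlib import Path
from typing import Optional


def stage_custom_crawler(cfg: dict, repo_root: Path) -> Optional[Path]:
    """Copy the file named by vectara.crawler_file into <repo_root>/crawlers/.

    Returns the destination path, or None when no custom crawler is configured.
    """
    # Step 1: read crawler_file from the (already parsed) vectara section.
    crawler_file = (cfg.get("vectara") or {}).get("crawler_file")
    if not crawler_file:
        return None  # Built-in crawler: nothing to copy.

    # Step 2: verify the file exists. Assumption: relative paths resolve
    # against the current working directory.
    src = Path(crawler_file)
    if not src.is_file():
        # Same wording as the message documented under "Error Handling".
        raise FileNotFoundError(f"Error: Custom crawler file not found at '{crawler_file}'")

    # Step 3: copy into crawlers/, keeping the file name so crawler_type still matches.
    dest_dir = repo_root / "crawlers"
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)
    return dest
```

Keeping the original file name matters because the naming convention below ties the file name to `crawler_type`.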

## Configuration Details

### crawler_file Parameter

- **Location**: Under the `vectara` section in your config YAML
- **Type**: String (absolute or relative path to a Python file)
- **Required**: No (only needed if using a custom crawler)
- **Example**: `crawler_file: /path/to/my_crawler.py`

### Naming Convention

The `crawler_type` determines the expected file name, class name, and configuration section name:

- If your file is `my_custom_crawler.py` with class `MyCustomCrawler`
- Use `crawler_type: my_custom`
- Add a configuration section named `my_custom_crawler`
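
The convention above can be expressed as a small function. This is a sketch of the rule as this guide describes it (snake_case `crawler_type`, file `<type>_crawler.py`, CamelCase class ending in `Crawler`, config section `<type>_crawler`); it is not vectara-ingest's actual loader, which may derive names differently. `crawler_names` and `check_crawler_config` are hypothetical helpers that operate on an already-parsed config dict.

```python
from pathlib import PurePath


def crawler_names(crawler_type: str) -> dict:
    """Derive file, class, and config-section names from a snake_case crawler_type."""
    # e.g. "my_custom" -> "My" + "Custom" + "Crawler"
    class_name = "".join(part.capitalize() for part in crawler_type.split("_")) + "Crawler"
    return {
        "file": f"{crawler_type}_crawler.py",
        "class": class_name,
        "config_section": f"{crawler_type}_crawler",
    }


def check_crawler_config(cfg: dict) -> list:
    """Return naming problems found in a parsed config (empty list if consistent)."""
    crawler_type = (cfg.get("crawling") or {}).get("crawler_type")
    if not crawler_type:
        return ["crawling.crawler_type is not set"]
    names = crawler_names(crawler_type)
    problems = []
    if names["config_section"] not in cfg:
        problems.append(f"missing config section '{names['config_section']}'")
    crawler_file = (cfg.get("vectara") or {}).get("crawler_file")
    if crawler_file and PurePath(crawler_file).name != names["file"]:
        problems.append(f"crawler_file should be named '{names['file']}'")
    return problems
```

Run against the parsed Step 2 configuration, `check_crawler_config` returns an empty list; dropping the `my_custom_crawler:` section would produce a "missing config section" entry.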

## Error Handling

If the custom crawler file is not found, the script will exit with an error:

```
Error: Custom crawler file not found at '/path/to/crawler.py'
```

Make sure:
- The file path is correct
- The file exists and is readable
- You have proper permissions to access the file

## Git and Version Control

Custom crawler files copied to the `crawlers/` directory are excluded from git only if their names match one of these patterns in `.gitignore`:

- `crawlers/*_custom_crawler.py`
- `crawlers/custom_*.py`

To ensure your custom crawler is not accidentally committed:
1. Name your crawler file with a `custom_` prefix or a `_custom_crawler.py` suffix so it matches one of these patterns
2. Or keep it outside the repository and only reference it via `crawler_file`
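
To check ahead of time whether a given file name would be caught by these patterns, here is a Python approximation built on `pathlib.PurePosixPath.match`. `is_ignored_custom_crawler` is a hypothetical helper; `git check-ignore -v <path>` remains the authoritative check, since `.gitignore` matching rules differ from `PurePosixPath.match` in some edge cases.

```python
from pathlib import PurePosixPath

# The two ignore patterns listed above, relative to the repository root.
CUSTOM_CRAWLER_PATTERNS = ("crawlers/*_custom_crawler.py", "crawlers/custom_*.py")


def is_ignored_custom_crawler(path: str) -> bool:
    """Approximate whether a repo-relative path matches the custom-crawler ignore patterns.

    PurePosixPath.match() compares path components from the right, and its '*'
    never crosses '/', which is close to (but not identical with) .gitignore rules.
    """
    p = PurePosixPath(path)
    return any(p.match(pattern) for pattern in CUSTOM_CRAWLER_PATTERNS)


for name in ("my_custom_crawler.py", "custom_sales_crawler.py",
             "mycustom_crawler.py", "proprietary_crawler.py"):
    print(name, is_ignored_custom_crawler(f"crawlers/{name}"))
```

Only the first two names match. A file that merely contains "custom" (such as `mycustom_crawler.py`), or the `proprietary_crawler.py` used in the example configuration, would still show up in `git status` once copied into `crawlers/`.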

## Example Configuration

Complete example for a custom crawler:

```yaml
vectara:
  corpus_key: proprietary_data
  reindex: true
  create_corpus: false
  crawler_file: /home/user/proprietary_crawler.py

crawling:
  crawler_type: proprietary

proprietary_crawler:
  # Your custom crawler configuration
  api_endpoint: https://internal.company.com/api
  batch_size: 100
```

## Troubleshooting

**Issue**: Script exits with "Custom crawler file not found"
- **Solution**: Verify the file path is correct and the file exists

**Issue**: Crawler not being recognized
- **Solution**: Ensure `crawler_type`, the file name, the class name, and the configuration section follow the Naming Convention described above

**Issue**: Custom crawler appears in git status
- **Solution**: Rename the file so it matches one of the ignored patterns (a `custom_` prefix or a `_custom_crawler.py` suffix), or add it to `.gitignore`

@@ -0,0 +1,9 @@

# Documentation Page

*This page is under construction. Check back soon for detailed documentation.*

## Resources

- [Home](../index.md)
- [Getting Started](../getting-started.md)
- [Configuration Reference](../configuration.md)

@@ -0,0 +1,9 @@

# Documentation Page

*This page is under construction. Check back soon for detailed documentation.*

## Resources

- [Home](../index.md)
- [Getting Started](../getting-started.md)
- [Configuration Reference](../configuration.md)

Collaborator

What is this one supposed to document? Either remove it if it's covered somewhere else, or add proper content, please.

What is this for? Why do we need an API reference page in the docs?