render-examples/render-workflows-llamaindex

Document Intelligence Pipeline

A production-ready document AI pipeline combining Render Workflows for orchestration and LlamaCloud for intelligent document processing. Upload any document and watch it get classified, parsed, and structured in real-time.

Deploy to Render


What This Demo Shows

This repo demonstrates how to build document AI applications using:

| Platform | Role |
| --- | --- |
| Render Workflows | Orchestrates long-running document processing tasks with automatic retries, timeouts, and monitoring |
| LlamaCloud | Provides the AI-powered document intelligence: classification, parsing, and structured extraction |
| Render Postgres | Stores processed documents and extracted data |
| Render Web Services | Hosts the Express API and serves the real-time UI |

Architecture


How It Works

  1. Browser uploads a document to the Express API on Render
  2. Express streams progress via SSE and dispatches work to Render Workflows
  3. Render Workflows executes five tasks, each calling a LlamaCloud API:

| Render Workflow Task | LlamaCloud API | What It Does |
| --- | --- | --- |
| upload_to_llamacloud | Files API | Registers the document and returns a file_id |
| classify_document | Classify API | Identifies the document type (invoice, contract, form, etc.) |
| parse_document | LlamaParse | Extracts clean markdown and text from 130+ file formats |
| extract_fields | LlamaExtract | Pulls structured fields based on document type |
| store_results | (none) | Saves everything to Render Postgres |

  4. Results stream back to the browser in real time
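The five-stage flow above can be sketched as a simple sequential dispatcher. This is an illustrative sketch, not the repo's actual orchestrator: `runTask` and `onProgress` are hypothetical stand-ins for the workflow dispatch and SSE layer.

```typescript
// The five pipeline stages, in the order the orchestrator runs them.
type Stage =
  | "upload_to_llamacloud"
  | "classify_document"
  | "parse_document"
  | "extract_fields"
  | "store_results";

const STAGES: Stage[] = [
  "upload_to_llamacloud",
  "classify_document",
  "parse_document",
  "extract_fields",
  "store_results",
];

// Each stage receives the accumulated context and returns new fields
// (e.g. the upload stage returns a file_id that later stages consume).
type StageResult = Record<string, unknown>;
type TaskRunner = (stage: Stage, ctx: StageResult) => Promise<StageResult>;

async function runPipeline(
  runTask: TaskRunner,
  onProgress: (stage: Stage, ctx: StageResult) => void,
): Promise<StageResult> {
  let ctx: StageResult = {};
  for (const stage of STAGES) {
    ctx = { ...ctx, ...(await runTask(stage, ctx)) };
    onProgress(stage, ctx); // in the real app this becomes an SSE event
  }
  return ctx;
}
```

The key property is that each stage's output is merged into a shared context, so downstream tasks only need the fields they care about.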

Quick Start

Prerequisites

Deploy

  1. Click Deploy to Render above

  2. You'll be prompted for:

  3. Create the Workflow service manually:

    • Go to Render Dashboard → New → Workflow
    • Connect this repository
    • Build command: npm install && npm run build
    • Start command: node dist/tasks/index.js
    • Name: render-workflows-llamaindex-workflow
    • Add env vars: LLAMA_CLOUD_API_KEY, DATABASE_URL (from your Postgres)
  4. Open your web service URL and upload a document!

Features

| Feature | Description |
| --- | --- |
| Real-time progress | Watch each pipeline stage complete via Server-Sent Events |
| 130+ file formats | LlamaParse handles PDF, DOCX, XLSX, images, HTML, and more |
| Smart classification | LlamaCloud Classify identifies document types automatically |
| Structured extraction | LlamaExtract pulls typed fields based on document type |
| Ephemeral sessions | Each user gets isolated data that auto-deletes (configurable) |
| Optional search | Enable semantic search with a LlamaCloud pipeline |

Configuration

| Variable | Where | Description |
| --- | --- | --- |
| RENDER_API_KEY | Web service | Render API key for dispatching workflow tasks |
| LLAMA_CLOUD_API_KEY | Both services | LlamaCloud API key for document AI |
| DATABASE_URL | Both services | Render Postgres connection string |
| LLAMACLOUD_PIPELINE_ID | Both services (optional) | Enables semantic search |
| SESSION_LIFETIME_MINUTES | Web service | Session duration before cleanup (default: 15) |
| MAX_UPLOAD_BYTES | Web service (optional) | Default 3,145,728 (3 MiB), hard-capped at that value; set lower to tighten the demo. See "Upload size limit" below. |

Upload size limit (important)

The demo sends file bytes to the upload_to_llamacloud task as a base64 string in the workflow task parameters. Per Render’s limits, task arguments are capped at 4MB, so the effective max raw file size is 3 MiB (3,145,728 bytes). The app enforces this in the API, the orchestrator, and the UI. To support larger files in production, store uploads in object storage and pass only a file ID to the task (see Render Workflows).
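The arithmetic behind the 3 MiB cap can be made explicit. A minimal sketch of the size guard described above (function names are hypothetical, not the repo's API): base64 encodes every 3 raw bytes as 4 output characters, so a 3 MiB file encodes to exactly 4 MiB, the workflow argument limit.

```typescript
const MAX_UPLOAD_BYTES = 3 * 1024 * 1024; // 3,145,728 — the demo's hard cap
const ARG_LIMIT_BYTES = 4 * 1024 * 1024;  // Render Workflows argument limit

// base64 output length: every 3 raw bytes become 4 characters (padded).
function base64Size(rawBytes: number): number {
  return Math.ceil(rawBytes / 3) * 4;
}

// Reject files whose encoded form would exceed the argument limit.
function assertUploadFits(rawBytes: number): void {
  if (rawBytes > MAX_UPLOAD_BYTES) {
    throw new Error(
      `File is ${rawBytes} bytes; max is ${MAX_UPLOAD_BYTES} (3 MiB), ` +
        `since base64 inflates it to ${base64Size(rawBytes)} of the ` +
        `${ARG_LIMIT_BYTES}-byte task argument budget.`,
    );
  }
}
```

Checking the raw size against 3 MiB is equivalent to checking the encoded size against 4 MiB, and avoids encoding the file just to measure it.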

Privacy & Demo Mode

> **Warning** — for public demos: this app includes a prominent warning against uploading sensitive documents. Session data is deleted from Postgres on expiry, but if LLAMACLOUD_PIPELINE_ID is set, indexed text persists in LlamaCloud.

To run as a privacy-safe demo:

  • Leave LLAMACLOUD_PIPELINE_ID empty (disables Search/Ask, but classify/parse/extract still work)
  • Or create a LlamaCloud pipeline you periodically clear

Project Structure

```
main.ts                      Express API + SSE streaming
pipeline/orchestrator.ts     Dispatches Render Workflow tasks
tasks/
  upload.ts                  → LlamaCloud Files API
  classify.ts                → LlamaCloud Classify API
  parse.ts                   → LlamaParse
  extract.ts                 → LlamaExtract
  store.ts                   → Render Postgres
shared/
  db.ts                      Postgres queries
  llama-client.ts            LlamaCloud SDK client
render.yaml                  Render Blueprint
```

API Routes

All document routes are session-scoped under /s/{token}/.

| Method | Path | Description |
| --- | --- | --- |
| GET | / | Creates session, redirects to /s/{token} |
| POST | /s/{token}/upload | Upload file, returns SSE progress stream |
| POST | /s/{token}/upload-url | Fetch from URL, returns SSE progress stream |
| GET | /s/{token}/documents | List documents in session |
| GET | /s/{token}/documents/:id | Get single document details |
| DELETE | /s/{token}/documents/:id | Delete a document |
| POST | /s/{token}/search | Semantic search (requires LLAMACLOUD_PIPELINE_ID) |
| POST | /s/{token}/ask | RAG retrieval (requires LLAMACLOUD_PIPELINE_ID) |
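A client consumes the upload route by POSTing the file and reading SSE frames off the response body. This is a hedged sketch: the event payload shape is not documented here, so the code only extracts raw `data:` lines, and `uploadWithProgress` is a hypothetical helper, not part of the repo.

```typescript
// Minimal SSE frame parser: splits buffered text into complete `data:`
// payloads, returning any trailing incomplete frame for the next chunk.
function parseSseData(buffer: string): { events: string[]; rest: string } {
  const events: string[] = [];
  const frames = buffer.split("\n\n");
  const rest = frames.pop() ?? ""; // last piece may be an incomplete frame
  for (const frame of frames) {
    for (const line of frame.split("\n")) {
      if (line.startsWith("data: ")) events.push(line.slice(6));
    }
  }
  return { events, rest };
}

// Hypothetical client: upload a file and log each progress event.
async function uploadWithProgress(baseUrl: string, token: string, file: Blob) {
  const form = new FormData();
  form.append("file", file);
  const res = await fetch(`${baseUrl}/s/${token}/upload`, {
    method: "POST",
    body: form,
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const { events, rest } = parseSseData(buffer);
    buffer = rest;
    for (const event of events) console.log("progress:", event);
  }
}
```

Buffering across chunk boundaries matters because a network read can split an SSE frame mid-line; the parser only emits frames terminated by a blank line.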

Troubleshooting

| Problem | Solution |
| --- | --- |
| Workflow tasks fail immediately | Ensure WORKFLOW_SLUG matches your workflow service name exactly |
| Database connection errors | Use the Postgres Internal URL, not External |
| Search returns "not configured" | Set LLAMACLOUD_PIPELINE_ID on both services |
| "Unsupported file type" | Ensure the filename has a valid extension (.pdf, .docx, etc.) |
| "File too large" / upload over ~3MB | Workflow argument size limit; use a smaller file, or redesign with external storage plus a file reference |
| LlamaCloud rate limits | Tasks retry automatically; check your LlamaCloud dashboard |

Learn More

Render:

LlamaIndex:

License

MIT

About

Distributed compute that processes documents.
