A production-ready document AI pipeline combining Render Workflows for orchestration and LlamaCloud for intelligent document processing. Upload any document and watch it get classified, parsed, and structured in real time.
This repo demonstrates how to build document AI applications using:
| Platform | Role |
|---|---|
| Render Workflows | Orchestrates long-running document processing tasks with automatic retries, timeouts, and monitoring |
| LlamaCloud | Provides the AI-powered document intelligence: classification, parsing, and structured extraction |
| Render Postgres | Stores processed documents and extracted data |
| Render Web Services | Hosts the Express API and serves the real-time UI |
- Browser uploads a document to the Express API on Render
- Express streams progress via SSE and dispatches work to Render Workflows
- Render Workflows executes five tasks, each calling a LlamaCloud API:
| Render Workflow Task | LlamaCloud API | What It Does |
|---|---|---|
| `upload_to_llamacloud` | Files API | Registers the document and returns a `file_id` |
| `classify_document` | Classify API | Identifies the document type (invoice, contract, form, etc.) |
| `parse_document` | LlamaParse | Extracts clean markdown and text from 130+ file formats |
| `extract_fields` | LlamaExtract | Pulls structured fields based on document type |
| `store_results` | — | Saves everything to Render Postgres |
- Results stream back to the browser in real time
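The five-stage flow above can be sketched as a plain TypeScript pipeline runner. This is a minimal sketch with stubbed stages and hypothetical names, not the repo's implementation: the real app dispatches each stage as a Render Workflow task and streams progress to the browser over SSE.

```typescript
// Stage names mirror the workflow tasks in the table above.
type Stage =
  | "upload_to_llamacloud"
  | "classify_document"
  | "parse_document"
  | "extract_fields"
  | "store_results";

interface StageResult {
  stage: Stage;
  output: Record<string, unknown>;
}

type StageFn = (ctx: Record<string, unknown>) => Promise<Record<string, unknown>>;

// Stubbed stage implementations; the real tasks call LlamaCloud APIs and Postgres.
const stages: Record<Stage, StageFn> = {
  upload_to_llamacloud: async () => ({ fileId: "file-123" }),
  classify_document: async () => ({ docType: "invoice" }),
  parse_document: async () => ({ markdown: "# Invoice" }),
  extract_fields: async () => ({ fields: { total: 42 } }),
  store_results: async () => ({ stored: true }),
};

// Run the stages in order, merging each stage's output into a shared context
// and reporting progress via a callback (the real app forwards this over SSE).
async function runPipeline(
  onProgress: (r: StageResult) => void
): Promise<Record<string, unknown>> {
  const order: Stage[] = [
    "upload_to_llamacloud",
    "classify_document",
    "parse_document",
    "extract_fields",
    "store_results",
  ];
  let ctx: Record<string, unknown> = {};
  for (const stage of order) {
    const output = await stages[stage](ctx);
    ctx = { ...ctx, ...output };
    onProgress({ stage, output });
  }
  return ctx;
}
```

Because each stage only reads the merged context and returns new keys, retrying a failed stage (which Render Workflows does automatically) is safe: earlier outputs are preserved.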
- Render account (free tier works)
- LlamaCloud account (free tier available)
1. Click **Deploy to Render** above.
2. You'll be prompted for:
   - `RENDER_API_KEY` — Get one here
   - `LLAMA_CLOUD_API_KEY` — Get one here
3. Create the Workflow service manually:
   - Go to Render Dashboard → New → Workflow
   - Connect this repository
   - Build command: `npm install && npm run build`
   - Start command: `node dist/tasks/index.js`
   - Name: `render-workflows-llamaindex-workflow`
   - Add env vars: `LLAMA_CLOUD_API_KEY`, `DATABASE_URL` (from your Postgres)
4. Open your web service URL and upload a document!
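A common failure mode after setup is a missing environment variable on one of the services. A small startup guard like the sketch below (not part of the repo; names are illustrative) makes that obvious immediately:

```typescript
// Return the names of required env vars that are missing or empty.
function requireEnv(
  names: string[],
  env: Record<string, string | undefined>
): string[] {
  return names.filter((name) => !env[name]);
}

// Example: the workflow service needs both of these at startup.
const missing = requireEnv(["LLAMA_CLOUD_API_KEY", "DATABASE_URL"], process.env);
if (missing.length > 0) {
  console.error(`Missing required env vars: ${missing.join(", ")}`);
}
```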
| Feature | Description |
|---|---|
| Real-time progress | Watch each pipeline stage complete via Server-Sent Events |
| 130+ file formats | LlamaParse handles PDF, DOCX, XLSX, images, HTML, and more |
| Smart classification | LlamaCloud Classify identifies document types automatically |
| Structured extraction | LlamaExtract pulls typed fields based on document type |
| Ephemeral sessions | Each user gets isolated data that auto-deletes (configurable) |
| Optional search | Enable semantic search with a LlamaCloud pipeline |
| Variable | Where | Description |
|---|---|---|
| `RENDER_API_KEY` | Web service | Render API key for dispatching workflow tasks |
| `LLAMA_CLOUD_API_KEY` | Both services | LlamaCloud API key for document AI |
| `DATABASE_URL` | Both services | Render Postgres connection string |
| `LLAMACLOUD_PIPELINE_ID` | Both (optional) | Enables semantic search |
| `SESSION_LIFETIME_MINUTES` | Web service | Session duration before cleanup (default: 15) |
| `MAX_UPLOAD_BYTES` | Web service (optional) | Defaults to 3,145,728 (3 MiB) and is hard-capped at that value. The first pipeline step (`upload_to_llamacloud`) passes the file as base64 in `startTask` arguments; Render Workflows limits per-invocation argument size to 4 MB, and base64 inflates size by ~4/3, so ~3 MB raw is the safe maximum. Set lower to tighten the demo. |
The demo sends file bytes to the upload_to_llamacloud task as a base64 string in the workflow task parameters. Per Render’s limits, task arguments are capped at 4MB, so the effective max raw file size is 3 MiB (3,145,728 bytes). The app enforces this in the API, the orchestrator, and the UI. To support larger files in production, store uploads in object storage and pass only a file ID to the task (see Render Workflows).
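The arithmetic behind that cap is easy to check: base64 encodes every 3 raw bytes as 4 characters, so 3 MiB of raw bytes encodes to exactly 4 MiB. The sketch below (constant names are assumptions, and it assumes the 4 MB argument cap is 4 MiB) shows the calculation:

```typescript
// Render Workflows caps per-invocation task arguments at 4 MB (assumed 4 MiB here).
const TASK_ARG_LIMIT = 4 * 1024 * 1024;
// The repo's default and hard cap for raw uploads: 3 MiB.
const MAX_UPLOAD_BYTES = 3 * 1024 * 1024; // 3,145,728

// Length of the base64 string for n raw bytes (including '=' padding):
// every group of up to 3 bytes becomes 4 output characters.
function base64Length(rawBytes: number): number {
  return 4 * Math.ceil(rawBytes / 3);
}

// Would a file of this size fit in the task arguments once base64-encoded?
function fitsInTaskArgs(rawBytes: number): boolean {
  return base64Length(rawBytes) <= TASK_ARG_LIMIT;
}
```

With these numbers, `base64Length(MAX_UPLOAD_BYTES)` is exactly 4 MiB, so 3 MiB is the largest raw size that fits; one byte more overflows the limit.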
> **Warning**
> For public demos: this app includes a prominent warning against uploading sensitive documents. Session data is deleted from Postgres on expiry, but if `LLAMACLOUD_PIPELINE_ID` is set, indexed text persists in LlamaCloud.
To run as a privacy-safe demo:
- Leave `LLAMACLOUD_PIPELINE_ID` empty (disables Search/Ask, but classify/parse/extract still work)
- Or create a LlamaCloud pipeline that you periodically clear
```
main.ts                   Express API + SSE streaming
pipeline/orchestrator.ts  Dispatches Render Workflow tasks
tasks/
  upload.ts    → LlamaCloud Files API
  classify.ts  → LlamaCloud Classify API
  parse.ts     → LlamaParse
  extract.ts   → LlamaExtract
  store.ts     → Render Postgres
shared/
  db.ts            Postgres queries
  llama-client.ts  LlamaCloud SDK client
render.yaml               Render Blueprint
```
All document routes are session-scoped under `/s/{token}/`.
| Method | Path | Description |
|---|---|---|
| GET | `/` | Creates session, redirects to `/s/{token}` |
| POST | `/s/{token}/upload` | Upload file, returns SSE progress stream |
| POST | `/s/{token}/upload-url` | Fetch from URL, returns SSE progress stream |
| GET | `/s/{token}/documents` | List documents in session |
| GET | `/s/{token}/documents/:id` | Get single document details |
| DELETE | `/s/{token}/documents/:id` | Delete a document |
| POST | `/s/{token}/search` | Semantic search (requires `LLAMACLOUD_PIPELINE_ID`) |
| POST | `/s/{token}/ask` | RAG retrieval (requires `LLAMACLOUD_PIPELINE_ID`) |
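A client can consume the upload endpoint with plain `fetch`: POST the file, then read the response body as a Server-Sent Events stream. The sketch below is an assumption-laden example, not the repo's UI code; in particular, the JSON payload shape of each `data:` line is hypothetical, so check the server for the real fields.

```typescript
// Extract JSON payloads from the `data:` lines of complete SSE events.
function parseSseChunk(chunk: string): Array<Record<string, unknown>> {
  const events: Array<Record<string, unknown>> = [];
  for (const block of chunk.split("\n\n")) {
    for (const line of block.split("\n")) {
      if (line.startsWith("data:")) {
        events.push(JSON.parse(line.slice(5).trim()));
      }
    }
  }
  return events;
}

// POST a file to the session-scoped upload route and log progress events
// as they arrive. Complete SSE events are separated by a blank line.
async function uploadWithProgress(baseUrl: string, token: string, file: Blob) {
  const body = new FormData();
  body.append("file", file);
  const res = await fetch(`${baseUrl}/s/${token}/upload`, { method: "POST", body });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const cut = buffer.lastIndexOf("\n\n");
    if (cut >= 0) {
      for (const event of parseSseChunk(buffer.slice(0, cut + 2))) {
        console.log("progress:", event);
      }
      buffer = buffer.slice(cut + 2);
    }
  }
}
```

Buffering until the last blank-line separator matters because a network chunk can end mid-event; only the complete prefix is parsed, and the partial tail is kept for the next read.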
| Problem | Solution |
|---|---|
| Workflow tasks fail immediately | Ensure `WORKFLOW_SLUG` matches your workflow service name exactly |
| Database connection errors | Use the Postgres **Internal URL**, not the External one |
| Search returns "not configured" | Set `LLAMACLOUD_PIPELINE_ID` on both services |
| "Unsupported file type" | Ensure the filename has a valid extension (`.pdf`, `.docx`, etc.) |
| "File too large" / upload over ~3 MB | Workflow argument size limit — use a smaller file, or redesign with external storage plus a file reference |
| LlamaCloud rate limits | Tasks retry automatically; check your LlamaCloud dashboard |