In modern educational and collaborative environments, manually capturing whiteboard notes interrupts the flow of teaching and often loses crucial context. Automated solutions exist, but they rely heavily on cloud-based Computer Vision and Large Language Model (LLM) APIs. This introduces significant latency, incurs recurring API costs, and, most importantly, compromises data privacy by sending sensitive classroom or proprietary corporate data to third-party servers. There is a clear need for an entirely offline, high-performance archival system that can read, understand, and structure board contents locally.
This system is architected to run entirely on local edge hardware, eliminating cloud dependency.
- NVIDIA Jetson Nano (4GB): Acts as the primary vision node. It handles high-framerate GStreamer video ingestion, hardware-accelerated spatial triggering (via TensorRT), perspective rectification, and GPU-bound Optical Character Recognition (OCR).
- Local Processing Host: Acts as the semantic reasoning node, running a quantized local LLM for Retrieval-Augmented Generation (RAG).
- Justification & Benefits: Edge computing removes the network round trip from the capture trigger phase, keeps all data on the local subnet, and offloads vision work to the Maxwell GPU, which relieves the ARM CPU and reduces the risk of CPU bottlenecks and thermal throttling.
The system operates on a highly optimized, multi-threaded pipeline:
- Input (GStreamer Queue): A dedicated capture thread ingests NV12 video from the CSI camera interface and hands frames to the rest of the pipeline through a bounded `queue.Queue(maxsize=1)`, which keeps memory use and latency bounded by never letting a backlog of frames build up.
- Preprocessing & Triggering: A YOLOv8n model continuously scans the feed. Instead of relying on passive timers, the system triggers a capture when the instructor physically steps out of the frame or when gesture tracking detects an intersection.
- Geometry & Contrast: Canny edge detection and a homography transform flatten the board to a top-down perspective, while luminance isolation in LAB color space suppresses shadows so the text stands out.
- GPU Text Extraction: The rectified image is passed to EasyOCR as an in-memory array (no round trip to disk) and processed on the Jetson's GPU to extract raw text and its coordinates.
- Output (Local RAG): Text is stored in a local SQLite database. A local Llama 3.1 8B model restructures the messy OCR output into clean Markdown and lets users chat semantically with their notes through a Flask web UI, with ChromaDB providing vector retrieval.
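The source does not show how the single-slot queue is fed, so the following is only a minimal sketch under one assumed policy: latest frame wins, meaning the producer discards a stale frame rather than blocking. The names `put_latest` and `capture_loop` are hypothetical, and `range(100)` stands in for the real GStreamer/CSI frame source.

```python
import queue
import threading


def put_latest(q, frame):
    """Put `frame` into a bounded queue, discarding the stale frame if it is full."""
    while True:
        try:
            q.put_nowait(frame)
            return
        except queue.Full:
            try:
                q.get_nowait()  # drop the stale frame to make room
            except queue.Empty:
                pass  # a consumer took it in the meantime; retry the put


def capture_loop(q, frames, stop_event):
    """Hypothetical producer: `frames` stands in for the camera source."""
    for frame in frames:
        if stop_event.is_set():
            break
        put_latest(q, frame)


frame_queue = queue.Queue(maxsize=1)
stop = threading.Event()
producer = threading.Thread(
    target=capture_loop, args=(frame_queue, range(100), stop)
)
producer.start()
producer.join()

# With no consumer running, only the newest frame is left in the queue.
latest = frame_queue.get_nowait()
print("latest frame:", latest)
```

With this policy the consumer never processes an outdated frame, at the cost of skipping frames it was too slow to read. A pipeline that must process every frame would need a larger queue or a blocking `put` instead.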
The architecture leverages a hybrid cascade of three distinct models:
- Spatial Trigger: YOLOv8n (Ultralytics). Compiled strictly into a `.engine` file via TensorRT for FP16 precision inference on the Jetson Nano.
- OCR Engine: EasyOCR (PyTorch). Replaced legacy Tesseract to allow text recognition to execute entirely on the CUDA cores.
- Semantic Engine: Meta Llama 3.1 (8B). Served locally via Ollama to handle RAG reasoning and the chat interface, with answers grounded in the retrieved notes to limit hallucination.
- YOLOv8n TensorRT Export: Standard PyTorch `.pt` models cause Out-Of-Memory (OOM) faults on the 4GB Jetson. The model was first exported to `.onnx` off-device, then compiled natively on the Jetson using `trtexec --onnx=yolov8n.onnx --saveEngine=yolov8n.engine --fp16` with a mounted 4GB swap file.
- OCR Language Weights: Utilizes pre-trained English and Mathematical inference weights provided by the EasyOCR repository.
- LLM Quantization: The quantized Llama 3.1 8B model fits on edge or consumer-grade hardware while retaining mathematical context, and vector lookups complete in under 100 ms, entirely offline.
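The on-device compile step can be scripted. The sketch below assembles the `trtexec` command quoted above and reads the configured swap size from `/proc/meminfo` text before compiling. The helper names (`build_trtexec_cmd`, `swap_total_kib`, `compile_engine`) are hypothetical and not part of the repository; only the three `trtexec` flags come from the text. The code avoids features newer than Python 3.6, on the assumption that the Jetson runs the interpreter shipped with JetPack 4.6.

```python
import shlex
import subprocess


def build_trtexec_cmd(onnx_path, engine_path, fp16=True):
    """Assemble the trtexec argument list for an ONNX-to-engine build."""
    cmd = [
        "trtexec",
        "--onnx={}".format(onnx_path),
        "--saveEngine={}".format(engine_path),
    ]
    if fp16:
        cmd.append("--fp16")
    return cmd


def swap_total_kib(meminfo_text):
    """Return SwapTotal in kB from /proc/meminfo-style text, or 0 if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            return int(line.split()[1])
    return 0


def compile_engine(cmd):
    """Run the build on the Jetson; raises CalledProcessError on failure."""
    subprocess.run(cmd, check=True)


cmd = build_trtexec_cmd("yolov8n.onnx", "yolov8n.engine")
print(" ".join(shlex.quote(arg) for arg in cmd))

# Illustrative meminfo excerpt; on the device, read /proc/meminfo instead.
sample_meminfo = "MemTotal:        4059712 kB\nSwapTotal:       4194300 kB\n"
print("swap configured (kB):", swap_total_kib(sample_meminfo))
```

`compile_engine` is deliberately not called here, since it only makes sense on a device with TensorRT installed. A swap file created as 4GB typically reports slightly less than 4 × 1024 × 1024 kB, so compare against a threshold with some slack rather than an exact value.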
The system outputs a fully searchable, locally hosted web dashboard featuring structured Markdown notes and a semantic RAG chat interface.
- Inference Time (Trigger): YOLOv8n on TensorRT achieves real-time inference (>30 FPS) with a small GPU memory footprint, invoking the engine through `pycuda` to keep Python overhead low.
- Performance Comparison (OCR):
  - Previous Version (CPU Tesseract): ~4-6 seconds per 720p frame, with the CPU pinned at 100%.
  - Current Version (GPU EasyOCR): ~1.2 seconds per frame, freeing the CPU for Flask web serving and database insertions.
| Aspect | Tesseract | EasyOCR | Why EasyOCR Was Chosen |
|---|---|---|---|
| Recognition Accuracy | Strong on clean, structured text; accuracy drops on noisy or complex board images | Better on noisy, distorted, handwritten, or mixed-text scenes | The project targets real whiteboard captures — robustness on imperfect input matters more than clean-document OCR |
| Output Quality for RAG | More likely to produce garbled text on messy input, reducing reliability for downstream retrieval | Produces more usable text in uncontrolled scenes, improving semantic search and chat answers | Higher quality OCR output means fewer incorrect facts entering the Llama 3.1 RAG pipeline |
| GPU Acceleration | Primarily CPU-bound; competes with the rest of the pipeline for limited ARM CPU time | Native `gpu=True` support; runs inference directly on CUDA cores | The system already leverages GPU acceleration (TensorRT, PyTorch), so OCR must fit the same execution model |
| Reliability for This App | Ideal for neat scanned documents; less suited for live board captures | Purpose-built for captured, imperfect, real-world classroom content | The app prioritizes practical extraction quality over document-grade OCR purity |
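To make the `gpu=True` path concrete, here is a minimal sketch of calling EasyOCR on an in-memory image and arranging its output into reading order before storage. `easyocr.Reader` and `readtext` are EasyOCR's public API; the helper `to_reading_order` and its thresholds (`min_conf`, `line_tol`) are assumptions made for illustration, not the project's actual post-processing.

```python
def make_reader(languages=("en",)):
    """Create an EasyOCR reader on the GPU. Create it once and reuse it."""
    import easyocr  # imported lazily so the helper below runs without it

    return easyocr.Reader(list(languages), gpu=True)


def run_ocr(reader, image):
    """Run OCR on an in-memory image (NumPy array); no disk round trip."""
    return reader.readtext(image)  # list of (bbox, text, confidence)


def to_reading_order(results, min_conf=0.3, line_tol=15):
    """Drop low-confidence boxes, group the rest into lines, sort left to right."""
    kept = [r for r in results if r[2] >= min_conf]
    kept.sort(key=lambda r: min(p[1] for p in r[0]))  # sort by top edge
    lines = []
    for bbox, text, _conf in kept:
        top = min(p[1] for p in bbox)
        left = min(p[0] for p in bbox)
        if lines and abs(top - lines[-1]["top"]) <= line_tol:
            lines[-1]["items"].append((left, text))  # same visual line
        else:
            lines.append({"top": top, "items": [(left, text)]})
    return [" ".join(t for _, t in sorted(line["items"])) for line in lines]


# Synthetic results in EasyOCR's (bbox, text, confidence) shape.
sample = [
    ([[200, 10], [300, 10], [300, 40], [200, 40]], "= mc^2", 0.91),
    ([[10, 12], [180, 12], [180, 42], [10, 42]], "E", 0.88),
    ([[10, 80], [250, 80], [250, 110], [10, 110]], "Newton's laws", 0.95),
    ([[400, 82], [420, 82], [420, 100], [400, 100]], "~~", 0.05),
]
ordered = to_reading_order(sample)
print(ordered)
```

Grouping by the top edge with a fixed pixel tolerance is a simple heuristic. Slanted handwriting or boards that were not fully rectified may need a tolerance scaled to the text height.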
- NVIDIA Jetson Nano (4GB) running JetPack 4.6.x (L4T r32.7.1)
- CSI Camera Module (e.g., IMX219)
- Local Ollama instance installed (the host it runs on is an optional parameter, chosen based on available memory)
1. Clone the Repository: Run `git clone https://github.com/Varun-39/jetson-smart-board-recorder.git`, then `cd jetson-smart-board-recorder`.
2. Initialize SQLite Database Backend: Run `python3 db_setup.py`.
3. Establish Dependency Tree: Run `pip3 install -r requirements.txt`. (Note: The native `nvcr.io/nvidia/l4t-ml:r32.7.1-py3` Docker container gives the best hardware abstraction and avoids manual cv2 compilation faults.)
4. Compile TensorRT Engine (Strict Hardware Execution): Run `./export_trt.sh` to compile the ONNX YOLO model into an engine for your Maxwell GPU. Mandatory: Map 4GB+ of swap space prior to execution.
5. Deploy Pipeline: Launch the state machine either on bare metal with `python3 main.py` or through Compose with `sudo docker-compose up -d`.
6. Serve RAG Backend: Run `ollama pull llama3.1:8b` to fetch the model and `ollama serve` to start the Llama backend listening on `localhost:11434`. If Ollama is not already running as a service, start `ollama serve` first, because `ollama pull` talks to the running server.
Navigate any browser to http://localhost:5000 (or your Jetson's local IP address).
- Run fast fuzzy searches across past lectures.
- Export sessions to Unicode PDFs.
- Ask the local Ollama LLM questions that are answered exclusively from the vectors recorded in real time by the Jetson Smart Board Engine.
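For readers who want to query the model outside the web UI, the sketch below sends a grounded question to Ollama's `/api/generate` endpoint on `localhost:11434`. It assumes the retrieval step (ChromaDB in this project) has already produced a list of note chunks. The prompt wording and the helper names (`build_prompt`, `build_request`, `ask`) are illustrative and are not taken from the repository.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def build_prompt(question, note_chunks):
    """Combine retrieved note chunks and the question into one prompt."""
    context = "\n\n".join(
        "[{}] {}".format(i + 1, chunk) for i, chunk in enumerate(note_chunks)
    )
    return (
        "Answer using only the notes below. "
        "If the notes do not contain the answer, say so.\n\n"
        "Notes:\n{}\n\nQuestion: {}".format(context, question)
    )


def build_request(question, note_chunks, model="llama3.1:8b"):
    """Create the HTTP request without sending it."""
    payload = {
        "model": model,
        "prompt": build_prompt(question, note_chunks),
        "stream": False,  # ask for one JSON object instead of a stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, note_chunks, timeout=120):
    """Send the request to a running Ollama server and return its answer text."""
    req = build_request(question, note_chunks)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


req = build_request("What does F equal?", ["Newton's second law: F = ma"])
print(req.full_url)
```

Calling `ask(...)` requires the Ollama server to be running with the `llama3.1:8b` model already pulled. Restricting the prompt to retrieved notes narrows what the model can draw on, but it does not guarantee a correct answer.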

