🛠️ I build high-reliability infrastructure for AI systems, focusing on the hardest engineering challenges: low-latency media transport, inference observability, distributed storage, and structured reasoning pipelines.
My work sits at the intersection of Site Reliability Engineering (SRE) and ML Systems. I prioritize observability over black-box monitoring, correctness over convenience, and performance over abstraction.
These projects demonstrate specific skills for scaling AI:
- Reliability: Monitoring the "token path" and GPU behavior.
- Networking: Building real-time media infrastructure (WebRTC/SFU).
- Storage: Engineering database internals from scratch.
- Data: Structuring knowledge for complex reasoning.
Knowledge Graph Construction Engine (Knowledge Innovation)
Standard RAG struggles with multi-hop reasoning. This backend system ingests unstructured documents and autonomously builds a Knowledge Graph inside Postgres to capture relationships between entities.
- Extraction: Uses an LLM to extract entities and relations (Server A → depends_on → Database B) from raw text.
- Storage: Utilizes Recursive CTEs in Postgres to traverse the graph efficiently without a separate graph DB.
- Querying: An API that answers questions like "If Node X goes down, what features are impacted?" by traversing the graph structure.
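The impact question above maps naturally onto a recursive CTE. The following is a minimal sketch, not the project's actual schema: the `edges` table, its column names, and the sample rows are assumptions made for illustration, and SQLite (from the Python standard library) stands in for Postgres so the example runs anywhere. The `WITH RECURSIVE` syntax used here is also accepted by Postgres, apart from the `?` parameter placeholder, which depends on the driver.

```python
import sqlite3

# Hypothetical schema: one row per extracted relation (source -> relation -> target).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (source TEXT, relation TEXT, target TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [
        ("Checkout", "depends_on", "Server A"),
        ("Search", "depends_on", "Server A"),
        ("Server A", "depends_on", "Database B"),
        ("Reports", "depends_on", "Database B"),
    ],
)

# Walk "depends_on" edges backwards from the failed node.
# UNION (not UNION ALL) removes duplicates, which also stops cycles from looping.
IMPACT_QUERY = """
WITH RECURSIVE impacted(node) AS (
    SELECT source FROM edges
    WHERE target = ? AND relation = 'depends_on'
    UNION
    SELECT e.source FROM edges e
    JOIN impacted i ON e.target = i.node
    WHERE e.relation = 'depends_on'
)
SELECT node FROM impacted ORDER BY node
"""


def impacted_by(node: str) -> list[str]:
    """Return every node that directly or transitively depends on `node`."""
    return [row[0] for row in conn.execute(IMPACT_QUERY, (node,))]


print(impacted_by("Database B"))
```

With the sample rows, a failure of Database B reaches Server A and Reports directly, and Checkout and Search through Server A.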
Tech Stack: Python (FastAPI), Postgres, Recursive CTEs, Pydantic
Reliable framework for autonomous operations (Knowledge Innovation)
Internal operations (support, integrity) are often manual and brittle. This platform uses a state-machine-driven approach to orchestrate AI agents for robust task execution.
- Architecture: Specialized agents (Classifier, ToolRunner, Escalator) managed by a central Orchestrator.
- Safety: Every agent "thought" and "action" is logged to Postgres for compliance and debugging.
- Tooling: Agents can safely execute backend functions (e.g., refund_user, reset_api_key) via a defined interface.
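One way to picture the state-machine approach is the sketch below. It is an illustration under stated assumptions, not the platform's code: the state names, the keyword-based classifier (standing in for an LLM call), the placeholder tool bodies, and the in-memory `audit_log` (standing in for the Postgres log) are all hypothetical. Only the agent names and the two tool names come from the description above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional


class State(Enum):
    CLASSIFY = "classify"
    RUN_TOOL = "run_tool"
    ESCALATE = "escalate"
    DONE = "done"


# Allow-list of backend functions: agents can only call what is registered here.
# The bodies are placeholders for real backend calls.
TOOLS: dict[str, Callable[[str], str]] = {
    "refund_user": lambda user_id: f"refunded {user_id}",
    "reset_api_key": lambda user_id: f"rotated key for {user_id}",
}


@dataclass
class Task:
    user_id: str
    text: str
    state: State = State.CLASSIFY
    tool: Optional[str] = None
    result: Optional[str] = None
    audit_log: list = field(default_factory=list)  # (agent, message) pairs


def classify(task: Task) -> State:
    # Stand-in for an LLM call: keyword matching picks the tool.
    lowered = task.text.lower()
    if "refund" in lowered:
        task.tool = "refund_user"
    elif "api key" in lowered:
        task.tool = "reset_api_key"
    task.audit_log.append(("Classifier", f"tool={task.tool}"))
    return State.RUN_TOOL if task.tool else State.ESCALATE


def run_tool(task: Task) -> State:
    task.result = TOOLS[task.tool](task.user_id)
    task.audit_log.append(("ToolRunner", task.result))
    return State.DONE


def escalate(task: Task) -> State:
    task.result = "handed off to a human"
    task.audit_log.append(("Escalator", task.result))
    return State.DONE


HANDLERS = {State.CLASSIFY: classify, State.RUN_TOOL: run_tool, State.ESCALATE: escalate}


def orchestrate(task: Task) -> Task:
    """Central loop: each handler does its work and returns the next state."""
    while task.state is not State.DONE:
        task.state = HANDLERS[task.state](task)
    return task


done = orchestrate(Task(user_id="u1", text="Please refund my last order"))
print(done.result, done.audit_log)
```

Because every transition passes through `orchestrate`, the audit trail and the tool allow-list are enforced in one place rather than inside each agent.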
Tech Stack: Python, Celery, LangChain, Postgres
Observability stack for LLM inference (AI Reliability)
Standard web metrics (CPU/RAM) do not capture what users of an LLM actually experience. This project provides a dedicated monitoring dashboard for the "token path": the lifecycle of a prompt from request to final token generation.
- Metrics: Tracks Time-to-First-Token (TTFT) and Inter-Token Latency (ITL) to diagnose user-perceived lag.
- Accelerator Awareness: Monitors GPU VRAM vs. Compute utilization to identify memory-bound bottlenecks.
- Implementation: Prometheus exporters for vLLM/TGI, custom Grafana dashboards.
- Impact: Enables proactive detection of "tail latencies" in model serving.
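The two latency metrics can be derived from nothing more than the request timestamp and the arrival time of each streamed token. The sketch below shows that arithmetic using only the standard library; the function name and the returned keys are hypothetical, and a real exporter would feed these values into Prometheus histograms rather than return a dict.

```python
from statistics import mean


def token_path_metrics(request_ts: float, token_ts: list[float]) -> dict[str, float]:
    """Compute TTFT and ITL statistics from timestamps given in seconds.

    request_ts: when the request was received.
    token_ts:   arrival time of each generated token, in order.
    """
    if not token_ts:
        raise ValueError("no tokens were generated")
    # Time-to-First-Token: the wait before anything appears on screen.
    ttft = token_ts[0] - request_ts
    # Inter-Token Latency: gaps between consecutive tokens.
    gaps = [later - earlier for earlier, later in zip(token_ts, token_ts[1:])]
    return {
        "ttft_s": ttft,
        "itl_mean_s": mean(gaps) if gaps else 0.0,
        "itl_max_s": max(gaps) if gaps else 0.0,
    }


metrics = token_path_metrics(0.0, [0.5, 0.75, 1.0, 2.0])
print(metrics)
```

In this example the mean ITL is 0.5 s but the largest gap is 1.0 s, which is the kind of stall a mean alone hides and a tail-latency view surfaces.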
Tech Stack: Python, Prometheus, Grafana, Docker, NVIDIA SMI
Real-time WebRTC interface for LLMs (Realtime WebRTC)
Voice interfaces to AI usually rely on slow HTTP APIs (Record -> Upload -> Process -> Download -> Play). This project implements a full-stack WebRTC pipeline for real-time duplex streaming.
- Low Latency: Streams audio directly from browser to inference server and back, cutting latency from seconds to milliseconds.
- Signaling: Custom WebSocket signaling server handling SDP offer/answer exchanges and ICE candidates.
- VAD: Implements Voice Activity Detection to allow natural interruptions during AI speech.
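The project description does not say which VAD technique is used, so the following is only an illustrative energy-threshold detector with a short hangover. The class name, the threshold value, and the frame format (lists of 16-bit PCM sample values) are assumptions. It shows the basic idea behind interruption handling: as soon as the microphone frame is classified as speech while the AI is talking, playback can be stopped.

```python
import math


def rms(frame: list[int]) -> float:
    """Root-mean-square amplitude of one audio frame."""
    return math.sqrt(sum(sample * sample for sample in frame) / len(frame))


class EnergyVAD:
    """Energy-threshold VAD (illustrative; production systems often use a trained model)."""

    def __init__(self, threshold: float = 500.0, hangover_frames: int = 3):
        self.threshold = threshold
        self.hangover_frames = hangover_frames
        self._quiet = hangover_frames  # start in the "silent" state

    def is_speech(self, frame: list[int]) -> bool:
        if rms(frame) >= self.threshold:
            self._quiet = 0
        else:
            self._quiet += 1
        # Silence is reported once `hangover_frames` consecutive quiet frames
        # have been seen, so short pauses between words do not end the utterance.
        return self._quiet < self.hangover_frames


def should_interrupt(ai_is_speaking: bool, user_is_speaking: bool) -> bool:
    """Barge-in rule: stop AI playback when the user talks over it."""
    return ai_is_speaking and user_is_speaking


vad = EnergyVAD(threshold=500.0, hangover_frames=2)
loud = [1000, -1000] * 80
quiet = [10, -10] * 80
print([vad.is_speech(f) for f in (quiet, loud, quiet, quiet)])
```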
Tech Stack: React, Node.js/Go, WebRTC, WebSocket
Custom Selective Forwarding Unit in Go (Realtime WebRTC)
Most engineers use black-box SDKs (Twilio/Agora). I built the core media server infrastructure from scratch to understand how to move audio/video data with minimal latency.
- Core Logic: Built a minimalist Selective Forwarding Unit (SFU) using the Pion WebRTC library. It routes video streams between users without decoding them (preserving CPU).
- Adaptive Streaming: Implemented Simulcast routing—the server detects if a user's network slows down and automatically switches them to a lower-resolution stream.
- Signaling: Custom WebSocket signaling server for SDP offer/answer exchange.
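The simulcast decision itself is small: given a bandwidth estimate for a subscriber, forward the highest-quality layer that fits. The sketch below shows that selection logic in Python for consistency with the other examples here, although the SFU itself is written in Go. The layer identifiers, nominal bitrates, and the `headroom` factor are hypothetical values chosen for illustration, and how the bandwidth estimate is obtained is out of scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    rid: str           # simulcast stream identifier (illustrative names)
    bitrate_kbps: int  # hypothetical nominal bitrate of the layer


# Ordered from lowest to highest quality.
LAYERS = [Layer("q", 150), Layer("h", 500), Layer("f", 1500)]


def pick_layer(estimated_kbps: float, headroom: float = 0.8) -> Layer:
    """Pick the best layer that fits within a fraction of the estimated bandwidth.

    The headroom leaves room for audio and retransmissions. If nothing fits,
    fall back to the lowest layer rather than sending no video at all.
    """
    budget = estimated_kbps * headroom
    best = LAYERS[0]
    for layer in LAYERS:
        if layer.bitrate_kbps <= budget:
            best = layer
    return best


for estimate in (3000, 800, 100):
    print(estimate, "->", pick_layer(estimate).rid)
```

A real router would typically add hysteresis so that a subscriber hovering near a boundary does not flip between layers on every estimate.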
Tech Stack: Go, Pion WebRTC, WebSockets, Docker
Custom Key-Value Store with LSM Tree (Online Storage)
To understand how databases scale, I built one. This is a from-scratch implementation of a Log-Structured Merge-tree (LSM), the architecture used by RocksDB and Cassandra.
- Write Path: Implements an in-memory MemTable (Skip List) and persistent SSTables for high-throughput writes.
- Crash Recovery: Implements a Write-Ahead Log (WAL) to ensure durability during failures.
- Read Optimization: Implements Bloom Filters to reduce disk I/O for non-existent keys.
- Compaction: Background process merging SSTables to reclaim space and speed up reads.
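The pieces above fit together as in the following toy model. It is a sketch of the general LSM idea in Python, not the C++/Rust implementation: the MemTable is a plain dict rather than a skip list, SSTables are kept in memory rather than on disk, and the WAL and delete tombstones are omitted. All class and method names are hypothetical.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SSTable:
    """Immutable sorted run with its own Bloom filter (in memory in this sketch)."""

    def __init__(self, items: dict):
        self.items = dict(sorted(items.items()))
        self.bloom = BloomFilter()
        for key in self.items:
            self.bloom.add(key)

    def get(self, key: str):
        if not self.bloom.might_contain(key):
            return None  # skip the lookup that would be a disk read
        return self.items.get(key)


class TinyLSM:
    def __init__(self, memtable_limit: int = 2):
        self.memtable = {}
        self.sstables = []  # oldest first, newest last
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.sstables.append(SSTable(self.memtable))  # flush
            self.memtable = {}

    def get(self, key: str):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest run wins
            value = table.get(key)
            if value is not None:
                return value
        return None

    def compact(self) -> None:
        merged = {}
        for table in self.sstables:  # oldest first, so newer values overwrite
            merged.update(table.items)
        self.sstables = [SSTable(merged)] if merged else []


db = TinyLSM(memtable_limit=2)
for k, v in [("a", "1"), ("b", "2"), ("a", "3"), ("c", "4")]:
    db.put(k, v)
print(len(db.sstables), db.get("a"))
db.compact()
print(len(db.sstables), db.get("a"))
```

Reads check the newest data first, so the overwritten value of "a" in the older run is never returned, and compaction then discards it while merging the two runs into one.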
Tech Stack: C++ (or Rust), POSIX Threads, File I/O
This portfolio is focused on systems fundamentals.
Each project targets a specific bottleneck in modern AI companies: inference latency, media transport, data durability, and knowledge structure.



