Ritwika Kancharla ritwikareddykancharla

Hi, I’m Ritwika — Software Engineer for AI Infrastructure 🛡️🤖

🛠️ I build high-reliability infrastructure for AI systems, focusing on the hardest engineering challenges: low-latency media transport, inference observability, distributed storage, and structured reasoning pipelines.

My work sits at the intersection of Site Reliability Engineering (SRE) and ML Systems. I prioritize observability over black-box monitoring, correctness over convenience, and performance over abstraction.

These projects demonstrate specific skills for scaling AI:

Reliability: Monitoring the "token path" and GPU behavior.
Networking: Building real-time media infrastructure (WebRTC/SFU).
Storage: Engineering database internals from scratch.
Data: Structuring knowledge for complex reasoning.

🧪 Recent Systems Projects

🕸️ graph-enhanced-rag

Knowledge Graph Construction Engine (Knowledge Innovation)

Standard RAG struggles with multi-hop reasoning. This backend system ingests unstructured documents and autonomously builds a Knowledge Graph inside Postgres to capture relationships between entities.

Extraction: Uses an LLM to extract entities and relations (Server A → depends_on → Database B) from raw text.
Storage: Utilizes Recursive CTEs in Postgres to traverse the graph efficiently without a separate graph DB.
Querying: An API that answers questions like "If Node X goes down, what features are impacted?" by traversing the graph structure.

Tech Stack: Python (FastAPI) Postgres Recursive CTEs Pydantic

🤖 agentic-ops-platform

Reliable framework for autonomous operations (Knowledge Innovation)

Internal operations (support, integrity) are often manual and brittle. This platform uses a state-machine-driven approach to orchestrate AI agents for robust task execution.

Architecture: Specialized agents (Classifier, ToolRunner, Escalator) managed by a central Orchestrator.
Safety: Every agent "thought" and "action" is logged to Postgres for compliance and debugging.
Tooling: Agents can safely execute backend functions (e.g., refund_user, reset_api_key) via a defined interface.

Tech Stack: Python Celery LangChain Postgres

📡 token-path-observability

Observability stack for LLM inference (AI Reliability)

Standard web metrics (CPU/RAM) fail for LLMs. This project provides a dedicated monitoring dashboard for the "token path"—the lifecycle of a prompt from request to final token generation.

Metrics: Tracks Time-to-First-Token (TTFT) and Inter-Token Latency (ITL) to diagnose user-perceived lag.
Accelerator Awareness: Monitors GPU VRAM vs. Compute utilization to identify memory-bound bottlenecks.
Implementation: Prometheus exporters for vLLM/TGI, custom Grafana dashboards.
Impact: Enables proactive detection of "tail latencies" in model serving.

Tech Stack: Python Prometheus Grafana Docker NVIDIA SMI

🎙️ browser-to-llm-voice-pipeline

Real-time WebRTC interface for LLMs (Realtime WebRTC)

Voice interfaces to AI usually rely on slow HTTP APIs (Record -> Upload -> Process -> Download -> Play). This project implements a full-stack WebRTC pipeline for real-time duplex streaming.

Low Latency: Streams audio directly from browser to inference server and back, cutting latency from seconds to milliseconds.
Signaling: Custom WebSocket signaling server handling SDP offer/answer exchanges and ICE candidates.
VAD: Implements Voice Activity Detection to allow natural interruptions during AI speech.

Tech Stack: React Node.js/Go WebRTC WebSocket Live Demo

🖥️ realtime-sfu-engine

Custom Selective Forwarding Unit in Go (Realtime WebRTC)

Most engineers use black-box SDKs (Twilio/Agora). I built the core media server infrastructure from scratch to understand how to move audio/video data instantly.

Core Logic: Built a minimalist Selective Forwarding Unit (SFU) using the Pion WebRTC library. It routes video streams between users without decoding them (preserving CPU).
Adaptive Streaming: Implemented Simulcast routing—the server detects if a user's network slows down and automatically switches them to a lower-resolution stream.
Signaling: Custom WebSocket signaling server for SDP offer/answer exchange.

Tech Stack: Go Pion WebRTC WebSockets Docker

🌳 lsm-storage-engine

Custom Key-Value Store with LSM Tree (Online Storage)

To understand how databases scale, I built one. This is a from-scratch implementation of a Log-Structured Merge-tree (LSM), the architecture used by RocksDB and Cassandra.

Write Path: Implements an in-memory MemTable (Skip List) and persistent SSTables for high-throughput writes.
Crash Recovery: Implements a Write-Ahead Log (WAL) to ensure durability during failures.
Read Optimization: Implements Bloom Filters to reduce disk I/O for non-existent keys.
Compaction: Background process merging SSTables to reclaim space and speed up reads.

Tech Stack: C++ (or Rust) Posix Threads File I/O

📌 Notes

This portfolio is focused on systems fundamentals.
Each project targets a specific bottleneck in modern AI companies: inference latency, media transport, data durability, and knowledge structure.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly