Hi, I’m Ritwika — Software Engineer for AI Infrastructure 🛡️🤖

🛠️ I build high-reliability infrastructure for AI systems, focusing on the hardest engineering challenges: low-latency media transport, inference observability, distributed storage, and structured reasoning pipelines.

My work sits at the intersection of Site Reliability Engineering (SRE) and ML Systems. I prioritize observability over black-box monitoring, correctness over convenience, and performance over abstraction.

These projects demonstrate specific skills for scaling AI:

  • Reliability: Monitoring the "token path" and GPU behavior.
  • Networking: Building real-time media infrastructure (WebRTC/SFU).
  • Storage: Engineering database internals from scratch.
  • Data: Structuring knowledge for complex reasoning.

🧪 Recent Systems Projects


Knowledge Graph Construction Engine (Knowledge Innovation)

Standard RAG struggles with multi-hop reasoning. This backend system ingests unstructured documents and autonomously builds a Knowledge Graph inside Postgres to capture relationships between entities.

  • Extraction: Uses an LLM to extract entities and relations (e.g., Server A → depends_on → Database B) from raw text.
  • Storage: Utilizes Recursive CTEs in Postgres to traverse the graph efficiently without a separate graph DB.
  • Querying: An API that answers questions like "If Node X goes down, what features are impacted?" by traversing the graph structure.
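The impact query described above can be sketched with a recursive CTE. This is a minimal illustration, not the project's actual schema: the `edges` table, its columns, and the sample rows are hypothetical, and SQLite (from the Python standard library) stands in for Postgres so the example runs without a server. The `WITH RECURSIVE` statement itself uses syntax that Postgres also accepts; only the parameter placeholder style would differ with a Postgres driver.

```python
import sqlite3

# Hypothetical schema: one row per directed edge "src <relation> dst".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, relation TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [
        ("checkout", "depends_on", "server_a"),
        ("server_a", "depends_on", "database_b"),
        ("search", "depends_on", "database_b"),
        ("profile", "depends_on", "cache_c"),
    ],
)

# Walk the depends_on edges backwards from the failed node.
# UNION (rather than UNION ALL) drops duplicates, so cycles terminate.
IMPACT_QUERY = """
WITH RECURSIVE impacted(node) AS (
    SELECT src FROM edges
    WHERE dst = :failed AND relation = 'depends_on'
    UNION
    SELECT e.src FROM edges e
    JOIN impacted i ON e.dst = i.node
    WHERE e.relation = 'depends_on'
)
SELECT node FROM impacted ORDER BY node
"""


def impacted_by(failed: str) -> list[str]:
    """Return every node that directly or transitively depends on `failed`."""
    rows = conn.execute(IMPACT_QUERY, {"failed": failed}).fetchall()
    return [row[0] for row in rows]


print(impacted_by("database_b"))
```

Keeping the traversal in SQL means a multi-hop question is answered in a single round trip to the database, which is the main reason to prefer a recursive CTE over walking the graph in application code.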

Tech Stack: Python (FastAPI), Postgres, Recursive CTEs, Pydantic


Reliable framework for autonomous operations (Knowledge Innovation)

Internal operations (support, integrity) are often manual and brittle. This platform uses a state-machine-driven approach to orchestrate AI agents for robust task execution.

  • Architecture: Specialized agents (Classifier, ToolRunner, Escalator) managed by a central Orchestrator.
  • Safety: Every agent "thought" and "action" is logged to Postgres for compliance and debugging.
  • Tooling: Agents can safely execute backend functions (e.g., refund_user, reset_api_key) via a defined interface.
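The orchestration pattern above can be sketched as a small state machine. Everything here is an assumption made for illustration: the `State` names, the `Orchestrator` class, and the ticket fields are hypothetical, the in-memory `audit_log` list stands in for a Postgres table, and classification is a plain dictionary lookup rather than an LLM call.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class State(Enum):
    CLASSIFY = "classify"
    RUN_TOOL = "run_tool"
    ESCALATE = "escalate"
    DONE = "done"


@dataclass
class Orchestrator:
    # Registry of callable tools, keyed by the intent they handle.
    tools: dict[str, Callable[[dict], str]]
    # Stand-in for an audit table in Postgres.
    audit_log: list[dict] = field(default_factory=list)

    def _log(self, state: State, kind: str, detail: str) -> None:
        self.audit_log.append(
            {"state": state.value, "kind": kind, "detail": detail}
        )

    def run(self, ticket: dict) -> State:
        state = State.CLASSIFY
        tool_name = None
        while state not in (State.DONE, State.ESCALATE):
            if state is State.CLASSIFY:
                # A real classifier agent would call a model here.
                tool_name = ticket.get("intent")
                self._log(state, "thought", f"intent={tool_name}")
                # Unknown intents are escalated instead of guessed at.
                state = State.RUN_TOOL if tool_name in self.tools else State.ESCALATE
            elif state is State.RUN_TOOL:
                result = self.tools[tool_name](ticket)
                self._log(state, "action", f"{tool_name} -> {result}")
                state = State.DONE
        self._log(state, "thought", "terminal state reached")
        return state


orchestrator = Orchestrator(
    tools={"refund_user": lambda t: f"refunded {t['user_id']}"}
)
print(orchestrator.run({"intent": "refund_user", "user_id": "u1"}))
```

Because every transition passes through `_log`, the audit trail is a by-product of running the machine rather than something each agent has to remember to write.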

Tech Stack: Python, Celery, LangChain, Postgres


Observability stack for LLM inference (AI Reliability)

Standard web metrics (CPU/RAM) fail for LLMs. This project provides a dedicated monitoring dashboard for the "token path"—the lifecycle of a prompt from request to final token generation.

  • Metrics: Tracks Time-to-First-Token (TTFT) and Inter-Token Latency (ITL) to diagnose user-perceived lag.
  • Accelerator Awareness: Monitors GPU VRAM vs. Compute utilization to identify memory-bound bottlenecks.
  • Implementation: Prometheus exporters for vLLM/TGI, custom Grafana dashboards.
  • Impact: Enables proactive detection of "tail latencies" in model serving.
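To make the two latency metrics concrete, here is a minimal sketch of how they could be derived from per-token timestamps. The function name and the shape of its inputs are assumptions for illustration; a real exporter would read these timings from the inference server rather than from a list.

```python
from statistics import mean


def token_path_metrics(request_ts: float, token_ts: list[float]) -> dict[str, float]:
    """Compute TTFT and inter-token latency from timestamps in seconds.

    request_ts: when the prompt was received.
    token_ts:   when each output token was emitted, in order.
    """
    if not token_ts:
        raise ValueError("no tokens were generated")
    # Time-to-First-Token: delay before the user sees anything.
    ttft = token_ts[0] - request_ts
    # Inter-Token Latency: gaps between consecutive tokens.
    gaps = [later - earlier for earlier, later in zip(token_ts, token_ts[1:])]
    return {
        "ttft_s": ttft,
        "mean_itl_s": mean(gaps) if gaps else 0.0,
        "max_itl_s": max(gaps) if gaps else 0.0,
    }


print(token_path_metrics(10.0, [10.5, 10.75, 11.0, 12.0]))
```

Reporting the maximum gap alongside the mean is one simple way to surface a stall in the middle of a response, which an average would hide.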

Tech Stack: Python, Prometheus, Grafana, Docker, NVIDIA SMI


Real-time WebRTC interface for LLMs (Realtime WebRTC)

Voice interfaces to AI usually rely on slow HTTP APIs (Record -> Upload -> Process -> Download -> Play). This project implements a full-stack WebRTC pipeline for real-time duplex streaming.

  • Low Latency: Streams audio directly from browser to inference server and back, cutting latency from seconds to milliseconds.
  • Signaling: Custom WebSocket signaling server handling SDP offer/answer exchanges and ICE candidates.
  • VAD: Implements Voice Activity Detection to allow natural interruptions during AI speech.
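The interruption behaviour can be illustrated with a simple energy-based detector. This is a sketch under stated assumptions, not the project's VAD: it treats a frame as speech when its RMS amplitude crosses a threshold, and the threshold, frame counts, and class name are hypothetical values chosen for 16-bit PCM samples. Production VADs usually rely on a trained model rather than raw energy.

```python
import math


def frame_rms(samples: list[int]) -> float:
    """Root-mean-square amplitude of one frame of 16-bit PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class BargeInDetector:
    """Signals an interruption after several consecutive loud frames."""

    def __init__(self, rms_threshold: float = 500.0, min_speech_frames: int = 3):
        self.rms_threshold = rms_threshold          # hypothetical tuning value
        self.min_speech_frames = min_speech_frames  # debounce against clicks
        self._speech_run = 0

    def should_interrupt(self, samples: list[int], ai_is_speaking: bool) -> bool:
        if frame_rms(samples) >= self.rms_threshold:
            self._speech_run += 1
        else:
            self._speech_run = 0
        # Only interrupt when the AI is actually talking.
        return ai_is_speaking and self._speech_run >= self.min_speech_frames


detector = BargeInDetector()
loud_frame = [1000, -1000] * 80
for _ in range(3):
    interrupted = detector.should_interrupt(loud_frame, ai_is_speaking=True)
print(interrupted)
```

Requiring a short run of speech frames before interrupting trades a few tens of milliseconds of reaction time for fewer false triggers from coughs or keyboard noise.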

Tech Stack: React, Node.js/Go, WebRTC, WebSocket (Live Demo)


Custom Selective Forwarding Unit in Go (Realtime WebRTC)

Most engineers use black-box SDKs (Twilio/Agora). I built the core media server infrastructure from scratch to understand how to move audio/video data instantly.

  • Core Logic: Built a minimalist Selective Forwarding Unit (SFU) using the Pion WebRTC library. It routes video streams between users without decoding them (preserving CPU).
  • Adaptive Streaming: Implemented Simulcast routing—the server detects if a user's network slows down and automatically switches them to a lower-resolution stream.
  • Signaling: Custom WebSocket signaling server for SDP offer/answer exchange.
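The Simulcast switching decision can be sketched independently of the media stack. The project itself is written in Go with Pion; the Python below only illustrates the selection logic, and the layer identifiers, bitrates, and headroom factor are hypothetical values rather than anything taken from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    rid: str          # simulcast stream identifier
    bitrate_bps: int  # approximate bitrate of this encoding


# Hypothetical ladder: full, half, and quarter resolution.
LAYERS = [
    Layer("f", 2_500_000),
    Layer("h", 800_000),
    Layer("q", 200_000),
]


def select_layer(estimated_bps: int, layers: list[Layer] = LAYERS,
                 headroom: float = 0.8) -> Layer:
    """Pick the highest-bitrate layer that fits the subscriber's bandwidth.

    Only a fraction (`headroom`) of the estimate is spent, leaving room
    for audio, retransmissions, and estimation error.
    """
    budget = estimated_bps * headroom
    for layer in sorted(layers, key=lambda l: l.bitrate_bps, reverse=True):
        if layer.bitrate_bps <= budget:
            return layer
    # Nothing fits: fall back to the cheapest layer instead of dropping video.
    return min(layers, key=lambda l: l.bitrate_bps)


print(select_layer(1_200_000).rid)
```

Since the SFU forwards packets without decoding them, switching a subscriber to another layer amounts to changing which incoming stream is forwarded, so no transcoding is involved.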

Tech Stack: Go, Pion WebRTC, WebSockets, Docker


Custom Key-Value Store with LSM Tree (Online Storage)

To understand how databases scale, I built one. This is a from-scratch implementation of a Log-Structured Merge-tree (LSM), the architecture used by RocksDB and Cassandra.

  • Write Path: Implements an in-memory MemTable (Skip List) and persistent SSTables for high-throughput writes.
  • Crash Recovery: Implements a Write-Ahead Log (WAL) to ensure durability during failures.
  • Read Optimization: Implements Bloom Filters to reduce disk I/O for non-existent keys.
  • Compaction: Background process merging SSTables to reclaim space and speed up reads.
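As an illustration of the read optimization, here is a minimal Bloom filter sketch. It is written in Python for brevity rather than in the project's implementation language, and the sizing parameters and the use of SHA-256 to derive bit positions are assumptions made for the example; storage engines typically use faster non-cryptographic hashes.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means the key is definitely absent, so the SSTable
        # on disk does not need to be read at all.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


bloom = BloomFilter()
bloom.add(b"user:1")
print(bloom.might_contain(b"user:1"))
```

In an LSM tree, one such filter is usually kept per SSTable, so a lookup for a missing key can skip most files after a few in-memory bit checks.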

Tech Stack: C++ (or Rust), POSIX Threads, File I/O


📌 Notes

This portfolio is focused on systems fundamentals.
Each project targets a specific bottleneck in modern AI companies: inference latency, media transport, data durability, and knowledge structure.

Pinned

  1. agentic-ops-platform (Public)

    A reliable framework for automating operations using orchestrated AI agents, state machines, and strict audit logging for production safety.

    Python

  2. browser-to-llm-voice-pipeline (Public)

    A real-time WebRTC implementation for duplex voice streaming with LLMs, featuring custom signaling and Voice Activity Detection (VAD).

    TypeScript

  3. graph-enhanced-rag (Public)

    A backend engine that autonomously constructs Knowledge Graphs from unstructured text using LLMs and Postgres Recursive CTEs for multi-hop reasoning.

    Python

  4. lsm-storage-engine (Public)

    A custom key-value store built on a Log-Structured Merge-tree (LSM) architecture in Rust, featuring MemTables, Bloom Filters, and crash recovery.

  5. realtime-sfu-engine (Public)

    A minimalist Selective Forwarding Unit (SFU) written in Go with Pion WebRTC, implementing Simulcast and adaptive bitrate control.

  6. token-path-observability (Public)

    An observability stack for LLM inference servers (vLLM/TGI), monitoring token path metrics like TTFT, ITL, and GPU memory bottlenecks.

    Python