# NovaChat: Multimodal AI Agent for Enterprise Conversations
NovaChat is an intelligent conversation analysis platform that transforms enterprise call transcripts into actionable insights through AI-powered analysis, interactive visualizations, and comprehensive reporting.
## Features

- Chain-of-Thought Reasoning: Advanced LLM router that understands context and conversation history
- Multi-Tool Selection: Automatically chooses the best analysis tool based on query intent
- Context Awareness: References previous responses and conversation flow
- Speaker Activity Analysis: Track speaking time, turn counts, and engagement patterns
- Sentiment Trend Analysis: Monitor emotional progression throughout conversations
- Topic Categorization: Automatically categorize conversation segments by themes
- Custom Chart Generation: Create bar charts, pie charts, and line graphs on demand
- PDF Report Generation: Professional downloadable reports with detailed analysis
- Audio Summaries: Text-to-speech narration of key insights
- Interactive Dashboards: Real-time conversation profiling and metrics
- Multi-format Support: Process MP3, WAV, M4A, and FLAC audio files
- Automatic Transcription: Convert speech to text with speaker identification
- Real-time Analysis: Instant insights from uploaded conversations
## Architecture

```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│  Streamlit UI   │───▶│   Query Router   │───▶│ Analysis Tools  │
│                 │    │(Chain-of-Thought)│    │                 │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                      │                       │
         ▼                      ▼                       ▼
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│ Audio Processor │    │     ChromaDB     │    │  Google Gemini  │
│                 │    │   Vector Store   │    │       LLM       │
└─────────────────┘    └──────────────────┘    └─────────────────┘
```
## Prerequisites

- Python 3.8+
- Google Gemini API key
- ffmpeg (for audio processing)
## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/NovaChat.git
  cd NovaChat
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up environment variables:

  ```bash
  # Create a .env file with your Gemini API key
  echo "GOOGLE_API_KEY=your_gemini_api_key_here" > .env
  ```

- Run the application:

  ```bash
  streamlit run app.py
  ```

- Open http://localhost:8501 in your browser.
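Once the `.env` file is in place, the key can be picked up at application startup. A minimal sketch, assuming the common `python-dotenv` package (`load_api_key` is an illustrative helper, not part of the codebase):

```python
import os

def load_api_key(var: str = "GOOGLE_API_KEY") -> str:
    """Read the Gemini API key, preferring a local .env file."""
    try:
        from dotenv import load_dotenv
        load_dotenv()  # reads .env from the working directory
    except ImportError:
        pass  # fall back to the shell environment
    key = os.getenv(var, "")
    if not key:
        raise RuntimeError(f"{var} is not set; see the installation steps above")
    return key
```

Failing fast with a clear message here avoids confusing downstream errors from the Gemini client.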
## Project Structure

```
NovaChat/
├── app.py                 # Main Streamlit application
├── query_engine.py        # Intelligent query routing system
├── outputs.py             # Analysis tools and LLM integration
├── pdf_builder.py         # PDF report generation
├── normalization.py       # Audio processing and transcription
├── ingest.py              # Vector database ingestion
├── profiler.py            # Conversation profiling
├── transcript_parser.py   # JSON transcript parsing
├── config.py              # Configuration settings
├── requirements.txt       # Python dependencies
├── .env                   # Environment variables
├── chroma_db/             # Vector database storage
├── temp_uploads/          # Temporary file storage
└── README.md              # This file
```
## Core Components

### Query Engine (`query_engine.py`)

Smart routing system that uses chain-of-thought reasoning to select the best analysis tool:

```python
# Example: intelligent routing based on query intent
query = "Show me how Michelle responded to fee discussions"
# Router analyzes: intent → context → tool selection → execution
```

### Analysis Tools (`outputs.py`)

Specialized functions for different types of analysis:

- `build_holistic_analysis_chart()`: complex multi-dimensional analysis
- `build_speaker_activity_chart()`: speaker engagement metrics
- `build_sentiment_trend_chart()`: emotional progression tracking
- `build_pdf_response()`: comprehensive report generation
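As an illustration of what such a builder involves, here is a hypothetical take on `build_speaker_activity_chart()` that aggregates per-speaker talk time from transcript segments. The segment shape and the function signature are assumptions for this sketch, not the project's actual API:

```python
from collections import defaultdict

# Hypothetical segment shape: {"speaker": str, "start": float, "end": float}
def speaker_activity(segments):
    """Aggregate total speaking time (seconds) and turn count per speaker."""
    talk_time, turns = defaultdict(float), defaultdict(int)
    for seg in segments:
        talk_time[seg["speaker"]] += seg["end"] - seg["start"]
        turns[seg["speaker"]] += 1
    return dict(talk_time), dict(turns)

def build_speaker_activity_chart(segments, path="speaker_activity.png"):
    """Render the aggregation as a bar chart (matplotlib imported lazily)."""
    import matplotlib
    matplotlib.use("Agg")  # headless backend for server environments
    import matplotlib.pyplot as plt
    talk_time, _ = speaker_activity(segments)
    fig, ax = plt.subplots()
    ax.bar(list(talk_time), list(talk_time.values()))
    ax.set_ylabel("Speaking time (s)")
    fig.savefig(path)
    plt.close(fig)
    return path
```

Separating the aggregation from the rendering keeps the metrics reusable by other tools, such as the PDF report builder.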
### Audio Processor (`normalization.py`)

Handles audio file conversion and transcription:

```python
# Automatic transcription with speaker identification
transcribe_audio("conversation.mp3")  # → structured_transcript.json
```

## Example Queries

- Upload audio → "Summarize the main discussion points"
- "Create a breakdown chart showing time spent on fees vs. program details"
- "Analyze Michelle's engagement level throughout the call"
- "Generate a sentiment trend for Speaker0"
- "Generate a comprehensive report analyzing Michelle's concerns and responses"
- "Create a PDF summary of this sales conversation"
- "Who talked the most and for how long?"
- "Show me the number of times each speaker contributed"
- "Compare speaking patterns between participants"
## Configuration

### Environment Variables (`.env`)

```bash
GOOGLE_API_KEY=your_gemini_api_key_here
CHROMA_DB_PATH=./chroma_db
COLLECTION_NAME=conversation_segments
EMBED_MODEL_NAME=all-MiniLM-L6-v2
```

### Model Settings (`config.py`)

```python
MODEL_ID = "gemini-2.5-flash"          # Google Gemini model
EMBED_MODEL_NAME = "all-MiniLM-L6-v2"  # Sentence transformer
CHROMA_DB_PATH = "./chroma_db"
```

## Supported Formats

### Audio Input

- MP3, WAV, M4A, FLAC
- Automatic conversion to WAV for processing
- JSON format with speaker identification
- Structured conversation segments with timestamps
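Purely as an illustration of such a structured transcript, diarized segments could be wrapped like this. The field names below are assumptions for the sketch, not the schema actually used by `transcript_parser.py`:

```python
import json

def build_transcript(segments):
    """Wrap raw diarized segments into a structured transcript document."""
    return {
        "segments": [
            {
                "speaker": s["speaker"],
                "text": s["text"].strip(),
                "start": round(s["start"], 2),
                "end": round(s["end"], 2),
            }
            for s in segments
        ],
        "num_speakers": len({s["speaker"] for s in segments}),
    }

raw = [
    {"speaker": "Speaker0", "text": " Hello, thanks for calling. ", "start": 0.0, "end": 2.4},
    {"speaker": "Speaker1", "text": "Hi, I had a question about fees.", "start": 2.4, "end": 5.1},
]
doc = build_transcript(raw)
print(json.dumps(doc, indent=2))
```

Keeping timestamps on every segment is what makes the time-based queries (speaking time, trends over the call) possible downstream.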
### Output Formats

- Interactive charts (matplotlib)
- PDF reports (comprehensive analysis)
- MP3 audio summaries (text-to-speech)
## Tech Stack

- Google Gemini 2.5 Flash: Primary LLM for analysis and reasoning
- Sentence Transformers: Vector embeddings for semantic search
- CardiffNLP RoBERTa: Sentiment analysis pipeline
- Google Text-to-Speech: Audio summary generation
## How It Works

### Chain-of-Thought Query Routing

1. Analyze the query intent (chart vs. report vs. summary)
2. Consider conversation history and context
3. Reference previous responses where applicable
4. Select the optimal tool for the task
5. Execute with full context awareness
### Conversation Profiling

- Automatic detection of conversation type (Sales, Support, etc.)
- Key entity extraction (names, products, companies)
- Context preservation across queries
### Semantic Retrieval (RAG)

- ChromaDB integration for semantic search
- Contextual segment retrieval
- Relevant information extraction for queries
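ChromaDB handles the ranking internally; the core idea, cosine similarity between the query embedding and stored segment embeddings, can be sketched without the library (toy 3-d vectors stand in for the 384-d `all-MiniLM-L6-v2` embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, store, k=2):
    """Return the k segment ids most similar to the query embedding."""
    ranked = sorted(store.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [seg_id for seg_id, _ in ranked[:k]]

# Toy "embeddings": fee-related segments cluster along the first axis
store = {
    "seg-fees":    [0.9, 0.1, 0.0],
    "seg-intro":   [0.1, 0.9, 0.1],
    "seg-pricing": [0.8, 0.2, 0.1],
}
print(retrieve([1.0, 0.0, 0.0], store))  # ['seg-fees', 'seg-pricing']
```

A query about fees thus pulls back the fee and pricing segments first, which are then passed to the LLM as context.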
## Troubleshooting

- API key errors:

  ```bash
  # Verify the .env file exists and contains a valid API key
  cat .env
  ```

- Audio processing fails:

  ```bash
  # Install ffmpeg
  # Ubuntu/Debian:
  sudo apt install ffmpeg
  # macOS:
  brew install ffmpeg
  # Windows: download from ffmpeg.org
  ```

- Missing dependencies:

  ```bash
  pip install --upgrade -r requirements.txt
  ```
## Resources

- Project Presentation (PPT): View Google Slides
- Demo Video: Watch Demo
## Contributing

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments

- Google Gemini for advanced language model capabilities
- ChromaDB for vector database functionality
- Streamlit for the interactive web interface
- HuggingFace for pre-trained models and pipelines
Built with ❤️ for enterprise conversation intelligence
For questions or support, please open an issue or contact the development team.