# NovaChat: Multimodal AI Agent for Enterprise Conversations
NovaChat is an intelligent conversation analysis platform that transforms enterprise call transcripts into actionable insights through AI-powered analysis, interactive visualizations, and comprehensive reporting.
## Features

- Chain-of-Thought Reasoning: Advanced LLM router that understands context and conversation history
- Multi-Tool Selection: Automatically chooses the best analysis tool based on query intent
- Context Awareness: References previous responses and conversation flow
- Speaker Activity Analysis: Track speaking time, turn counts, and engagement patterns
- Sentiment Trend Analysis: Monitor emotional progression throughout conversations
- Topic Categorization: Automatically categorize conversation segments by themes
- Custom Chart Generation: Create bar charts, pie charts, and line graphs on demand
- PDF Report Generation: Professional downloadable reports with detailed analysis
- Audio Summaries: Text-to-speech narration of key insights
- Interactive Dashboards: Real-time conversation profiling and metrics
- Multi-format Support: Process MP3, WAV, M4A, and FLAC audio files
- Automatic Transcription: Convert speech to text with speaker identification
- Real-time Analysis: Instant insights from uploaded conversations
## Architecture

```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│  Streamlit UI   │───▶│   Query Router   │───▶│ Analysis Tools  │
│                 │    │(Chain-of-Thought)│    │                 │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                      │                       │
         ▼                      ▼                       ▼
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│ Audio Processor │    │     ChromaDB     │    │  Google Gemini  │
│                 │    │   Vector Store   │    │       LLM       │
└─────────────────┘    └──────────────────┘    └─────────────────┘
```
## Prerequisites

- Python 3.8+
- Google Gemini API key
- ffmpeg (for audio processing)
## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/NovaChat.git
  cd NovaChat
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up environment variables:

  ```bash
  # Create a .env file with your Gemini API key
  echo "GOOGLE_API_KEY=your_gemini_api_key_here" > .env
  ```

- Run the application:

  ```bash
  streamlit run app.py
  ```

- Open http://localhost:8501 in your browser.
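Once the `.env` file is in place, the key can be picked up at application startup. A minimal sketch, assuming the common `python-dotenv` package (`load_api_key` is an illustrative helper, not part of the codebase):

```python
import os

def load_api_key(var: str = "GOOGLE_API_KEY") -> str:
    """Read the Gemini API key, preferring a local .env file."""
    try:
        from dotenv import load_dotenv
        load_dotenv()  # reads .env from the working directory
    except ImportError:
        pass  # fall back to the shell environment
    key = os.getenv(var, "")
    if not key:
        raise RuntimeError(f"{var} is not set; see the installation steps above")
    return key
```

Failing fast with a clear message here avoids confusing downstream errors from the Gemini client.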
## Project Structure

```
NovaChat/
├── app.py                 # Main Streamlit application
├── query_engine.py        # Intelligent query routing system
├── outputs.py             # Analysis tools and LLM integration
├── pdf_builder.py         # PDF report generation
├── normalization.py       # Audio processing and transcription
├── ingest.py              # Vector database ingestion
├── profiler.py            # Conversation profiling
├── transcript_parser.py   # JSON transcript parsing
├── config.py              # Configuration settings
├── requirements.txt       # Python dependencies
├── .env                   # Environment variables
├── chroma_db/             # Vector database storage
├── temp_uploads/          # Temporary file storage
└── README.md              # This file
```
## Core Components

### Query Engine (`query_engine.py`)

Smart routing system that uses chain-of-thought reasoning to select the best analysis tool:

```python
# Example: intelligent routing based on query intent
query = "Show me how Michelle responded to fee discussions"
# Router analyzes: intent → context → tool selection → execution
```

### Analysis Tools (`outputs.py`)

Specialized functions for different types of analysis:

- `build_holistic_analysis_chart()`: complex multi-dimensional analysis
- `build_speaker_activity_chart()`: speaker engagement metrics
- `build_sentiment_trend_chart()`: emotional progression tracking
- `build_pdf_response()`: comprehensive report generation
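As an illustration of what such a builder involves, here is a hypothetical take on `build_speaker_activity_chart()` that aggregates per-speaker talk time from transcript segments. The segment shape and the function signature are assumptions for this sketch, not the project's actual API:

```python
from collections import defaultdict

# Hypothetical segment shape: {"speaker": str, "start": float, "end": float}
def speaker_activity(segments):
    """Aggregate total speaking time (seconds) and turn count per speaker."""
    talk_time, turns = defaultdict(float), defaultdict(int)
    for seg in segments:
        talk_time[seg["speaker"]] += seg["end"] - seg["start"]
        turns[seg["speaker"]] += 1
    return dict(talk_time), dict(turns)

def build_speaker_activity_chart(segments, path="speaker_activity.png"):
    """Render the aggregation as a bar chart (matplotlib imported lazily)."""
    import matplotlib
    matplotlib.use("Agg")  # headless backend for server environments
    import matplotlib.pyplot as plt
    talk_time, _ = speaker_activity(segments)
    fig, ax = plt.subplots()
    ax.bar(list(talk_time), list(talk_time.values()))
    ax.set_ylabel("Speaking time (s)")
    fig.savefig(path)
    plt.close(fig)
    return path
```

Separating the aggregation from the rendering keeps the metrics reusable by other tools, such as the PDF report builder.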
### Audio Processor (`normalization.py`)

Handles audio file conversion and transcription:

```python
# Automatic transcription with speaker identification
transcribe_audio("conversation.mp3")  # → structured_transcript.json
```

## Example Queries

- Upload audio → "Summarize the main discussion points"
- "Create a breakdown chart showing time spent on fees vs. program details"
- "Analyze Michelle's engagement level throughout the call"
- "Generate a sentiment trend for Speaker0"
- "Generate a comprehensive report analyzing Michelle's concerns and responses"
- "Create a PDF summary of this sales conversation"
- "Who talked the most and for how long?"
- "Show me the number of times each speaker contributed"
- "Compare speaking patterns between participants"
## Configuration

### Environment Variables (`.env`)

```bash
GOOGLE_API_KEY=your_gemini_api_key_here
CHROMA_DB_PATH=./chroma_db
COLLECTION_NAME=conversation_segments
EMBED_MODEL_NAME=all-MiniLM-L6-v2
```

### Model Settings (`config.py`)

```python
MODEL_ID = "gemini-2.5-flash"          # Google Gemini model
EMBED_MODEL_NAME = "all-MiniLM-L6-v2"  # Sentence transformer
CHROMA_DB_PATH = "./chroma_db"
```

## Supported Formats

### Audio Input

- MP3, WAV, M4A, FLAC
- Automatic conversion to WAV for processing
- JSON format with speaker identification
- Structured conversation segments with timestamps
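Purely as an illustration of such a structured transcript, diarized segments could be wrapped like this. The field names below are assumptions for the sketch, not the schema actually used by `transcript_parser.py`:

```python
import json

def build_transcript(segments):
    """Wrap raw diarized segments into a structured transcript document."""
    return {
        "segments": [
            {
                "speaker": s["speaker"],
                "text": s["text"].strip(),
                "start": round(s["start"], 2),
                "end": round(s["end"], 2),
            }
            for s in segments
        ],
        "num_speakers": len({s["speaker"] for s in segments}),
    }

raw = [
    {"speaker": "Speaker0", "text": " Hello, thanks for calling. ", "start": 0.0, "end": 2.4},
    {"speaker": "Speaker1", "text": "Hi, I had a question about fees.", "start": 2.4, "end": 5.1},
]
doc = build_transcript(raw)
print(json.dumps(doc, indent=2))
```

Keeping timestamps on every segment is what makes the time-based queries (speaking time, trends over the call) possible downstream.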
### Output Formats

- Interactive charts (matplotlib)
- PDF reports (comprehensive analysis)
- MP3 audio summaries (text-to-speech)
## Tech Stack

- Google Gemini 2.5 Flash: Primary LLM for analysis and reasoning
- Sentence Transformers: Vector embeddings for semantic search
- CardiffNLP RoBERTa: Sentiment analysis pipeline
- Google Text-to-Speech: Audio summary generation
## How It Works

### Chain-of-Thought Query Routing

1. Analyze the query intent (chart vs. report vs. summary)
2. Consider conversation history and context
3. Reference previous responses where applicable
4. Select the optimal tool for the task
5. Execute with full context awareness
### Conversation Profiling

- Automatic detection of conversation type (Sales, Support, etc.)
- Key entity extraction (names, products, companies)
- Context preservation across queries
### Semantic Retrieval (RAG)

- ChromaDB integration for semantic search
- Contextual segment retrieval
- Relevant information extraction for queries
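ChromaDB handles the ranking internally; the core idea, cosine similarity between the query embedding and stored segment embeddings, can be sketched without the library (toy 3-d vectors stand in for the 384-d `all-MiniLM-L6-v2` embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, store, k=2):
    """Return the k segment ids most similar to the query embedding."""
    ranked = sorted(store.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [seg_id for seg_id, _ in ranked[:k]]

# Toy "embeddings": fee-related segments cluster along the first axis
store = {
    "seg-fees":    [0.9, 0.1, 0.0],
    "seg-intro":   [0.1, 0.9, 0.1],
    "seg-pricing": [0.8, 0.2, 0.1],
}
print(retrieve([1.0, 0.0, 0.0], store))  # ['seg-fees', 'seg-pricing']
```

A query about fees thus pulls back the fee and pricing segments first, which are then passed to the LLM as context.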
## Troubleshooting

- API key errors:

  ```bash
  # Verify the .env file exists and contains a valid API key
  cat .env
  ```

- Audio processing fails:

  ```bash
  # Install ffmpeg
  # Ubuntu/Debian:
  sudo apt install ffmpeg
  # macOS:
  brew install ffmpeg
  # Windows: download from ffmpeg.org
  ```

- Missing dependencies:

  ```bash
  pip install --upgrade -r requirements.txt
  ```
## Resources

- Project Presentation (PPT): View Google Slides
- Demo Video: Watch Demo
## Contributing

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments

- Google Gemini for advanced language model capabilities
- ChromaDB for vector database functionality
- Streamlit for the interactive web interface
- HuggingFace for pre-trained models and pipelines
Built with ❤️ for enterprise conversation intelligence
For questions or support, please open an issue or contact the development team.