A sophisticated, AI-powered outbound lead generation system built with LangGraph 1.0.0 that automates the entire process from prospect discovery to personalized outreach and performance optimization, with ChromaDB memory persistence that enables continuous learning and prevents duplicate outreach.
This project implements an intelligent, self-improving lead generation system that:
- Discovers and enriches B2B prospects from multiple sources
- Generates personalized, AI-powered email outreach
- Tracks engagement (opens, clicks, replies) in real time
- Learns from historical performance to improve future campaigns
- Prevents duplicate outreach using memory persistence
- Provides human-in-the-loop approval gates for quality control
- Intelligent Prospect Discovery: Multi-source lead enrichment using Apollo.io, Clay, and Clearbit
- Personalized Email Outreach: AI-generated, context-aware email campaigns with GPT-4o-mini
- Response Tracking: Real-time monitoring of opens, clicks, and replies via SendGrid
- Continuous Learning: FeedbackTrainer analyzes performance and suggests data-driven improvements
- Memory Persistence: ChromaDB-based storage prevents duplicate outreach and enables historical learning
- Human-in-the-Loop: Manual approval gates for lead selection and email content
- Performance Analytics: Detailed metrics and trend analysis across campaigns
- Organized Export: Prospects automatically saved to organized folders (CSV/JSON/Excel) with timestamps
The system uses LangGraph 1.0.0 to orchestrate a multi-agent workflow with ChromaDB memory persistence:
ProspectSearchAgent → OutreachExecutorAgent → ResponseTrackerAgent → FeedbackTrainerAgent
        │                     │                       │                        │
   [Find Leads]         [Send Emails]         [Track Responses]           [Optimize]
        │                     │                       │                        │
  Human Approval        Human Approval       Auto (with memory)      Store Recommendations
        │                     │                       │                        │
   Deduplicate         Log Interactions      Track Engagement        Historical Learning
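Independent of LangGraph specifics, the four-stage handoff can be sketched as a plain pipeline over a shared state dict. All function names and sample data here are illustrative stand-ins, not the project's actual agent code:

```python
# Illustrative sketch of the four-stage handoff. Each stage reads the
# shared state and passes it on, mirroring ProspectSearch -> Outreach
# -> Tracker -> Trainer. Data and function names are hypothetical.

def find_leads(state):
    state["leads"] = [{"email": "jane@example.com", "company": "Acme"}]
    return state

def send_emails(state):
    state["sent"] = [lead["email"] for lead in state["leads"]]
    return state

def track_responses(state):
    state["opens"] = len(state["sent"])  # pretend every email was opened
    return state

def optimize(state):
    state["recommendations"] = ["keep current tone"] if state["opens"] else []
    return state

def run_pipeline(state):
    for stage in (find_leads, send_emails, track_responses, optimize):
        state = stage(state)
    return state

result = run_pipeline({})
print(result["recommendations"])  # ['keep current tone']
```

In the real system, LangGraph manages this state passing, the human-approval gates, and the memory hooks between stages.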
ChromaDB-based memory system with 4 collections:
WorkflowMemory (ChromaDB)
├── leads_collection            # All discovered and contacted leads (deduplication)
├── campaigns_collection        # Campaign execution history and metrics
├── interactions_collection     # Email interactions (opened, clicked, replied)
└── recommendations_collection  # AI-generated improvement suggestions
Memory enables:
- Lead deduplication (never contact the same person twice)
- Performance tracking (monitor metrics across campaigns)
- Historical learning (FeedbackTrainer uses past data to improve)
- Full audit trail (complete interaction history)
See docs/MEMORY.md for complete memory documentation.
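The core idea behind the leads_collection deduplication can be shown with a minimal stand-in. The real project stores leads in ChromaDB (utils/memory.py); here a plain set of normalized email addresses plays the same role, and all names are illustrative:

```python
# Minimal stand-in for lead deduplication: normalize emails and skip
# any lead that has been seen before. Illustrative only; the project's
# actual memory layer uses ChromaDB collections.

def normalize(email: str) -> str:
    return email.strip().lower()

class LeadMemoryStub:
    def __init__(self):
        self._seen = set()

    def is_known(self, email: str) -> bool:
        return normalize(email) in self._seen

    def remember(self, email: str) -> None:
        self._seen.add(normalize(email))

    def dedupe(self, leads):
        """Return only never-contacted leads, recording them as seen."""
        fresh = []
        for lead in leads:
            if not self.is_known(lead["email"]):
                self.remember(lead["email"])
                fresh.append(lead)
        return fresh

memory = LeadMemoryStub()
batch = [{"email": "John.Doe@company.com"}, {"email": " john.doe@company.com "}]
print(len(memory.dedupe(batch)))  # 1 -- the second entry is a duplicate after normalization
```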
Four specialized agents working collaboratively:

1. ProspectSearchAgent: Discovers prospects using multiple APIs
   - Sources: Apollo.io, Clay, Clearbit
   - Memory integration: Automatically deduplicates leads before returning
   - Output: Enriched lead profiles

2. OutreachExecutorAgent: Generates and sends personalized emails
   - AI-powered personalization using GPT-4o-mini
   - SendGrid delivery with rate limiting
   - Memory integration: Logs "email_sent" interactions

3. ResponseTrackerAgent: Monitors email engagement
   - Tracks opens, clicks, and replies in real time
   - Calculates campaign metrics
   - Memory integration: Logs all engagement interactions

4. FeedbackTrainerAgent: Analyzes performance and optimizes
   - Compares current performance to historical trends
   - Learns from best-performing campaigns
   - Memory integration: Retrieves trends, stores recommendations
- Python 3.9+
- API Keys for:
- OpenAI (required for AI agents)
- Apollo.io (for prospect search - free tier available)
- Clay (optional - free trial)
- Clearbit or PeopleDataLabs (optional - for enrichment)
- SendGrid (for email delivery - free tier available)
- Google Cloud (optional - for Sheets logging)
# Navigate to project directory
cd x:\Project\ProspectToLead
# Create virtual environment
python -m venv venv
# Activate virtual environment
.\venv\Scripts\Activate.ps1
# Install dependencies
pip install -r requirements.txt
# Copy example environment file
copy .env.example .env
# Edit .env with your API keys
notepad .env
Minimum required configuration:
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-4o-mini
APOLLO_API_KEY=your_apollo_api_key_here
SENDGRID_API_KEY=your_sendgrid_api_key_here
SENDGRID_FROM_EMAIL=your_email@company.com
# Execute the full workflow
python langgraph_builder.py
# With custom config
python langgraph_builder.py --config ./workflow.json
# Generate workflow visualization
python langgraph_builder.py --visualize
After the workflow runs, prospects are automatically stored in memory and can be viewed or exported:
# View prospects in console
python scripts\view_prospects.py
# Export to CSV (saved to prospects/csv/ folder)
python scripts\view_prospects.py --csv
# Export to JSON (saved to prospects/json/ folder)
python scripts\view_prospects.py --json
# Export to Excel (saved to prospects/xlsx/ folder)
python scripts\view_prospects.py --xlsx
# Export all formats at once
python scripts\view_prospects.py --csv --json --xlsx
Exported files are organized by format:
- prospects/csv/prospects_20251019_143025.csv - Spreadsheet format
- prospects/json/prospects_20251019_143025.json - Structured data format
- prospects/xlsx/prospects_20251019_143025.xlsx - Excel format with formatting
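Timestamped export paths like these can be generated with a small helper. The folder layout matches the README, but the helper itself is an illustrative sketch, not the project's actual export code:

```python
# Sketch: build a timestamped export path such as
# prospects/csv/prospects_20251019_143025.csv (helper is hypothetical).
from datetime import datetime
from pathlib import Path

def export_path(fmt: str, base: str = "prospects") -> Path:
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return Path(base) / fmt / f"prospects_{stamp}.{fmt}"

print(export_path("csv"))  # e.g. prospects/csv/prospects_20251019_143025.csv
```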
Check campaign history and memory statistics:
# View all memory statistics
python scripts\view_memory.py --all
# View specific information
python scripts\view_memory.py --stats # Overall statistics
python scripts\view_memory.py --campaigns # Recent campaigns
python scripts\view_memory.py --trends # Performance trends
# Check if a lead exists
python scripts\view_memory.py --check-lead john.doe@company.com
Remove data from memory when needed:
# Show current memory statistics
python scripts\clear_memory.py --stats
# Clear only prospects/leads
python scripts\clear_memory.py --leads
# Clear all data (with confirmation)
python scripts\clear_memory.py --all
Provide feedback on campaign performance and approve AI-generated recommendations:
# Interactive menu
python scripts\interactive_feedback.py
# List all campaigns
python scripts\interactive_feedback.py --list-campaigns
# Analyze a specific campaign with interactive prompts
python scripts\interactive_feedback.py --analyze campaign_20251019_143025
# View pending recommendations
python scripts\interactive_feedback.py --view-pending
# Review and approve/reject pending recommendations
python scripts\interactive_feedback.py --approve-pending
# Force clear without confirmation (use with caution!)
python scripts\clear_memory.py --all --force
python scripts\view_prospects.py --csv --json
python scripts\clear_memory.py --leads
See docs/CLEAR_MEMORY.md for detailed documentation.
ProspectToLead/
├── agents/
│   ├── __init__.py                 # Agent factory
│   ├── base_agent.py               # Base agent with ReAct pattern
│   ├── prospect_search_agent.py    # Prospect discovery with memory deduplication
│   ├── enrichment_agent.py         # Data enrichment
│   ├── scoring_agent.py            # Lead scoring
│   ├── outreach_content_agent.py   # Content generation
│   ├── outreach_executor_agent.py  # Email sending with interaction logging
│   ├── response_tracker_agent.py   # Response tracking with memory
│   └── feedback_trainer_agent.py   # Performance analysis with historical learning
├── utils/
│   ├── config.py                   # Configuration loader
│   ├── logger.py                   # Logging utilities
│   ├── llm.py                      # OpenAI GPT-4o-mini integration
│   ├── memory.py                   # ChromaDB memory persistence layer
│   └── tools.py                    # API client integrations
├── scripts/
│   ├── check_config.py             # Configuration validation
│   ├── view_memory.py              # Memory statistics and querying
│   ├── view_prospects.py           # View and export prospects
│   └── clear_memory.py             # Clear data from memory
├── docs/
│   ├── MEMORY.md                   # Memory system documentation
│   ├── CLEAR_MEMORY.md             # Clear memory documentation
│   ├── BUGFIX_UNICODE.md           # Unicode encoding fix documentation
│   ├── PROJECT_SUMMARY.md          # Project overview
│   ├── QUICKSTART.md               # Quick start guide
│   └── SETUP.md                    # Detailed setup instructions
├── tests/
│   ├── conftest.py                 # Pytest fixtures
│   ├── test_agents.py              # Agent unit tests
│   └── test_workflow.py            # End-to-end workflow tests
├── data/
│   └── chroma/                     # ChromaDB memory storage (auto-created)
├── logs/
│   └── workflow.log                # Workflow execution logs
├── prospects/                      # Exported prospect data (organized by format)
│   ├── README.md                   # Prospects folder documentation
│   ├── csv/                        # CSV exports with timestamps
│   ├── json/                       # JSON exports with timestamps
│   └── xlsx/                       # Excel exports with timestamps
├── langgraph_builder.py            # Main workflow builder
├── workflow.json                   # Workflow configuration
├── workflow_simple.json            # Simplified workflow config
├── demo.py                         # Example workflow execution
├── requirements.txt                # Python dependencies
├── .env.example                    # Example environment config
├── .gitignore                      # Git ignore rules
└── README.md                       # This file
The workflow.json file defines the entire workflow. Each step includes:
- id: Unique identifier
- agent: Agent class name
- inputs: Input parameters (can reference previous steps)
- instructions: Natural language instructions for the agent
- tools: API configurations
- output_schema: Expected output structure
- next: Next step in the workflow (or null for final step)
{
"id": "prospect_search",
"agent": "ProspectSearchAgent",
"inputs": {
"icp": {
"industry": "SaaS",
"location": "USA",
"employee_count": { "min": 100, "max": 1000 }
}
},
"instructions": "Search for companies matching ICP criteria",
"tools": [
{
"name": "ApolloAPI",
"config": { "api_key": "{{APOLLO_API_KEY}}" }
}
],
"next": "enrichment"
}
Steps can reference outputs from previous steps:
{
"inputs": {
"leads": "{{prospect_search.output.leads}}",
"workflow_config": "{{workflow}}"
}
}
- Sign up at https://platform.openai.com
- Create an API key
- Add to .env: OPENAI_API_KEY=sk-...
- Sign up at https://www.apollo.io (free tier available)
- Get API key from Settings β Integrations
- Add to .env: APOLLO_API_KEY=...
- Sign up at https://sendgrid.com (free tier: 100 emails/day)
- Create API key with Mail Send permissions
- Verify sender email
- Add to .env: SENDGRID_API_KEY=SG... and SENDGRID_FROM_EMAIL=verified@yourdomain.com
- Create a Google Cloud project
- Enable Google Sheets API
- Create a service account and download credentials.json
- Place credentials.json in the project root
- Create a Google Sheet and share it with the service account email
- Add the Sheet ID to .env: GOOGLE_SHEET_ID=...
The workflow provides real-time progress updates:
Starting workflow: OutboundLeadGeneration
Created agent: ProspectSearchAgent
ProspectSearchAgent reasoning: Analyzing ICP criteria...
Completed step: prospect_search (2.3s)
...
Workflow completed in 45.2s
Complete results are saved to workflow_results.json:
{
"status": "completed",
"outputs": {
"prospect_search": { "leads": [...] },
"enrichment": { "enriched_leads": [...] },
"scoring": { "ranked_leads": [...] },
...
}
}
The FeedbackTrainer provides actionable insights:
Campaign Performance Summary
===========================
Total Sent: 20
Open Rate: 35.0%
Click Rate: 8.5%
Reply Rate: 4.2%
Recommendations:
• outreach_content - tone
Open rate is 35%, above average. Continue with current approach.
• scoring - min_score_threshold
High engagement suggests we can expand targeting.
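The rates in the summary above are simple ratios over emails sent. A minimal sketch (illustrative; the real FeedbackTrainerAgent derives these figures from interaction records in memory):

```python
# Sketch: reduce raw counts to the percentage metrics shown in the
# campaign summary. Function name and signature are assumptions.

def campaign_metrics(sent: int, opens: int, clicks: int, replies: int) -> dict:
    def rate(n: int) -> float:
        # Guard against division by zero for campaigns with no sends.
        return round(100 * n / sent, 1) if sent else 0.0
    return {
        "open_rate": rate(opens),
        "click_rate": rate(clicks),
        "reply_rate": rate(replies),
    }

# 7 opens out of 20 sends -> 35.0% open rate, matching the sample summary.
print(campaign_metrics(sent=20, opens=7, clicks=2, replies=1))
```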
Test without sending actual emails:
# In workflow.json, set dry_run: true
{
"id": "send",
"inputs": {
"dry_run": true
}
}
Agents automatically use mock data when APIs are unavailable, allowing development without all API keys.
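The mock-data fallback pattern can be sketched as follows (function, client, and sample leads are all hypothetical, not the project's actual implementation):

```python
# Sketch: fall back to canned leads when no API client is configured
# or the API call fails, so the workflow still runs end-to-end.

MOCK_LEADS = [{"name": "Jane Doe", "company": "Acme SaaS", "email": "jane@acme.test"}]

def search_prospects(api_client=None):
    if api_client is None:
        return MOCK_LEADS   # no key configured: develop against mocks
    try:
        return api_client.search()
    except Exception:
        return MOCK_LEADS   # API down or quota exceeded: degrade gracefully

print(search_prospects())  # no client configured -> mock data
```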
Logs are written to ./logs/workflow.log with detailed execution traces.
- Create an agent class in agents/:
from agents.base_agent import BaseAgent
class MyCustomAgent(BaseAgent):
def _act(self, inputs, reasoning):
# Your logic here
return {"result": "success"}
- Register in the factory (agents/__init__.py):
AGENT_CLASSES = {
...
"MyCustomAgent": MyCustomAgent
}
- Add to workflow.json:
{
"id": "my_step",
"agent": "MyCustomAgent",
"inputs": {...},
"instructions": "...",
"next": "next_step"
}
Edit workflow.json to:
- Change ICP criteria
- Adjust scoring weights
- Modify outreach tone
- Add/remove workflow steps
- Change step sequencing
No code changes required!
Configure delays between API calls:
{
"inputs": {
"send_delay_seconds": 60
}
}
Limit prospects per run:
{
"inputs": {
"max_results": 50,
"max_leads_to_contact": 20
}
}
The system automatically caches enrichment data in the state object.
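The send_delay_seconds setting above amounts to pausing between sends. A minimal sketch of such a rate-limited loop (the real OutreachExecutorAgent handles this internally; names here are assumptions):

```python
# Sketch: send emails one at a time with a fixed delay between sends,
# the pattern send_delay_seconds configures. `send` is any callable.
import time

def send_batch(emails, send, delay_seconds=60):
    for i, email in enumerate(emails):
        send(email)
        if i < len(emails) - 1:   # no need to sleep after the final send
            time.sleep(delay_seconds)

sent = []
send_batch(["a@x.test", "b@x.test"], sent.append, delay_seconds=0)
print(sent)  # ['a@x.test', 'b@x.test']
```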
Import errors for langgraph/langchain:
pip install --upgrade langgraph langchain langchain-openai
API authentication failures:
- Verify API keys in .env
- Check API key permissions/scopes
- Ensure API quotas are not exceeded
Google Sheets errors:
- Verify credentials.json exists
- Check the service account email has edit access to the sheet
- Confirm the Sheet ID is correct
Email sending failures:
- Verify sender email in SendGrid
- Check SendGrid API key permissions
- Ensure you are not exceeding rate limits
- Start with dry run to test workflow without sending emails
- Use mock data during development to avoid API costs
- Monitor API quotas to avoid service interruptions
- Review feedback regularly to optimize performance
- Version control your workflow.json configurations
- Rotate API keys periodically for security
This is a demonstration project. To extend or customize:
- Fork the repository
- Create a feature branch
- Implement changes
- Test thoroughly
- Submit pull request with clear description
This project is provided as-is for demonstration purposes.
For questions about this implementation:
- Email: maskedvirus@owo.family
For more detailed information, see the docs folder:
- Quick Start Guide - Get running in 5 minutes
- Setup Instructions - Detailed setup and API configuration
- Memory System - Complete guide to ChromaDB memory persistence
- System Architecture - In-depth architecture diagrams and explanations
- Project Summary - Complete feature list and deliverables
- Demo Script - Guide for recording demo video
- Bug Fixes - Unicode encoding fix for Windows compatibility
The prospects/ folder contains exported lead data organized by format:
- prospects/csv/ - CSV files with timestamps (e.g., prospects_20251019_143025.csv)
- prospects/json/ - JSON files with timestamps (e.g., prospects_20251019_143025.json)
- prospects/xlsx/ - Excel files with timestamps (e.g., prospects_20251019_143025.xlsx)
Use python scripts\view_prospects.py --help for export options.
A demo video walkthrough is available showing:
- Complete workflow execution
- Agent reasoning and decision-making
- Performance analysis and recommendations
- Architecture and design choices
Built with: LangGraph, LangChain, OpenAI GPT-4o-mini, Python 3.9+
Targets: B2B SaaS companies, $20M-$200M revenue, 100-1000 employees, USA
Approach: AI-assisted development using Cursor and Claude (vibe coding encouraged!)